
UNSW Business School

School of Economics

Introductory Econometrics
ECON2206
Tutorial Program
Sample Answers

Session 2, 2015

Table of Contents

Week 2 Tutorial Exercises
    Problem Set
    STATA Hints
Week 3 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set
Week 4 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 5 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 6 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 7 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 8 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 9 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 10 Tutorial Exercises
    Readings
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 11 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set
Week 12 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)
Week 13 Tutorial Exercises
    Review Questions (these may or may not be discussed in tutorial classes)
    Problem Set (these will be discussed in tutorial classes)

Week 2 Tutorial Exercises


Problem Set
Q1. Wooldridge 1.1
(i) Ideally, we could randomly assign students to classes of different sizes. That is, each
student is assigned a different class size without regard to any student characteristics such
as ability and family background. For reasons we will see in Chapter 2, we would like
substantial variation in class sizes (subject, of course, to ethical considerations and resource
constraints).
(ii) A negative correlation means that larger class size is associated with lower performance.
We might find a negative correlation because larger class size actually hurts performance.
However, with observational data, there are other reasons we might find a negative
relationship. For example, children from more affluent families might be more likely to
attend schools with smaller class sizes, and affluent children generally score better on
standardized tests. Another possibility is that, within a school, a principal might assign the
better students to smaller classes. Or, some parents might insist their children are in the
smaller classes, and these same parents tend to be more involved in their childrens
education.
(iii) Given the potential for confounding factors, some of which are listed in (ii), finding a
negative correlation would not be strong evidence that smaller class sizes actually lead to
better performance. Some way of controlling for the confounding factors is needed, and this
is the subject of multiple regression analysis.

Q2. Wooldridge 1.2


(i) Here is one way to pose the question: If two firms, say A and B, are identical in all
respects except that firm A supplies job training one hour per worker more than firm B, by
how much would firm A's output differ from firm B's?
(ii) Firms are likely to choose job training depending on the characteristics of workers.
Some observed characteristics are years of schooling, years in the workforce, and
experience in a particular job. Firms might even discriminate based on age, gender, or race.
Perhaps firms choose to offer training to more or less able workers, where ability might be
difficult to quantify but where a manager has some idea about the relative abilities of
different employees. Moreover, different kinds of workers might be attracted to firms that
offer more job training on average, and this might not be evident to employers.
(iii) The amount of capital and technology available to workers would also affect output. So,
two firms with exactly the same kinds of employees would generally have different outputs
if they use different amounts of capital or technology. The quality of managers would also
have an effect.
(iv) No, unless the amount of training is randomly assigned. The many factors listed in parts
(ii) and (iii) can contribute to finding a positive correlation between output and training
even if job training does not improve worker productivity.
Q3. Wooldridge C1.3 (meap01_ch01.do)
(i) The largest is 100, the smallest is 0.
(ii) 38 out of 1,823, or about 2.1 percent of the sample.
(iii) 17
(iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least in
2001, the reading test was harder to pass.
(v) The sample correlation between math4 and read4 is about 0.843, which is a very high
degree of (linear) association. Not surprisingly, schools that have high pass rates on one test
have a strong tendency to have high pass rates on the other test.

(vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which
shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the
maximum is $11,957.64.]
(vii) 8.7%. Note that, by Taylor expansion, log(x + h) − log(x) = log(1 + h/x) ≈ h/x, for
positive x and small h/x (in absolute value).
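The course do-files are in STATA; as a quick language-neutral check, the log approximation in (vii) can be verified in Python. The spending figures below are hypothetical, chosen only to illustrate a roughly 8.7% proportionate increase; they are not values from the MEAP data.

```python
import math

# Check log(x + h) - log(x) = log(1 + h/x) ~ h/x for small h/x.
x, h = 5000.0, 435.0                    # hypothetical spending level and increase
exact = math.log(x + h) - math.log(x)   # exact log difference, log(1 + h/x)
approx = h / x                          # first-order Taylor approximation
print(round(exact, 4), round(approx, 4))
```

The two numbers agree to well under half a percentage point, which is why the 8.7% reading of the log difference is acceptable.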

Q4. Wooldridge C1.4 (jtrain2_ch01.do)


(i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.
(ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not
receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800,
which is very large. On average, the men receiving the job training had earnings about 40%
higher than those not receiving training.
(iii) About 24.3% of the men who received training were unemployed in 1978; the figure is
35.4% for men not receiving training. This, too, is a big difference.
(iv) The differences in earnings and unemployment rates suggest the training program had
strong, positive effects. Our conclusions about economic significance would be stronger if
we could also establish statistical significance (which is done in Computer Exercise C9.10 in
Chapter 9).
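The comparisons in parts (i) and (ii) can be reproduced directly from the rounded averages quoted above:

```python
# Back-of-envelope checks of the Q4 comparisons, using the rounded figures above.
frac_trained = 185 / 445                   # fraction of men receiving training
diff = 6.35 - 4.55                         # earnings gap, in $1,000s
pct_higher = (6.35 - 4.55) / 4.55 * 100    # treated earnings relative to control
print(round(frac_trained, 3), round(diff, 2), round(pct_higher, 1))
```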

STATA Hints

Some of the STATA do-files for solving the Problem Set are provided in doFilesUpload.zip. It is
important to understand the commands in the do-files so that you will be able to write your
own do-files for your assignments and the course project.

Week 3 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

The minimum requirement for OLS to be carried out for the data set {(xi, yi), i = 1, …, n} with
the sample size n > 2 is that the sample variance of x is positive. In what circumstances is the
sample variance of x zero?
When all observations on x have the same value, there is no variation in x.
The OLS estimation of the simple regression model has the following properties:
a) the sum of the residuals is zero;
b) the sample covariance of the residuals and x is zero.
Why? How would you relate them to the least squares principle?
These are derived from the first-order conditions for minimising the sum of squared
residuals (SSR).
Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample
regression function (SRF), which is a straight line.


This can be derived from the first of the first-order conditions for minimising SSR.
How do you know that SST = SSE + SSR is true?
The dependent variable values can be expressed as the fitted value plus the residual:
yi = ŷi + ûi. After taking the average away from both sides of the equation, we find
(yi − ȳ) = (ŷi − ȳ) + ûi. The required SST = SSE + SSR is obtained by squaring and summing
both sides of the above equation, noting (ŷi − ȳ) = β̂1(xi − x̄), and using the second of the
first-order conditions of minimising SSR, which makes the cross-product term vanish.
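These identities are easy to verify numerically. The sketch below uses a simulated toy sample (not a course dataset), computes the simple-regression OLS estimates by hand, and checks the two first-order conditions and the SST = SSE + SSR decomposition:

```python
import numpy as np

# Simulate a toy sample and fit y on x by OLS using the textbook formulas.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

sst = ((y - y.mean()) ** 2).sum()
sse = ((yhat - y.mean()) ** 2).sum()
ssr = (uhat ** 2).sum()

# Residuals sum to zero, are uncorrelated with x, and SST = SSE + SSR.
print(abs(uhat.sum()) < 1e-10, abs((uhat * x).sum()) < 1e-10, abs(sst - sse - ssr) < 1e-8)
```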
Which of the following models is (are) nonlinear model(s)?
a) sales = β0/[1 + exp(−β1 ad_expenditure)] + u;
b) sales = β0 + β1 log(ad_expenditure) + u;
c) sales = β0 + β1 exp(ad_expenditure) + u;
d) sales = exp(β0 + β1 ad_expenditure + u).
The model in a) is nonlinear.
Can you follow the proofs of Theorems 2.1-2.3?
You should be able to follow the proofs. If not at this point, try again.

Problem Set

Q1. Wooldridge 2.4


(i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, the predicted
bwght = 109.49. This is about an 8.6% drop.
(ii) Not necessarily. There are many other factors that can affect birth weight,
particularly overall health of the mother and quality of prenatal care. These could be
correlated with cigarette smoking during pregnancy. Also, something such as caffeine
consumption can affect birth weight, and might also be correlated with cigarette
smoking.
(iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.514) ≈ −10.18, or
about −10 cigarettes! This is nonsense, of course, and it shows what happens when we
are trying to predict something as complicated as birth weight with only a single
explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet
almost 700 of the births in the sample had a birth weight higher than 119.77.
(iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because
we are using only cigs to explain birth weight, we have only one predicted birth weight

at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the
observed birth weights at cigs = 0, and so we will underpredict high birth weights.
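The arithmetic in parts (i) and (iii) can be checked using only the two predictions reported in part (i): 119.77 ounces at cigs = 0 and 109.49 at cigs = 20.

```python
# Recover the implied slope on cigs from the two reported predictions, then
# check the percentage drop in (i) and the nonsensical cigs value in (iii).
b0 = 119.77
slope = (109.49 - 119.77) / 20             # implied coefficient on cigs
pct_drop = (109.49 - 119.77) / 119.77 * 100
cigs_for_125 = (125 - b0) / slope          # negative, as the answer notes
print(round(slope, 3), round(pct_drop, 1), round(cigs_for_125, 1))
```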

Q2. Wooldridge 2.7


(i) When we condition on inc in computing an expectation, inc becomes a constant. So
E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = 0 because E(e|inc) = E(e) = 0.
(ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So
Var(u|inc) = Var(√inc·e|inc) = (√inc)²Var(e|inc) = σe²·inc because Var(e|inc) = σe².
(iii) Families with low incomes do not have much discretion about spending; typically, a
low-income family must spend on food, clothing, housing, and other necessities. Higher
income people have more discretion, and some might choose more consumption while
others more saving. This discretion suggests wider variability in saving among higher
income families.
Q3. Wooldridge C2.6 (meap93_ch02.do)
(i) It seems plausible that another dollar of spending has a larger effect for low-spending
schools than for high-spending schools. At low-spending schools, more money can go
toward purchasing more books, computers, and for hiring better qualified teachers. At high
levels of spending, we would expect little, if any, effect because the high-spending schools
already have high-quality teachers, nice facilities, plenty of books, and so on.
(ii) If we take changes, as usual, we obtain
Δmath10 = β1Δlog(expend) ≈ (β1/100)(%Δexpend),
just as in the second row of Table 2.3. So, if %Δexpend = 10, Δmath10 = β1/10.
(iii) The regression results are
math10̂ = −69.34 + 11.16 log(expend),  n = 408,  R² = .0297.
(iv) If expend increases by 10 percent, math10̂ increases by about 1.12 points. This is not a
huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in
spending might be a fairly small dollar amount.
(v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100.
In fact, the largest fitted value is only about 30.2.
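As a check on part (iv), compare the exact change in the fitted value for a 10% spending increase with the β1/10 approximation, using the slope 11.16 reported in part (iii):

```python
import math

# Exact vs approximate effect of a 10% spending increase on fitted math10.
b1 = 11.16
exact = b1 * math.log(1.10)   # exact change: b1 * [log(1.1 * expend) - log(expend)]
approx = b1 / 10              # the semi-elasticity approximation used in (iv)
print(round(exact, 2), round(approx, 2))
```

The approximation (about 1.12 points) slightly overstates the exact change (about 1.06 points), which is typical for a 10% change.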

Q4. Wooldridge C2.7 (charity_ch02.do)


(i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not
give a gift, or about 60 percent.
(ii) The average number of mailings per year is about 2.05. The minimum value is .25 and
the maximum value is 3.5.
(iii) The estimated equation is
gift̂ = 2.01 + 2.65 mailsyear,  n = 4,268,  R² = .0138

(iv) The slope coefficient from part (iii) means that each mailing per year is associated with
(and perhaps even causes) an estimated 2.65 additional guilders, on average. Therefore, if
each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65
guilders. This is only the average, however. Some mailings generate no contributions, or a
contribution less than the mailing cost; other mailings generate much more than the
mailing cost.
(v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gifts
is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have
received no mailings, the smallest predicted value is about two. So, with this estimated
equation, we never predict zero charitable gifts.

Week 4 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What do we mean when we say "regress wage on educ and expr"?

Use OLS to estimate the multiple linear regression model: wage = β0 + β1educ + β2expr + u.
Why and under what circumstance do we need to control for expr in the regression model
in order to quantify the effect of educ on wage?
We need to control for expr in the model when the partial or ceteris paribus effect of educ on
wage is the parameter of interest and expr and educ are correlated.
What is the bias of an estimator?
bias = E(β̂j) − βj.
What is the omitted variable bias?
It is the bias that results from omitting relevant variables that are correlated with the
included explanatory variables in a regression model.
What is the consequence of adding an irrelevant variable to a regression model?
Including an irrelevant variable will generally increase the variance of the OLS estimators.
What is the requirement of the ZCM assumption, in your own words?
You must do this by yourself.
Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression
model?
When the mean value of the disturbance is not zero, you can always define the new
disturbance as the original disturbance minus the mean value, and define the new intercept
as the original intercept plus the mean value.
In terms of notation, why do we need two subscripts for independent variables?
In our notation, the first subscript is for indexing observations (1st obs, 2nd obs, etc.) and
the second subscript is for indexing variables (1st variable, 2nd variable, etc.).
Using OLS to estimate a multiple regression model with k independent variables and an
intercept, how many first-order conditions are there?
k + 1.
How do you know that the OLS estimators are linear combinations of the observations on
the dependent variable?
By the first-order conditions, we see that the OLS estimators are the solution to k + 1 linear
equations in k + 1 unknowns. The right-hand sides of the equations are linear functions
of the observations on the dependent variable (say, y). Hence each OLS estimator is a linear
combination of the y-observations.
What is the Gauss-Markov theorem? Read the textbook.
Try to explain why R2 never decreases (it is likely to increase) when additional explanatory
variables are added to the regression model.
First, the smaller the SSR, the greater the R-squared. OLS chooses the parameter
estimates to minimise the SSR. With extra explanatory variables, the R-squared does not
decrease because the SSR does not increase: OLS has a larger parameter space to choose
estimates from (more possibilities to make the SSR small).
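This can be demonstrated numerically. The sketch below uses simulated data (not a course dataset): it adds a pure-noise regressor that is unrelated to y and confirms that the R-squared does not fall.

```python
import numpy as np

# Adding a regressor (even pure noise) cannot lower R-squared, because OLS
# with more columns cannot end up with a larger SSR.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)   # an irrelevant explanatory variable

def r_squared(y, cols):
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

r2_one = r_squared(y, [x1])
r2_two = r_squared(y, [x1, noise])
print(r2_one <= r2_two)
```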
What is an endogenous explanatory variable? Exogenous explanatory variable?
See the top of page 88.
What is multicollinearity and its likely effect on the OLS estimators? Read pages 95-99.

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 3.2
(i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a
family, the less education any one child in the family has. To find the increase in the number
of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so
Δsibs = 1/.094 ≈ 10.6.
(ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years
more of predicted education. So if a mother has four more years of education, her son is
predicted to have about half a year (.524) more education.
(iii) Since the number of siblings is the same, but meduc and feduc are both different, the
coefficients on meduc and feduc both need to be accounted for. The predicted difference in
education between B and A is .131(4) + .210(4) = 1.364.

Q2. Wooldridge 3.10


(i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial
effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts.
We have not done this case explicitly, but given equation (3.46) and the discussion with a
single omitted variable, the intuition is pretty straightforward.
(ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by
"almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect
the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.
(iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2
and x3 have small partial effects on y and yet x2 and x3 are highly correlated with x1. Adding x2
and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is
likely to be much larger than se(β̃1).
(iv) In this case, adding x2 and x3 will decrease the residual variance without causing much
collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1)
smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).
Q3. Wooldridge C3.2 (hprice1_ch03.do)
(i) The estimated equation is

pricê = −19.32 + .128 sqrft + 15.20 bdrms,  n = 88,  R² = .632


(ii) Holding square footage constant, Δprice = 15.20 Δbdrms. Hence, for one more bedroom,
the predicted price increases by 15.20, which means $15,200.
(iii) Now Δprice = .128 Δsqrft + 15.20 Δbdrms = .128(140) + 15.20(1) = 33.12. That is $33,120.
Because the house size is also increasing, the effect is larger than that in (ii).
(iv) About 63.2%
(v) The predicted price is -19.32+.128(2438)+15.20(4) = 353.544, or $353,544.
(vi) The residual is 300,000 − 353,544 = −53,544. The buyer underpaid, judged by the
predicted price. But, of course, there are many other features of a house (some that we
cannot even measure) that affect price. We have not controlled for those.

Q4. Wooldridge C3.6 (wage2_ch03_c3_6.do)


(i) The slope coefficient from regressing IQ on educ is 3.53383.
(ii) The slope coefficient from log(wage) on educ is .05984.
(iii) The slope coefficients from regressing log(wage) on educ and IQ are .03912 (on educ)
and .00586 (on IQ), respectively.
(iv) The computed β̂1 + δ̃1β̂2 = .03912 + 3.53383(.00586) ≈ .05984, which matches the simple
regression slope in (ii), as the omitted-variable algebra predicts.
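Part (iv) is the omitted-variable identity in action: the simple-regression slope on educ equals the multiple-regression slope plus the IQ coefficient times the slope from regressing IQ on educ. The quoted coefficients can be combined directly:

```python
# Verify the omitted-variable relationship with the coefficients quoted above.
delta1 = 3.53383     # slope from regressing IQ on educ, part (i)
b1_hat = 0.03912     # educ coefficient in the multiple regression, part (iii)
b2_hat = 0.00586     # IQ coefficient in the multiple regression, part (iii)
b1_tilde = 0.05984   # simple-regression slope on educ, part (ii)
print(round(b1_hat + b2_hat * delta1, 4))
```

The combination reproduces the simple-regression slope up to rounding in the reported coefficients.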

Week 5 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What are the CLM assumptions?


These are MLR.1, MLR.2, MLR.3 and MLR.6. MLR.6 is a very strong assumption that implies
both MLR.4 and MLR.5.
What is the sampling distribution of the OLS estimators under the CLM assumptions?
The OLS estimators under the CLM assumptions follow the normal distribution.
What are the standard errors of the OLS estimators?
The standard error is the square root of the estimated variance of the estimator.
What is the null hypothesis about a parameter?
In this context, the null hypothesis is a statement that assigns a known value (or
hypothesised value) to the parameter of interest. Usually, the null is a maintained
proposition (from economic theories or experience) that will only be rejected on very
strong evidence.
What is a one-tailed (two-tailed) alternative hypothesis?
This should be clear once you finish your reading.
In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
Type 1 error is the rejection of a true null. Type 2 error is the non-rejection of a false null.
The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical
value". How is the critical value determined?
The critical value is determined by the level of significance (or level of confidence in the case
of confidence intervals) and the distribution of the test statistic. For example, for t-stat, we
use the t-distribution when the sample size is small, and the normal distribution when the
sample size is large (df > 120).
Justify the statement "Given the observed test statistic, the p-value is the smallest
significance level at which the null hypothesis would be rejected."
If you used the observed t-stat as your critical value, you would just reject the null (because
the t-stat is equal to this value). This critical value (= the t-stat value) implies a level of
significance, say p, which is exactly the p-value. For any significance level smaller than p, the
corresponding critical value is more extreme than the t-stat value and does not lead to the
rejection of the null. Hence, p is the smallest significance level at which the null would be
rejected.
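For a large-sample (normal) two-tailed t test, this logic can be made concrete: p = 2(1 − Φ(|t|)) is exactly the significance level whose critical value equals the observed |t|. The observed t statistic below is hypothetical.

```python
import math

# Standard normal CDF via the error function.
def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

t_obs = 2.13                                  # hypothetical observed t statistic
p_value = 2.0 * (1.0 - norm_cdf(abs(t_obs)))  # two-tailed p-value
print(round(p_value, 3))
# Any significance level alpha >= p_value has a critical value <= |t_obs|, so H0
# is rejected; any alpha < p_value has a larger critical value, so H0 is not.
```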
What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper
bound U. The probability that [L, U] covers the parameter is 0.9.
In constructing a confidence interval for a parameter, what is the level of confidence?
It is the probability that the CI covers the parameter.
When the level of confidence increases, how would the width of the confidence interval
change (holding other things fixed)?
The width will increase.
Try to convince yourself that the event "the 90% confidence interval covers a hypothesised
value of the parameter" is the same as the event "the null of the parameter being the
hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10%
level of significance".
It becomes apparent if you start with P(|t-stat| ≤ c) = 0.9 and disentangle the expression for
the t-stat.

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 4.1
(i) and (iii) generally cause the t statistics not to have a t distribution under H0.
Homoskedasticity is one of the CLM assumptions. An important omitted variable violates
Assumption MLR.4 (hence MLR.6) in general. The CLM assumptions contain no mention of
the sample correlations among independent variables, except to rule out the case where the
correlation is one (MLR.3).

Q2. Wooldridge 4.2


(i) H0: β3 = 0. H1: β3 > 0.
(ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we
multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted
to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a
large change in ros.
(iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as
1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value.
Therefore, we fail to reject H0 at the 10% significance level.
(iv) Based on this sample, the estimated ros coefficient appears to be different from zero
only because of sampling variation. On the other hand, including ros may not be causing any
harm; it depends on how correlated it is with the other independent variables (although
these are very significant even with ros in the equation).
Q3. Wooldridge 4.5
(i) .412 ± 1.96(.094), or about [.228, .596].
(ii) No, because the value .4 is well inside the 95% CI.
(iii) Yes, because 1 is well outside the 95% CI.
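The interval in (i) and the test/CI duality used in (ii) and (iii) can be reproduced directly:

```python
# Reconstruct the 95% CI from the estimate and standard error in Q3.
est, se = 0.412, 0.094
lo = est - 1.96 * se
hi = est + 1.96 * se
print(round(lo, 3), round(hi, 3))
print(lo <= 0.4 <= hi)   # .4 lies inside the CI: fail to reject that value
print(lo <= 1.0 <= hi)   # 1 lies outside the CI: reject that value
```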

Q4. Wooldridge C4.8 (401ksubs_ch04.do)


(i) There are 2,017 single people in the sample of 9,275.
(ii) The estimated equation is

nettfâ = −43.04 + .799 inc + .843 age
            (4.08)   (.060)    (.092)

n = 2,017, R² = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is
reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on
age means that, holding income fixed, if a person gets another year older, his/her nettfa is
predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again,
this is not surprising.
(iii) The intercept is not very interesting as it gives the predicted nettfa for inc = 0 and age =
0. Clearly, there is no one with even close to these values in the relevant population.
(iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the
p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against
the one-sided alternative).
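The t statistic and one-sided p-value in part (iv) can be reproduced with the standard normal (with about 2,014 degrees of freedom, the t distribution is effectively normal):

```python
import math

# One-sided test of H0: beta2 = 1 against H1: beta2 < 1 from the Q4 estimates.
t_stat = (0.843 - 1.0) / 0.092
p_one_sided = 0.5 * (1.0 + math.erf(t_stat / math.sqrt(2.0)))  # P(Z < t_stat)
print(round(t_stat, 2), round(p_one_sided, 3))
```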
(v) The slope coefficient on inc in the simple regression is about .821, which is not very
different from the .799 obtained in part (ii). As it turns out, the correlation between inc and
age in the sample of single people is only about .039, which helps explain why the simple
and multiple regression estimates are not very different; see also the discussion in the text.


Week 6 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the single linear combination.
Then OLS on the re-parameterised model will provide the estimate of the single linear
combination and the related standard error. See the example in the textbook and lecture
slides.
What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero
coefficients in the regression model.
What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the
null is called the restricted model, while the regression under the alternative (which simply
states that the null is false) is called the unrestricted model.
How do you compute the F-statistic, given that you have SSRs?
Clearly, the general F-stat is based on the relative difference between the SSRs under the
restricted and unrestricted models: F = [(SSRr − SSRur)/q]/[SSRur/(n − k − 1)], where you
should be able to explain the meanings of the various symbols.
What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.
What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients of x-variables are zero.
How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.
Why would you care about the asymptotic properties of the OLS estimators?
When MLR.6 does not hold, the finite-sample distribution of the OLS estimators is not
available. For reasonably large samples (large n), we use the asymptotic distribution of the
OLS estimators, which is known from studying the asymptotic properties, to approximate
the finite-sample distribution of the OLS estimators, which is unknown.
Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the
similarities and differences?
It is beneficial to make a list by yourself.
Under MLR.1-MLR.5, the OLS estimators are consistent, asymptotically normal, and
asymptotically efficient. Try to explain these properties in your own words.
This is really a test on your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.6


(i) With df = n - 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because
each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about -.89,
which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t
statistic for H0: β1 = 1 is (.976 - 1)/.049 ≈ -.49, which is even less significant. (Remember,
we reject H0 in favor of H1 in this case only if |t| > 1.987.)
(ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the
unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 - 165,644.51)/2]/[165,644.51/86] ≈ 11.37,


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is
4.85.
(iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there
are 88 - 5 = 83 df in the unrestricted model. The F statistic is
F = [(.829 - .820)/3]/[(1 - .829)/83] ≈ 1.46. The 10% critical value (again using 90
denominator df in Table G.3a) is 2.15, so we fail to reject H0 even at the 10% level. In fact,
the p-value is about .23.
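The R-squared form of the statistic can also be checked numerically (an illustrative Python sketch, not part of the textbook answer):

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    """F statistic from the R-squared form:
    F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k - 1)]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

print(round(f_stat_r2(0.829, 0.820, 3, 83), 2))   # part (iii) -> 1.46
print(round(f_stat_r2(0.0395, 0.0, 4, 137), 2))   # overall significance in Q3(i) -> 1.41
```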
(iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F
statistic would not have an F distribution under the null hypothesis. Therefore, comparing
the F statistic against the usual critical values, or obtaining the p-value from the F
distribution, would not be especially meaningful.

Q2. Wooldridge 4.8


(i) We use Property VAR.3 from Appendix B:
Var(β̂1 - 3β̂2) = Var(β̂1) + 9 Var(β̂2) - 6 Cov(β̂1, β̂2).
(ii) t = (β̂1 - 3β̂2 - 1)/se(β̂1 - 3β̂2), so we need the standard error of β̂1 - 3β̂2.
(iii) Because θ1 = β1 - 3β2, we can write β1 = θ1 + 3β2. Plugging this into the population model
gives y = β0 + (θ1 + 3β2)x1 + β2x2 + β3x3 + u = β0 + θ1x1 + β2(3x1 + x2) + β3x3 + u. This last
equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient
and standard error on x1 are what we want.
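The re-parameterisation can be illustrated with a small simulation (hypothetical data; Python rather than Stata, for illustration). Because the two designs span the same column space, the coefficient on x1 in the re-parameterised regression equals β̂1 - 3β̂2 from the original regression exactly:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 + 0.2 * x2 - 0.3 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])           # original model
b = np.linalg.lstsq(X, y, rcond=None)[0]

Z = np.column_stack([np.ones(n), x1, 3 * x1 + x2, x3])  # re-parameterised model
g = np.linalg.lstsq(Z, y, rcond=None)[0]

theta_direct = b[1] - 3 * b[2]   # beta1_hat - 3*beta2_hat from the original fit
theta_reparam = g[1]             # coefficient on x1 in the re-parameterised fit
print(theta_direct, theta_reparam)
```

In practice one uses the re-parameterised regression because it also delivers the standard error of the linear combination directly.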

Q3. Wooldridge 4.10


(i) We need to compute the F statistic for the overall significance of the regression with n =
142 and k = 4: F = [.0395/(1 - .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120
df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4
= 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The
largest absolute t statistic is on dkr, tdkr ≈ 1.60, which is not significant at the 5% level
against a two-sided alternative.
(ii) The F statistic (with the same df) is now [.0330/(1 - .0330)](137/4) ≈ 1.17, which is
even lower than in part (i). None of the t statistics is significant at a reasonable level.
(iii) We probably should not use the logs, as the logarithm is not defined for firms that have
zero for dkr or eps. Therefore, we would lose some firms in the regression.
(iv) It seems very weak. There are no significant t statistics at the 5% level (against a
two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of
the variation in return is explained by the independent variables.
Q4. Wooldridge C4.9 (discrim_c4_9.do)
(i) The results from the OLS regression, with standard errors in parentheses, are

log(psoda)-hat = -1.46 + .073 prpblck + .137 log(income) + .380 prppov
                 (0.29)  (.031)          (.027)             (.133)
n = 401, R2 = .087

The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we
reject H0 at the 5% level but not at the 1% level.
(ii) The correlation between log(income) and prppov is about -.84, indicating a strong degree
of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for
log(income) is about 5.1 and that for prppov is about 2.86 (two-sided p-value = .004).

(iii) The OLS regression results when log(hseval) is added are

log(psoda)-hat = -.84 + .098 prpblck - .053 log(income) + .052 prppov + .121 log(hseval)
                 (.29)  (.029)         (.038)             (.134)        (.018)
n = 401, R2 = .184

The coefficient on log(hseval) is an elasticity: a one percent increase in housing value,
holding the other variables fixed, increases the predicted price by about .12 percent. The
two-sided p-value is zero to three decimal places.
(iv) Adding log(hseval) makes log(income) and prppov individually insignificant (log(income)
even at the 15% significance level against a two-sided alternative, and prppov does not have
a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at
the 5% level because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030.
All of the control variables (log(income), prppov, and log(hseval)) are highly correlated, so it
is not surprising that some are individually insignificant.
(v) Because the regression in (iii) contains the most controls, log(hseval) is individually
significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable.
It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is
that if prpblck increases by .10, psoda is estimated to increase by about 1%, other factors held
fixed.

Q5. Wooldridge 5.2


This is about the inconsistency of the simple regression of pctstck on funds. A higher
tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By
assumption, funds and risktol are positively correlated. Now we use equation (5.5), where
δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This
makes sense: if we omit risktol from the regression and it is positively correlated with funds,
some of the estimated effect of funds is actually due to the effect of risktol.

Q6. Wooldridge C5.1 (wage1_c5_1.do)


(i) The estimated equation is

wage-hat = -2.87 + .599 educ + .022 exper + .169 tenure
           (0.73)  (.051)      (.012)       (.022)
n = 526, R2 = .306, σ̂ = 3.085.

Below is a histogram of the 526 residuals, ûi, i = 1, 2, ..., 526. The histogram uses 27 bins,
which is suggested by the formula in the Stata manual for 526 observations. For comparison,
the normal distribution that provides the best fit to the histogram is also plotted.


(ii) With log(wage) as the dependent variable the estimated equation is

log(wage)-hat = .284 + .092 educ + .0041 exper + .022 tenure
               (.104)  (.007)      (.0017)       (.003)
n = 526, R2 = .316, σ̂ = .441.

The histogram for the residuals from this equation, with the best-fitting normal distribution
overlaid, is given below:

(iii) The residuals from the log(wage) regression appear to be more normally distributed.
Certainly the histogram in part (ii) fits under its comparable normal density better than in
part (i), and the histogram for the wage residuals is notably skewed to the right. In the wage
regression there are some very large residuals (roughly equal to 15) that lie almost five
estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is
identically zero, of course. Residuals far from zero do not appear to be nearly as much of a
problem in the log(wage) regression.
The skewness and kurtosis of the residuals from the OLS regression may be used to test the
null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null
distribution of the Jarque-Bera test statistic (which takes into account both skewness and
kurtosis) is approximately the chi-squared with 2 degrees of freedom. Hence, at the 5% level,
we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test
is carried out by the command sktest. With the data in WAGE1.RAW, JB is extremely
large for the wage model and JB = 10.59 for the log(wage) model. While normality is
rejected for both models, we do see that the JB statistic is much smaller for the log(wage)
model [also compare its p-values of the skewness and excess-kurtosis tests against those from
the wage model]. Hence normality appears to be a more reasonable approximation to the
distribution of the error term in the log(wage) model.


Week 7 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What are the advantages of using the log of a variable in regression? Find the rules of
thumb for taking logs.
See page 191 of Wooldridge (p198-199 for the 3rd edition)
Be careful when you interpret the coefficients of explanatory variables in a model where
some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.
How do you compute the change in y caused by x when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.
Why do we need the interaction terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly
related to another explanatory variable. See (6.17) for example.
What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding
additional explanatory variables to a model. The R-squared can never fall when a new
explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the
new variable is less than one in absolute value.
How do you construct interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.
How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p220 for the 3rd edition).
What is involved in residual analysis?
See pages 209-210 (p217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4


(i) Holding all other factors fixed we have
Δlog(wage) = β1 Δeduc + β2 pareduc Δeduc = (β1 + β2 pareduc) Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if
we think a child gets more out of another year of education the more highly educated are
the child's parents.
(ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on
educ·pareduc. The difference in the estimated return to education is .00078(32 - 24) = .0062,
or about .62 percentage points. (Percentage points are changes in percentages.)
(iii) When we add pareduc by itself, the coefficient on the interaction term becomes negative.
The t statistic on educ·pareduc is about -1.33, which is not significant at the 10% level against
a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level
against a two-sided alternative. This provides a good example of how omitting a level effect
(pareduc in this case) can lead to biased estimation of the interaction effect.
Q2. Wooldridge 6.5
This would make little sense. Performances on math and science exams are measures of
outputs of the educational process, and we would like to know how various educational
inputs and school characteristics affect math and science scores. For example, if the
staff-to-pupil ratio has an effect on both exam scores, why would we want to hold


performance on the science test fixed while studying the effects of staff on the math pass
rate? This would be an example of controlling for too many factors in a regression
equation. The variable scill could be a dependent variable in an identical regression
equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)


(i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1 Δeduc + β3 exper Δeduc = (β1 + β3 exper) Δeduc,
or
Δlog(wage)/Δeduc = β1 + β3 exper.
This is the approximate proportionate change in wage given one more year of education.
(ii) H0: β3 = 0. If we think that education and experience interact positively, so that people
with more experience are more productive when given another year of education, then β3 >
0 is the appropriate alternative.
(iii) The estimated equation is

log(wage)-hat = 5.95 + .0440 educ - .0215 exper + .00320 educ·exper
               (0.24)  (.0174)      (.0200)       (.00153)
n = 935, R2 = .135, adjusted R2 = .132.

The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against
H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.
(iv) We rewrite the equation as
log(wage) = β0 + θ1 educ + β2 exper + β3 educ·(exper - 10) + u,
where θ1 = β1 + 10β3 is the return to education when exper = 10, and run the regression of
log(wage) on educ, exper, and educ·(exper - 10). We want the coefficient on educ. We obtain
θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4. Wooldridge C6.8 (hprice1_c6_8.do)


(i) The estimated equation (where price is in dollars) is

price-hat = -21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
            (29,475.0)  (0.642)         (13.24)        (9,010.1)
n = 88, R2 = .672, adjusted R2 = .661, σ̂ = 59,833.

The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.
(ii) The regression is pricei on (lotsizei - 10,000), (sqrfti - 2,300), and (bdrmsi - 4). We want
the intercept estimate and the associated 95% CI from this regression. The CI is
approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the
nearest dollar.
(iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation
(6.37) (assuming that price is normally distributed). From the regression in part (ii),
se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8.
Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for
price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8) or,
rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval.
But we have not used many factors to explain housing price. If we had more, we could,
presumably, reduce the error standard deviation, and therefore σ̂, to obtain a tighter
prediction interval.
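The standard-error and interval calculations above are easy to verify numerically (an illustrative Python sketch using the figures reported in parts (ii) and (iii)):

```python
import math

se_yhat0 = 7374.5      # se of the predicted mean from the part (ii) regression
sigma_hat = 59833      # standard error of the regression
se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)   # se of the prediction error

point = 336706.7       # predicted price at the given x-values
c = 1.99               # approx. 97.5th percentile of the t(84) distribution
lo, hi = point - c * se_e0, point + c * se_e0
print(round(se_e0, 1), round(lo), round(hi))
```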


Week 8 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What is a qualitative factor? Give some examples.


This should be easy.
How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor
with more than two levels (or values), we need more than one dummy. For example, a
factor with 5 values requires 4 dummy variables.
How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled
factor.
How do you test for differences in regression functions across different groups?
We generally use the F test on properly defined restricted and unrestricted models. The
Chow test is a special case of the F test, where the null hypothesis is no difference across
groups.
What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to be different across groups.
What is program evaluation?
It assesses the effectiveness of a program, by checking the differences in regression
coefficients for the controlled (or base) group and the treatment group.
What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of success for given explanatory variables. The SRF
is the predicted conditional probability of success for given explanatory variables.
What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside
the interval [0,1].
Does MLR5 hold for the LPM?
No, as indicated in the textbook and lecture, the conditional variance of the disturbance is
not a constant and depends on explanatory variables: var(u|x) = E(y|x)[1 E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 7.4


(i) The approximate difference is just the coefficient on utility times 100, or -28.3%. The t
statistic is -.283/.099 ≈ -2.86, which is statistically significant.
(ii) 100·[exp(-.283) - 1] ≈ -24.7%, and so the estimate is somewhat smaller in magnitude.
(iii) The proportionate difference is .181 - .158 = .023, or about 2.3%. One equation that can
be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1 log(sales) + β2 roe + δ1 consprod + δ2 utility + δ3 trans + u,
where trans is a dummy variable for the transportation industry. Now, the base group is
finance, and so the coefficient δ1 directly measures the difference between the consumer
products and finance industries, and we can use the t statistic on consprod.
Q2. Wooldridge 7.6
In Section 3.3 in particular, in the discussion surrounding Table 3.2 we discussed how to
determine the direction of bias in the OLS estimators when an important variable (ability, in
this case) has been omitted from the regression. As we discussed there, Table 3.2 only


strictly holds with a single explanatory variable included in the regression, but we often
ignore the presence of other independent variables and use this table as a rough guide. (Or,
we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are
more likely to receive training, then train and u are negatively correlated. If we ignore the
presence of educ and exper, or at least assume that train and u are negatively correlated
after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with
ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to
conclude that the training program was effective. Intuitively, this makes sense: if those
chosen for training had not received training, they would have had lower wages, on average,
than the control group.

Q3. Wooldridge 7.9


(i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
(ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0,
we have z* = -δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0
and δ1 must have opposite signs.
(iii) Using part (ii) we have totcoll* = .357/.030 = 11.9 years. (totcoll = total college years.)
(iv) The estimated number of years of college at which women catch up to men is much too
high to be practically relevant. While the estimated coefficient on female·totcoll shows that
the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four
years of college, the difference in predicted log wage is still -.357 + .030(4) = -.237, or about
21.1% less for women.
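Both numbers can be reproduced directly (illustrative Python; the coefficients are those reported in the answer above):

```python
import math

delta0, delta1 = -0.357, 0.030   # estimated coefficients on female and female*totcoll
zstar = -delta0 / delta1         # years of college at which the gap would close
gap4 = delta0 + delta1 * 4       # log-wage gap at four years of college
print(round(zstar, 1))                          # -> 11.9
print(round(100 * (math.exp(gap4) - 1), 1))     # -> -21.1 (percent)
```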

Q4. Wooldridge C7.13 (apple_c7_13.do)


(i) 412/660 ≈ .624.
(ii) The OLS estimates of the LPM are

ecobuy-hat = .424 - .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
            (.165)  (.109)        (.132)        (.00053)        (.013)
            + .025 educ - .00050 age
              (.008)      (.00125)
n = 660, R2 = .110

If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples
falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled
apples increases by about .072. (Of course, we are assuming that the probabilities are not
close to the boundaries of zero and one, respectively.)
(iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F
test, the four non-price variables are jointly very significant. Of the four variables, educ
appears to have the most important effect. For example, a difference of four years of
education implies an increase of .025(4) = .10 in the estimated probability of buying
eco-labeled apples. This suggests that more highly educated people are more open to buying
produce that is environmentally friendly, which is perhaps expected. Household size (hhsize)
also has an effect. Comparing a couple with two children to one that has no children, other
factors equal, the couple with two children has a .048 higher probability of buying
eco-labeled apples.
(iv) The model with log(faminc) fits the data slightly better: the R-squared increases to
about .112. (We would not expect a large increase in R-squared from a simple change in the
functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc)
increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is
estimated to increase by about .0045, a pretty small effect.


(v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are
two fitted probabilities above 1, which is not a source of concern with 660 observations.
(vi) Using the standard prediction rule (predict one when the fitted probability is at least
0.5 and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈
.411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or
82.5%. With the usual prediction rule, the model does a much better job predicting the
decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)


Week 9 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What is heteroskedasticity in a regression model?


When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across
observation index (i), we say that heteroskedasticity is present.
In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid?
Why? Are there any other problems with the OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect
under heteroskedasticity. Further the OLS estimator will no longer be (asymptotically)
efficient and there are better estimators.
What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are the corrected standard errors that take into account the possible presence of
heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust
standard errors are valid test statistics. In STATA, these are easily obtained by using the
option robust with the regress command, e.g., regress lwage educ, robust.
How do you detect if there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of
heteroskedasticity. See Section 8.3 for details.
If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimators should be used, which is more efficient than the OLS
estimators. See Section 8.4 for details.
If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimators should be used, which
are based on an exponential variance function. See Section 8.4 for details.
What are the steps in the FGLS estimation?
You should summarise from Section 8.4.
How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM. Hence the WLS can be used in
principle. However, because the LPM can produce predicted probabilities that are outside
the interval (0,1), the known functional form p(x)[1-p(x)] may not be useful for the WLS. In
that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1


Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing
that OLS is consistent. But we know that heteroskedasticity causes statistical inference
based on the usual t and F statistics to be invalid, even in large samples. As
heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.
Q2. Wooldridge 8.2
With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity
function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation
is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the
transformed equation. This is simply a consequence of the form of the heteroskedasticity
and the functional forms of the explanatory variables in the original equation.
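The equivalence between WLS with weights 1/inc² and OLS on the divided-through equation can be illustrated with simulated data (hypothetical coefficient values and data; Python rather than Stata, for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
inc = rng.uniform(1, 10, size=n)
price, educ, female = rng.normal(size=(3, n))
u = inc * rng.normal(size=n)                 # Var(u|inc) proportional to inc^2
beer = 2 + 0.5 * inc - 0.3 * price + 0.1 * educ - 0.2 * female + u

X = np.column_stack([np.ones(n), inc, price, educ, female])
w = 1 / inc ** 2                             # WLS weights

# WLS via the weighted normal equations: (X'WX) b = X'Wy
b_wls = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * beer))

# OLS on the equation divided through by inc: columns become 1/inc, 1, price/inc, ...
Z = X / inc[:, None]
b_ols = np.linalg.lstsq(Z, beer / inc, rcond=None)[0]
print(np.allclose(b_wls, b_ols))
```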

Q3. Wooldridge 8.3


False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we
know from Chapter 4, this assumption is often violated when an important variable is
omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific
information on how the omitted variable is correlated with the included explanatory
variables, it is not possible to determine which estimator has the smaller bias. It is possible
that WLS would have more bias than OLS, or less. Because we cannot know, we should not
claim to use WLS in order to solve biases associated with OLS.

Q4. Wooldridge 8.5


(i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust
ones are practically very similar.
(ii) The effect is -.029(4) = -.116, so the probability of smoking falls by about .116.
(iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so
about 38 and one-half years.
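A quick check of the turning point (illustrative Python):

```python
b_age, b_age2 = 0.020, -0.00026      # coefficients on age and age squared
turning = -b_age / (2 * b_age2)      # turning point of the quadratic in age
print(round(turning, 2))             # -> 38.46
```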
(iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking
restrictions has a .101 lower chance of smoking. This is similar to the effect of having four
more years of education.
(v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 - .069 log(67.44) + .012 log(6,500) - .029(16) + .020(77) - .00026(77)² ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this
person is not a smoker, so the equation predicts well for this particular observation.)

Q5. Wooldridge C8.10 (401ksubs_c8_10.do)


(i) In the following equation, estimated by OLS, the usual standard errors are in () and the
heteroskedasticity-robust standard errors are in []:

e401k-hat = .506 + .0124 inc - .000062 inc² + .0265 age - .00031 age² - .0035 male
           (.081)  (.0006)     (.000005)      (.0039)     (.00005)      (.0121)
           [.079]  [.0006]     [.000005]      [.0038]     [.00004]      [.0121]
n = 9,275, R2 = .094.

There are no important differences; if anything, the robust standard errors are smaller.
(ii) This is a general claim. Since Var(y|x) = p(x)[1 - p(x)], we can write E(u²|x) = p(x) - [p(x)]².
Written in error form, u² = p(x) - [p(x)]² + v. In other words, we can write this as a regression
model u² = δ0 + δ1 p(x) + δ2 [p(x)]² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = -1.
Remember that, for the LPM, the fitted values, ŷ, are estimates of p(x). So, when we run the
regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close
to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be
close to -1.
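This theoretical prediction is easy to illustrate by simulation (hypothetical data generated in Python; not part of the textbook exercise). With the true p(x) known, regressing u² on p and p² should recover coefficients near (0, 1, -1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
p = rng.uniform(0.1, 0.9, size=n)            # true success probabilities
y = (rng.uniform(size=n) < p).astype(float)  # binary outcome with P(y=1|x) = p
u = y - p                                    # LPM error, so E(u^2|x) = p - p^2

D = np.column_stack([np.ones(n), p, p ** 2])
d = np.linalg.lstsq(D, u ** 2, rcond=None)[0]
print(d)  # roughly [0, 1, -1]
```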
(iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of
which are very significant. The coefficient on ŷ is about 1.010, the coefficient on ŷ² is
about -.970, and the intercept is about -.009. These estimates are quite close to what we
expect to find from the theory in part (ii).


(iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates
of the LPM are

e401k-hat = .488 + .0126 inc - .000062 inc² + .0255 age - .00030 age² - .0055 male
           (.076)  (.0005)     (.000004)      (.0037)     (.00004)      (.0117)
n = 9,275, R2 = .108.
There are no important differences with the OLS estimates. The largest relative change is in
the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises


Readings

Read Chapter 9 thoroughly.


Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)

What is functional form misspecification? What is its major consequence in regression


analysis?
As the word misspecification suggests, it is a mis-specified model with a wrong functional
form for the explanatory variables. For example, a log wage model that does not include age²
is mis-specified if the partial effect of age on log wage is inverse-U shaped. In linear
regression models, functional form misspecification can be viewed as the omission of a
nonlinear function of the explanatory variables. Hence the consequences of omitting
important variables apply, and the major one is estimation bias.
How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically, for
neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the
F-test for the joint significance of the squared and cubed predicted values (y-hat) in an
expanded model that includes all the original regressors as well as the squared and cubed
y-hat, where the latter represent possible nonlinear functions of the regressors.
What are nested models? And, nonnested models?
Two models are nested if one is a restricted version of the other (by restricting the values of
parameters). Two models are nonnested if neither nests the other.
What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a best or preferred model
for statistical inference and prediction.
How would you test for two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that
nest both candidate models and decide which model can be excluded. The second is to test
the significance of the predicted value of one model in the other model (known as DavidsonMacKinnon test).
What is a proxy variable? What are the conditions for a proxy variable to be valid in
regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and
important) explanatory variable. There are two conditions for the validity of a proxy. The
first is that the zero conditional mean assumption holds for all explanatory variables
(including the unobserved variable and the proxy). The second is that the conditional mean
of the unobserved variable, given the other explanatory variables and the proxy, depends
only on the proxy.
Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to analyse them.
In what circumstances will missing observations cause major concerns in regression
analysis?
Concerns arise when observations are systematically missing and the sample can no longer
be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.
What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is
chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme
does not violate MLR.1, MLR.3, or MLR.4 and hence does not introduce bias into the OLS
estimation. On the other hand, an endogenous sample selection scheme selects the sample
on the basis of the dependent variable and causes bias in the OLS estimation.
What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of other observations. A
small number of outliers can have large impact on the OLS estimates of parameters. These
are called influential observations if the estimation results when including them are
significantly different from the results when excluding them.
Consider the simple regression model yi = β0 + (β1 + bi)xi + ui for i = 1, 2, ..., n, where the
slope parameter contains a random variable bi; β0 and β1 are constant parameters. Assume
that the usual MLR1-5 hold for yi = β0 + β1xi + ui; E(bi|xi) = 0 and Var(bi|xi) = σb². If we
regress y on x with an intercept, will the estimator of the slope parameter be biased from β1?
Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = β0 + β1xi + (bixi + ui), where the overall disturbance is
(bixi + ui). Given the stated assumptions, MLR1-4 hold for this model and the OLS estimator
of the slope parameter is unbiased (from β1). However, MLR5 does not hold, since the
conditional variance of the overall disturbance, Var(bixi + ui) = σb²xi² + Var(ui), depends on
xi, and the usual OLS standard errors will not be valid.
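Both claims are easy to check by simulation. The sketch below (with made-up parameter values β0 = 1, β1 = 2, σb = 0.8 and standard normal errors, purely for illustration) shows that the OLS slope is centred on β1 while the variance of the composite error bixi + ui rises with |xi|.

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma_b = 1.0, 2.0, 0.8   # hypothetical parameter values
n, reps = 200, 2000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    b = rng.normal(scale=sigma_b, size=n)   # random slope component b_i
    u = rng.normal(size=n)
    y = beta0 + (beta1 + b) * x + u         # y_i = beta0 + (beta1 + b_i) x_i + u_i
    # OLS slope of y on x (with an intercept)
    slopes[r] = np.cov(x, y, bias=True)[0, 1] / x.var()

# Unbiasedness: the average estimated slope should be close to beta1 = 2
mean_slope = slopes.mean()

# Heteroskedasticity: Var(b_i x_i + u_i) = sigma_b^2 x_i^2 + Var(u_i) grows with |x_i|
x = rng.normal(size=100_000)
e = rng.normal(scale=sigma_b, size=x.size) * x + rng.normal(size=x.size)
var_small = e[np.abs(x) < 0.5].var()    # composite-error variance for small |x|
var_large = e[np.abs(x) > 1.5].var()    # much larger for large |x|
```

In practice, this is exactly the situation where heteroskedasticity-robust standard errors would be used instead of the usual ones.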

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 9.1
There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population
parameters on ceoten2 and comten2, respectively. Therefore, we test the joint significance of
these variables using the R-squared form of the F test:
F = [(.375 - .353)/2]/[(1 - .375)/(177 - 8)] ≈ 2.97.
With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the
p-value is slightly above .05, which is reasonable evidence of functional form
misspecification. (Of course, whether this has a practical impact on the estimated partial
effects for various levels of the explanatory variables is a different matter.)
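As a quick check, the F statistic above can be reproduced with a short illustrative calculation:

```python
# R-squared form of the F statistic: F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k - 1)]
def f_stat(r2_ur, r2_r, q, df_denom):
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)

# The restricted model drops ceoten2 and comten2, so q = 2; n - k - 1 = 177 - 8 = 169
F = f_stat(r2_ur=0.375, r2_r=0.353, q=2, df_denom=169)
# F is about 2.97
```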

Q2. Wooldridge 9.3


(i) Eligibility for the federally funded school lunch program is very tightly linked to being
economically disadvantaged. Therefore, the percentage of students eligible for the lunch
program is very similar to the percentage of students living in poverty.
(ii) We can use our usual reasoning on omitting important variables from a regression
equation. The variables log(expend) and lnchprg are negatively correlated: school districts
with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2,
omitting lnchprg (the proxy for poverty) from the regression produces an upward biased
estimator of β1 [ignoring the presence of log(enroll) in the model]. So when we control for
the poverty rate, the effect of spending falls.
(iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t
statistic of about -2.17, which is significant at the 5% level against a two-sided alternative.
The coefficient implies that Δmath10 ≈ -(1.26/100)(%Δenroll) = -.0126(%Δenroll).
Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.
(iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in
lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.
(v) In column (1) we are explaining very little of the variation in pass rates on the MEAP
math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves
much variation unexplained). Clearly most of the variation in math10 is explained by
variation in lnchprg. This is a common finding in studies of school performance: family
income (or related factors, such as living in poverty) are much more important in explaining
student performance than are spending per student or other school characteristics.

Q3. Wooldridge 9.5


The sample selection in this case is arguably endogenous. Because prospective students may
look at campus crime as one factor in deciding where to attend college, colleges with high
crime rates have an incentive not to report crime statistics. If this is the case, then the
chance of appearing in the sample is negatively related to u in the crime equation. (For a
given school size, higher u means more crime, and therefore a smaller probability that the
school reports its crime figures.)

Q4. Wooldridge C9.3 (jtrain_c9_3.do)


(i) If the grants were awarded to firms based on firm or worker characteristics, grant could
easily be correlated with such factors that affect productivity. In the simple regression
model, these are contained in u.
(ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
            (.241)  (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.
(iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 - .254 grant88 + .831 log(scrap87)
              (.089)  (.147)         (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0:
βgrant = 0 is -.254/.147 ≈ -1.73. We use the 5% critical value for 40 df in Table G.2: -1.68.
Because t = -1.73 < -1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.
(iv) The t statistic is (.831 - 1)/.044 ≈ -3.84, which is a strong rejection of H0.
(v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is
-.254/.142 ≈ -1.79, so the coefficient is even more significantly less than zero when we use
the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is
(.831 - 1)/.071 ≈ -2.38, which is notably smaller in magnitude than before, but it is still
pretty significant.
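The t statistics in this answer all follow the usual formula t = (estimate - hypothesized value)/se; a short illustrative check:

```python
# t statistic for H0: beta = b0
def t_stat(bhat, se, b0=0.0):
    return (bhat - b0) / se

t_grant   = t_stat(-0.254, 0.147)          # about -1.73 (part iii); 5% cv, 40 df: -1.68
t_lag     = t_stat(0.831, 0.044, b0=1.0)   # about -3.84 (part iv)
t_grant_r = t_stat(-0.254, 0.142)          # about -1.79, robust se (part v)
t_lag_r   = t_stat(0.831, 0.071, b0=1.0)   # about -2.38, robust se (part v)
```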

Week 11 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What are the main features of time series data? How do time series data differ from
cross-sectional data?
The main features include the following. First, time series data have a temporal ordering,
which matters in regression analysis because of the second feature. Second, many economic
time series are serially correlated (or autocorrelated), meaning that future observations are
dependent on present and past observations. Third, many economic time series contain
trends and seasonality. In comparison to cross-sectional data, the major difference is that
time series data are not a random sample. Recall that, for a random sample, observations
are required to be independent of one another.
What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed
point in time, the SP is a random variable (hence has a distribution). The SP can be viewed
as a random curve with the horizontal axis being the time index, where the outcome of the
underlying experiment is a curve (or a time series plot). The time series we observe is
viewed as a realisation (or outcome) of the random curve or the SP.
What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the
variable at one point in time and the same variable at another point in time. The serial (or
auto) correlation of yt is usually denoted as Corr(yt, yt-h) = Cov(yt, yt-h)/[Var(yt)Var(yt-h)]^(1/2).
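The sample analogue of this autocorrelation is easy to compute. The sketch below (a made-up AR(1) example, purely illustrative) estimates the lag-1 autocorrelation of a simulated series:

```python
import numpy as np

def sample_autocorr(y, h):
    """Sample analogue of Corr(y_t, y_{t-h}): correlation between the
    series and its h-period lag."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[h:], y[:-h])[0, 1]

# An AR(1) process y_t = 0.7 y_{t-1} + e_t has lag-1 autocorrelation near 0.7
# (and lag-2 autocorrelation near 0.7^2 = 0.49)
rng = np.random.default_rng(0)
y = np.zeros(5000)
for t in range(1, y.size):
    y[t] = 0.7 * y[t - 1] + rng.normal()
r1 = sample_autocorr(y, 1)
```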
What is a finite distributed lag model? What is the long-run propensity (LRP)? How would
you estimate the LRP and the associated standard error (say in STATA)?
Read ie_Slides10 pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
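One standard route (not the only one) is to reparameterize the model so the LRP appears as a single coefficient with its own standard error. For yt = α + δ0zt + δ1zt-1 + δ2zt-2 + ut, substituting δ0 = θ - δ1 - δ2, where θ = δ0 + δ1 + δ2 is the LRP, gives yt = α + θzt + δ1(zt-1 - zt) + δ2(zt-2 - zt) + ut, so regressing y on zt and the two differences reports θ-hat and its standard error directly (in STATA one can equivalently use the lincom command after the original regression). A simulated check, with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 300
z = rng.normal(size=T)
alpha, d0, d1, d2 = 0.5, 1.0, 0.6, 0.2      # hypothetical FDL parameters; LRP = 1.8
y = np.empty(T - 2)
for t in range(2, T):
    y[t - 2] = alpha + d0 * z[t] + d1 * z[t - 1] + d2 * z[t - 2] + rng.normal()

zt, zt1, zt2 = z[2:], z[1:-1], z[:-2]
ones = np.ones_like(zt)

# (a) original FDL regression: the LRP is the sum of the estimated deltas
X1 = np.column_stack([ones, zt, zt1, zt2])
b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
lrp_sum = b1[1] + b1[2] + b1[3]

# (b) reparameterized regression: the coefficient on z_t IS the LRP, so its
#     reported standard error is the standard error of the LRP
X2 = np.column_stack([ones, zt, zt1 - zt, zt2 - zt])
b2 = np.linalg.lstsq(X2, y, rcond=None)[0]
lrp_direct = b2[1]
```

Because the two regressions use the same column space, the two LRP estimates agree exactly (up to floating-point error); the payoff of version (b) is the standard error.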
What are TS1-6 (assumptions about time series regression)? How do they differ from the
assumptions in MLR1-6?
TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero
conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ
from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To
ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the
OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These
are very strong assumptions and can be relaxed when sample sizes are large.
What are strictly exogenous regressors and contemporaneously exogenous regressors?
Strictly exogenous regressors satisfy the assumption TS3, whereas the contemporaneously
exogenous regressors only satisfy the condition (10.10) (or (z10) of the Slides) that is
weaker than TS3.
What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time
trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to
mimic (or model) the trending component of a time series.
Why may a regression with trending time series produce spurious results?
First, two time trends are always correlated because they grow together in the same (or
opposite) direction. Now consider two unrelated time series, each of which contains a time
trend. Since the time trends in the two series are correlated, regressing one on the other
will produce a significant slope coefficient. Such statistical significance is spurious because
it is purely induced by the time trends and has little to say about the true relationship
between the two time series.
Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series
variables that is not induced by the time trends in the time series.
What is seasonality in a time series? Give an example of time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or
every day). An example would be the daily revenue of a pub, which is likely to have week-day
seasonality.
For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we
define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters and include
them in the regression model.
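Constructing the three dummies is mechanical; a minimal illustrative sketch:

```python
# Seasonal dummies for quarterly data, with Q1 as the base quarter (illustrative)
quarters = [1, 2, 3, 4, 1, 2, 3, 4]               # hypothetical quarter of each observation
Q2 = [int(q == 2) for q in quarters]
Q3 = [int(q == 3) for q in quarters]
Q4 = [int(q == 4) for q in quarters]
# Q1 observations get zeros in all three dummies, so the intercept plays the role
# of the first-quarter level and Q2-Q4 measure deviations from it.
```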

Problem Set
Q1. Wooldridge 10.1
(i) Disagree. Most time series processes are correlated over time, and many of them are
strongly correlated. This means they cannot be independent across observations, which
simply represent different time periods. Even series that do appear to be roughly
uncorrelated, such as stock returns, do not appear to be independently distributed, as they
have dynamic forms of heteroskedasticity.
(ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the
homoskedasticity and no serial correlation assumptions.
(iii) Disagree. Trending variables are used all the time as dependent variables in a
regression model. We do need to be careful in interpreting the results because we may
simply find a spurious association between yt and trending explanatory variables. Including
a trend in the regression is a good idea with trending dependent or independent variables.
As discussed in Section 10.5, the usual R-squared can be misleading when the dependent
variable is trending.
(iv) Agree. With annual data, each time period represents a year and is not associated with
any season.
Q2. Wooldridge 10.2
We follow the hint and write
gGDPt-1 = α0 + δ0intt-1 + δ1intt-2 + ut-1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0intt-1 + δ1intt-2 + ut-1 - 3) + vt
= (γ0 + γ1α0 - 3γ1) + γ1δ0intt-1 + γ1δ1intt-2 + γ1ut-1 + vt.
Now by assumption, ut-1 has zero mean and is uncorrelated with all right-hand-side
variables in the previous equation, except itself of course. So
Cov(intt, ut-1) = E(intt ut-1) = γ1E(ut-1²) > 0
because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut-1) = γ1σ². This violates the strict
exogeneity assumption, TS3. While ut is uncorrelated with intt, intt-1, and so on, ut is
correlated with intt+1.

Q3. Wooldridge 10.7


(i) pet-1 and pet-2 must be increasing by the same amount as pet.
(ii) The long-run effect, by definition, should be the change in gfr when pe increases
permanently. But a permanent increase means the level of pe increases and stays at the new
level, and this is achieved by increasing pet, pet-1, and pet-2 by the same amount.

Q4. Wooldridge C10.10 (intdef_c10_10.do)


(i) The sample correlation between inf and def is only about .098, which is pretty small.
Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this
period. Of course, this is a good thing for estimating the effects of each variable on i3, as it
implies almost no multicollinearity.
(ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft-1 - .190 deft + .569 deft-1
     (0.40)  (.125)      (.134)       (.221)      (.197)
n = 55, R² = .685, adj. R² = .660.
(iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat
larger than .606, which we obtain from the static model in (10.15). But the estimates are
fairly close considering the size and significance of the coefficient on inft-1.
(iv) The F statistic for significance of inf t-1 and def t-1 is about 5.22, with p-value .009. So
they are jointly significant at the 1% level. It seems that both lags belong in the model.

Week 12 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What is a strictly stationary stochastic process (SP)? What is a covariance stationary SP?
A SP is strictly stationary if its joint distribution (at arbitrary points in time) is invariant
under time shifts, meaning that, for example, the joint distribution at the time points (t1, t2,
t3) is the same as the joint distribution at (t1+h, t2+h, t3+h) for any shift h. On the other hand,
a SP is covariance stationary if it has a finite variance and its mean and autocovariances are
independent of the time index. In other words, a covariance stationary SP has an invariant
structure for the mean and autocovariances.
In this course, what do we mean by weakly dependent (WD) time series?
By WD time series, we mean time series to which the LLN and CLT apply. A WD time series
is not independent, but it is not far from being independent either, so that the LLN and CLT
still apply.
What is a dynamically complete (DC) model?
A regression model is DC if the lags of the dependent variable (yt) and the lags of the
explanatory variables (xt) do not help to predict the dependent variable. In math notation, a
DC model satisfies E(yt | xt, yt-1, xt-1, ...) = E(yt | xt). That is, once xt is controlled for, the lags of
(yt, xt) have no explanatory or predictive power.
How would you test for the serial correlation in the disturbance of a regression model?
Use the Breusch-Godfrey LM test. The test involves the following steps: (i) run the original
regression and save the residuals; (ii) regress the residual on the original explanatory
variables and q lags of the residual and save the R-squared; (iii) the LM statistic, equal to the
product of the sample size and the R-squared from (ii), follows the Chi-squared distribution
with q df under the null hypothesis of no serial correlation; (iv) reject the null if the LM
statistic is too large in comparison to the Chi-square, q df, critical value. A rule of thumb is
that q is in the neighbourhood of [sample size]^(1/4).
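The four steps above can be sketched directly with OLS via least squares; the example below (a hypothetical simulated model with AR(1) errors, purely for illustration) implements the LM statistic and shows that it is large when the errors are serially correlated:

```python
import numpy as np

def bg_lm_stat(y, X, q):
    """Breusch-Godfrey LM statistic for AR(q) serial correlation.
    X must contain a constant column. Under H0 (no serial correlation)
    LM is approximately chi-squared with q df."""
    n = y.size
    # (i) run the original regression and save the residuals
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # (ii) regress u_t on X and u_{t-1}, ..., u_{t-q} (first q obs dropped)
    Z = np.column_stack([X[q:]] + [u[q - j:n - j] for j in range(1, q + 1)])
    uu = u[q:]
    e = uu - Z @ np.linalg.lstsq(Z, uu, rcond=None)[0]
    r2 = 1.0 - (e @ e) / ((uu - uu.mean()) @ (uu - uu.mean()))
    # (iii) LM = (obs used in the auxiliary regression) * R-squared
    return (n - q) * r2
    # (iv) compare LM with the chi-squared(q) critical value (5%, 2 df: 5.99)

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal()        # AR(1) disturbance
X = np.column_stack([np.ones(n), x])
lm_ar = bg_lm_stat(1.0 + 2.0 * x + u, X, q=2)                    # large: reject H0
lm_iid = bg_lm_stat(1.0 + 2.0 * x + rng.normal(size=n), X, q=2)  # typically small
```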
Will the usual OLS standard errors be valid when the disturbance has autocorrelation?
No, the OLS standard errors, the usual t-stat and F-stat are all invalid in the presence of
autocorrelated disturbance. A remedy is to use the Newey-West standard errors that are
robust to the autocorrelated disturbance.
What is a random walk? What are the main properties of the random walk? What are the
properties of the difference of the random walk?
A random walk yt is defined as the cumulation of an iid random sequence {et}:
yt = yt-1 + et = et + et-1 + ... + e1 + y0, where E(et) = 0 and Var(et) = σ², t = 1, 2, ....
The main properties include: (i) its variance grows linearly with the time index; (ii) the
conditional mean of future yt+h given the current yt is always the current yt, no matter how
distant the future is (or how large h is); (iii) the autocorrelations are close to one. The
difference of the random walk is simply the iid random variable et, which is of course
strictly stationary, with zero mean and finite variance.
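These properties are easy to verify by simulation; a sketch, assuming σ² = 1 and y0 = 0:

```python
import numpy as np

# Simulate many driftless random walks y_t = y_{t-1} + e_t with y_0 = 0, Var(e_t) = 1
rng = np.random.default_rng(3)
reps, T = 20000, 100
y = rng.normal(size=(reps, T)).cumsum(axis=1)   # column t-1 holds y_t

var_t = y.var(axis=0)               # Var(y_t) should be close to sigma^2 * t = t
corr_adjacent = np.corrcoef(y[:, -2], y[:, -1])[0, 1]   # close to one
```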
What are I(1) and I(0) time series? How do we decide whether a time series is I(1) or I(0)?
A nonstationary time series yt is called I(1) if its difference Δyt = yt - yt-1 is stationary. A
stationary time series is called I(0). The augmented Dickey-Fuller (ADF) test may be used to
decide whether a time series is I(1) or I(0). The ADF test involves regressing Δyt on {1, t,
yt-1, Δyt-1, ..., Δyt-q}. The test statistic is the t-ratio on yt-1, which follows the Dickey-Fuller
distribution under the null of a unit root (ie, the time series is I(1)). The null is rejected in
favour of the
alternative (that the time series is I(0)) if the t-ratio is too negative in comparison to the
Dickey-Fuller critical value.
What is a stochastic trend? What is a spurious regression?
The stochastic trend refers to the random-walk component in a time series (if there is one).
A regression is spurious if the seemingly significant estimation result is merely a reflection
of the fact that the dependent variable and the explanatory variables have unrelated
stochastic trends, and has little to do with the true relationship among these variables.
Can you explain the notion of cointegration?
A set of nonstationary time series is cointegrated if there is a linear combination of these
time series that is stationary. For example, the logs of the aggregate investment and income
of an economy are nonstationary. We say that log(investment) and log(income) are
cointegrated if there is a constant β such that log(investment) - β log(income) is stationary.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 11.1


Because of covariance stationarity, γ0 = Var(xt) does not depend on t, so Var(xt+h) = γ0 for any
h ≥ 0. By definition, Corr(xt, xt+h) = Cov(xt, xt+h)/[Var(xt)Var(xt+h)]^(1/2) = γh/γ0.
Q2. Wooldridge 11.4
Assuming y0 = 0 is a special case of assuming y0 nonrandom, and so we can obtain the
variances from (11.21): Var(yt) = σ²t and Var(yt+h) = σ²(t + h), h > 0. Because E(yt) = 0 for all
t (since E(y0) = 0), Cov(yt, yt+h) = E(yt yt+h) and, for h > 0,
E(yt yt+h) = E[(et + et-1 + ... + e1)(et+h + et+h-1 + ... + e1)]
= E(et²) + E(et-1²) + ... + E(e1²) = σ²t,
where we have used the fact that {et} is a pairwise uncorrelated sequence. Therefore,
Corr(yt, yt+h) = Cov(yt, yt+h)/[Var(yt)Var(yt+h)]^(1/2) = [t/(t+h)]^(1/2).
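This correlation formula can be confirmed by Monte Carlo; a small illustrative check with t = h = 50, where the theoretical value is sqrt(50/100) ≈ 0.707:

```python
import numpy as np

# Monte Carlo check of Corr(y_t, y_{t+h}) = [t/(t+h)]^(1/2) for a random walk, y_0 = 0
rng = np.random.default_rng(11)
reps, t, h = 40000, 50, 50
y = rng.normal(size=(reps, t + h)).cumsum(axis=1)   # column j-1 holds y_j
corr = np.corrcoef(y[:, t - 1], y[:, t + h - 1])[0, 1]
theory = (t / (t + h)) ** 0.5
```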

Q3. Wooldridge 11.6


(i) The t statistic for H0: β1 = 1 is t = (1.104 - 1)/.039 ≈ 2.67. Although we must rely on
asymptotic results, we might as well use df = 120 in Table G.2. So the 1% critical value
against a two-sided alternative is about 2.62, and so we reject H0: β1 = 1 against H1: β1 ≠ 1 at
the 1% level. It is hard to know whether the estimate is practically different from one
without comparing investment strategies based on the theory (β1 = 1) and the estimate
(1.104). But the estimate is 10% higher than the theoretical value.
(ii) The t statistic for the null in part (i) is now (1.053 - 1)/.039 ≈ 1.36, so H0: β1 = 1 is no
longer rejected against a two-sided alternative unless we are using more than a 10%
significance level. But the lagged spread is very significant (contrary to what the
expectations hypothesis predicts): t = .480/.109 ≈ 4.40. Based on the estimated equation,
when the lagged spread is positive, the predicted holding yield on six-month T-bills is above
the yield on three-month T-bills (even if we impose β1 = 1), and so we should invest in
six-month T-bills.
(iii) This suggests unit root behavior for {hy3t}, which generally invalidates the usual
t-testing procedure.
(iv) We would include three quarterly dummy variables, say Q2t, Q3t , and Q4t, and do an F
test for joint significance of these variables. (The F distribution would have 3 and 117 df.)

Q4. Wooldridge C11.7 (consump_c11_7.do)


(i) If E(gct|It-1) = E(gct), that is, if expected consumption growth does not depend on gct-1,
then β1 = 0 in gct = β0 + β1gct-1 + ut. So the null hypothesis is H0: β1 = 0 and the alternative is
H1: β1 ≠ 0. Estimating the simple regression using the data in CONSUMP.RAW gives
gct = .011 + .446 gct-1
     (.004)  (.156)
n = 35, R² = .199.
The t statistic for β1 is about 2.86, and so we strongly reject the PIH. The coefficient on gct-1
is also practically large, showing significant autocorrelation in consumption growth.
(ii) When gyt-1 and i3t-1 are added to the regression, the R-squared becomes about .288. The
F statistic for joint significance of gyt-1 and i3t-1, obtained using the STATA test command, is
1.95, with p-value .16. Therefore, gyt-1 and i3t-1 are not jointly significant at even the 15%
level.

Week 13 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)

What are pooled cross sections? How do they compare to a single cross sectional sample?
Pooled cross sections consist of a number of cross-sectional samples that are obtained at
different points in time, usually in different years. For pooled cross sections, the individuals
in one cross-sectional sample may be different from the individuals in another
cross-sectional sample. Similar to a single cross-sectional sample (with observations on the
characteristics of randomly selected individuals at the same point in time), the observations
in pooled cross sections are independent. However, because the observations are collected
at different points in time, they should generally be treated as coming from different
populations (or having different distributions). This is a reason why yearly dummies are
usually included in regressions with pooled cross sections.
What is policy analysis? Give an example.
The goal of policy (or event) analysis is to quantify the causal effect of a policy on a
dependent variable. For example, it is interesting to find out the effect of a government
stimulus package on individual expenditures. A policy (e.g., a stimulus package) can be
viewed as a natural experiment in the sense that it alters the environment of one group of
individuals (e.g., stimulus package recipients) but not others (e.g., non-recipients). With
data sets collected before and after the policy, the difference in the before-after differences
of the treatment group and the control group can be estimated.
How are pooled cross sections used for policy analysis? What is the difference-in-differences
estimator? What does it estimate?
With pooled cross sections before and after the policy implementation, the policy analysis
can be carried out in a regression model, where the after-policy dummy and the treatment
group dummy are key variables. In this framework, the parameter of interest (the effect of
the policy on the treatment group) is the coefficient on the interaction of the after-policy
dummy and the treatment group dummy. The estimator of this coefficient is known as the
difference-in-differences estimator (estimating, of course, the ceteris paribus effect of the
policy on the treatment group), as it is the difference between the treatment-control
difference after the policy and the treatment-control difference before the policy.
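As a numerical sketch with made-up group means, the difference-in-differences estimate is just the change in the treatment-control gap:

```python
# Average outcome by (group, period); the numbers are hypothetical
means = {
    ("control",   "before"): 10.0,
    ("control",   "after"):  11.0,   # common time trend: +1.0
    ("treatment", "before"): 12.0,
    ("treatment", "after"):  15.5,   # trend +1.0 plus a policy effect of +2.5
}
gap_after  = means[("treatment", "after")]  - means[("control", "after")]    # 4.5
gap_before = means[("treatment", "before")] - means[("control", "before")]   # 2.0
did = gap_after - gap_before   # recovers the policy effect, 2.5
```

In the regression formulation, did equals the OLS coefficient on the interaction of the after-policy and treatment-group dummies when the model is saturated in those two dummies.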
What is a set of panel data? How does it differ from a set of pooled cross sections?
A set of panel data consists of the observations on the characteristics of individuals at a
number of different points in time, where the individuals are randomly-selected initially and
are followed subsequently through the whole sampling duration. Therefore, unlike pooled
cross sections, panel data are collected from the same individuals at different points in time.
What is an unobserved (or fixed) effect model? What does the unobserved (or fixed) effect
represent? Is there a fixed effect in the usual cross sectional regression?
A fixed effect model postulates that the effect (on a dependent variable) of the unobserved
factors that are specific to each individual can be summarised into a time-invariant term (ai),
known as unobserved or fixed effect, in the panel regression model. The fixed effect ai
represents the impact of the unobserved individual-specific factors on the dependent
variable. In the usual cross sectional regression, the fixed effect is absorbed into the
disturbance term and, if it is correlated with the regressors in the model, it will make the
usual OLS estimation biased.

What is the first-differenced estimation for a two-period panel? Does the validity of the
first-differenced estimation rely on the assumption that the unobserved or fixed effect is
uncorrelated with the regressors?
With a two-period panel data set, because the fixed effect does not change over time, it can
be eliminated by differencing, ie, subtracting the equation of the first period from that of the
second period. The resultant equation is a linear regression model in differenced variables.
Hence the original parameters of the panel model can be estimated by regressing the
differenced dependent variables on the differenced explanatory variables. This approach is
known as the first-differenced estimation. Its validity does not rely on the assumption that
the fixed effect is uncorrelated with the regressors since the fixed effect is removed from the
first-differenced equation.
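A small simulation illustrates this point (hypothetical parameters, β = 1.5, with the regressor deliberately correlated with the fixed effect ai): pooled OLS is biased, while the first-differenced estimator is not.

```python
import numpy as np

rng = np.random.default_rng(5)
N, beta = 1000, 1.5
a = 2.0 * rng.normal(size=N)            # fixed effect a_i
x1 = a + rng.normal(size=N)             # regressor correlated with a_i in both periods
x2 = a + rng.normal(size=N)
y1 = beta * x1 + a + rng.normal(size=N)
y2 = beta * x2 + a + rng.normal(size=N)

# Pooled OLS ignores a_i and is biased upward here because Cov(x, a) > 0
x = np.concatenate([x1, x2])
y = np.concatenate([y1, y2])
beta_pooled = np.cov(x, y, bias=True)[0, 1] / x.var()

# First-differencing removes a_i: (y2 - y1) = beta (x2 - x1) + (u2 - u1)
dy, dx = y2 - y1, x2 - x1
beta_fd = (dx @ dy) / (dx @ dx)         # close to the true beta = 1.5
```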
Can we estimate the unobserved or fixed effect with a two-period panel?
While the fixed effects (ai, i = 1, ..., N) can be viewed as parameters, they cannot be estimated
with reasonable accuracy for a small T (number of time periods) panel data set because
there are too many of them. For example, for a panel with N = 1000 individuals and T = 2
time periods, there are 1000 fixed effects, which are difficult to estimate with only NT =
2000 observations.
Comment on the advantage of using panel data for policy analysis.
The main advantage of using panel data is the fact that the unobserved individual-specific
factors, which in a single cross section regression will bias the OLS estimation, can be
removed by differencing. Hence, for policy analysis, panel data make it possible to properly
estimate the ceteris paribus effect of the key variables in the presence of unobserved
individual-specific factors.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 13.1


Without changes in the averages of any explanatory variables, the average fertility rate fell
by .545 between 1972 and 1984; this is simply the coefficient on y84. To account for the
increase in average education levels, we obtain an additional effect: -.128(13.3 - 12.2) ≈
-.141. So the drop in average fertility if the average education level increased by 1.1 is .545
+ .141 = .686, or roughly two-thirds of a child per woman.

Q2. Wooldridge 13.2


The first equation omits the 1981 year dummy variable, y81, and so does not allow any
appreciation in nominal housing prices over the three year period in the absence of an
incinerator. The interaction term in this case is simply picking up the fact that even homes
that are near the incinerator site have appreciated in value over the three years. This
equation suffers from omitted variable bias.
The second equation omits the dummy variable for being near the incinerator site, nearinc,
which means it does not allow for systematic differences in homes near and far from the site
before the site was built. If, as seems to be the case, the incinerator was located closer to less
valuable homes, then omitting nearinc attributes lower housing prices too much to the
incinerator effect. Again, we have an omitted variable problem. This is why equation (13.9)
(or, even better, the equation that adds a full set of controls), is preferred.
Q3. Wooldridge 13.7
(i) It is not surprising that the coefficient on the interaction term changes little when
afchnge is dropped from the equation because the coefficient on afchnge in (13.12) is
only .0077 (and its t statistic is very small). The increase from .191 to .198 is easily
explained by sampling error.
(ii) If highearn is dropped from the equation [so that β1 = 0 in (13.10)], then we are
assuming that, prior to the change in policy, there is no difference in average duration
between high earners and low earners. But the very large (.256), highly statistically
significant estimate on highearn in (13.12) shows this presumption to be false. Prior to the
policy change, the high earning group spent about 29.2% [exp(.256) - 1 ≈ .292] longer on
unemployment compensation than the low earning group. By dropping highearn from the
regression, we attribute to the policy change the difference between the two groups that
would be observed without any intervention.

Q4. Wooldridge C13.5 (rental_c13_5.do)


(i) Using pooled OLS we obtain
log(rent) = -.569 + .262 d90 + .041 log(pop) + .571 log(avginc) + .0050 pctstu
           (.535)   (.035)     (.023)          (.053)             (.0010)
n = 128, R² = .861.
The positive and very significant coefficient on d90 simply means that, other things in the
equation fixed, nominal rents grew by over 26% over the 10-year period. The coefficient on
pctstu means that a one percentage point increase in pctstu increases rent by half a percent
(.5%). The t statistic of five shows that, at least based on the usual analysis, pctstu is very
statistically significant.
(ii) The standard errors from part (i) are not valid, unless we think ai does not really appear
in the equation. If ai is in the error term, the errors across the two time periods for each city
are positively correlated, and this invalidates the usual OLS standard errors and t statistics.
(iii) The equation estimated in differences is
Δlog(rent) = .386 + .072 Δlog(pop) + .310 Δlog(avginc) + .0112 Δpctstu
            (.037)  (.088)           (.066)              (.0041)
n = 64, R² = .322.
Interestingly, the effect of pctstu is over twice as large as we estimated in the pooled OLS
equation. Now, a one percentage point increase in pctstu is estimated to increase rental
rates by about 1.1%. We obtain larger standard errors when we difference (but the OLS
standard errors from part (i) are likely to be much too small because of the positive serial
correlation in the errors within each city). Warning: while we have differenced away ai,
there may still be other unobservables that change over time and are correlated with
pctstu.
(iv) The heteroskedasticity-robust standard error on Δpctstu is about .0028, which is
actually much smaller than the usual OLS standard error. This only makes Δpctstu even
more significant (robust t statistic ≈ 4). Note that serial correlation is no longer an issue
because we have no time component in the first-differenced equation.
