
Chapter 2: The linear model

Set of Exercises
Exercise 1. Gender wage gap
We consider the Mincer equation:

$$y_i = \alpha s_i + x_i'\beta + u_i, \qquad i = 1, \dots, N,$$

where $y_i = \ln(w_i)$ are log wages, $s_i = 1$ if individual $i$ is male and zero otherwise, and $x_i$ gathers individual characteristics such as labor market experience, education, ..., and a constant.

We assume that $(y_i, s_i, x_i)$, $i = 1, \dots, N$, is a random sample, and that $u_i$ is independent of $s_i$ and $x_i$.
We are interested in the quantity:

$$\theta = 100\, E\left[\frac{E(w_i \mid s_i = 1, x_i)}{E(w_i \mid s_i = 0, x_i)} - 1\right].$$
1. Interpret $\theta$.
Hint: you may start by considering
$$\theta(x_i) = 100\left(\frac{E(w_i \mid s_i = 1, x_i)}{E(w_i \mid s_i = 0, x_i)} - 1\right).$$
2. Compute $\theta$ as a function of $\alpha$ only:
$$\theta = f(\alpha),$$
where you will give the expression of $f$.
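As a numerical sanity check for question 2 (not a substitute for the derivation), the Python sketch below simulates wages $w_i = \exp(\alpha s_i + x_i'\beta + u_i)$ at a fixed value of the index $x_i'\beta$, with errors drawn independently of $s_i$ and $x_i$. The values $\alpha = 0.2$ and $x_i'\beta = 1$, and the uniform error distribution, are illustrative assumptions. Because $u_i$ is independent of $(s_i, x_i)$, the common factor $E(\exp(u_i))$ cancels in the ratio of conditional means, so the simulated value should be close to $100(\exp(\alpha) - 1)$ whatever the error distribution.

```python
import math
import random


def simulated_theta(alpha, index=1.0, n=100_000, seed=0):
    """Monte Carlo value of 100 * (E(w | s=1, x) / E(w | s=0, x) - 1) at a fixed
    value `index` of x'beta (an illustrative assumption)."""
    rng = random.Random(seed)
    # Errors are drawn independently of s and x; the uniform distribution is
    # deliberately non-normal, since only independence matters here.
    mean_w_men = sum(math.exp(alpha + index + rng.uniform(-1.0, 1.0)) for _ in range(n)) / n
    mean_w_women = sum(math.exp(index + rng.uniform(-1.0, 1.0)) for _ in range(n)) / n
    return 100.0 * (mean_w_men / mean_w_women - 1.0)


alpha = 0.2  # hypothetical value of the wage-gap coefficient
print("simulated theta:       ", simulated_theta(alpha))
print("100 * (exp(alpha) - 1):", 100.0 * (math.exp(alpha) - 1.0))
```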
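3. We estimate $\alpha$ and $\beta$ by regressing $y_i$ on $s_i$ and $x_i$. Let $\hat\alpha$ be the OLS estimate of $\alpha$. Then we estimate $\theta$ as:
$$\hat\theta = f(\hat\alpha).$$
Show that $\hat\theta$ is a consistent estimate of $\theta$.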
4. Let $\hat\sigma_{\alpha}$ be the robust asymptotic standard error of $\hat\alpha$. Show that the robust asymptotic standard error of $\hat\theta$ is:
$$100\exp(\hat\alpha)\,\hat\sigma_{\alpha}.$$

Hint: use the delta-method.
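For question 4, the delta method gives $se(\hat\theta) \approx |f'(\hat\alpha)|\,\hat\sigma_{\alpha}$. The minimal Python sketch below assumes $f(\alpha) = 100(\exp(\alpha) - 1)$, the mapping consistent with the standard-error formula of question 4 (its derivative is $100\exp(\alpha)$), and compares a generic finite-difference delta method with the closed form $100\exp(\hat\alpha)\,\hat\sigma_{\alpha}$. The inputs $\hat\alpha = 0.15$ and $\hat\sigma_{\alpha} = 0.02$ are hypothetical.

```python
import math


def f(alpha):
    """Mapping from alpha to theta assumed here: theta = 100 * (exp(alpha) - 1)."""
    return 100.0 * (math.exp(alpha) - 1.0)


def delta_method_se(func, estimate, se, h=1e-6):
    """Delta method for a scalar parameter: se(func(a_hat)) ~ |func'(a_hat)| * se(a_hat).
    The derivative is approximated by a central finite difference."""
    derivative = (func(estimate + h) - func(estimate - h)) / (2.0 * h)
    return abs(derivative) * se


alpha_hat, se_alpha = 0.15, 0.02  # hypothetical OLS estimate and robust standard error
print("theta_hat:               ", f(alpha_hat))
print("delta-method SE:         ", delta_method_se(f, alpha_hat, se_alpha))
print("100 * exp(a_hat) * se(a):", 100.0 * math.exp(alpha_hat) * se_alpha)
```

The finite-difference version works for any smooth $f$; for this particular $f$ the derivative is available in closed form, which is what the formula of question 4 uses.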

Exercise 2. Grouped data


We consider the classical regression model:

$$E(y \mid X) = X\beta, \qquad Var(y \mid X) = \sigma^2 I_N,$$

where there are $K$ regressors and $N$ observations.


We assume here that the observations $(y_i, x_i)$ are grouped into $J$ groups of sizes $n_1, \dots, n_J$, and that we only observe the means of $y$ and $X$ within the groups:

$$\bar y_j = \frac{1}{n_j}\sum_{i \in j} y_i, \qquad \bar x_j = \frac{1}{n_j}\sum_{i \in j} x_i.$$

We construct a $J \times 1$ vector $\bar y$ and a $J \times K$ matrix $\bar X$.


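1. Show that
$$E(\bar y \mid \bar X) = \bar X\beta, \qquad Var(\bar y \mid \bar X) = D_N,$$
where
$$D_N = \begin{pmatrix} \sigma^2/n_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sigma^2/n_J \end{pmatrix}.$$
Hint: find a matrix $M$ such that $\bar y = My$ and $\bar X = MX$.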
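One way to follow the hint: with observations sorted by group (an assumption made for convenience), $M$ is the $J \times N$ matrix whose row $j$ contains $1/n_j$ in the $n_j$ positions of group $j$ and zeros elsewhere. Then $Var(\bar y \mid X) = M(\sigma^2 I_N)M' = \sigma^2 MM'$, and $MM'$ is diagonal with entries $1/n_j$. The standard-library Python sketch below builds $M$ for hypothetical group sizes and checks this numerically.

```python
def averaging_matrix(group_sizes):
    """J x N matrix M with M[j][i] = 1/n_j if observation i belongs to group j.
    Observations are assumed to be sorted by group."""
    n_total = sum(group_sizes)
    matrix, start = [], 0
    for n_j in group_sizes:
        row = [0.0] * n_total
        for i in range(start, start + n_j):
            row[i] = 1.0 / n_j
        matrix.append(row)
        start += n_j
    return matrix


def times_transpose(m):
    """Return M M' (so that Var(ybar | X) = sigma^2 M M' when Var(y | X) = sigma^2 I_N)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def mat_vec(m, v):
    """Return M v; applied to y, this gives the vector of group means ybar."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


sizes = [2, 3, 5]  # hypothetical group sizes n_1, n_2, n_3
M = averaging_matrix(sizes)
for row in times_transpose(M):
    print(row)  # diagonal entries 1/n_j, zeros elsewhere
print(mat_vec(M, list(range(10))))  # group means of y = 0, 1, ..., 9
```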


2. Show that
$$\hat\beta_{GLS} = \left(\sum_{j=1}^J n_j \bar x_j \bar x_j'\right)^{-1} \sum_{j=1}^J n_j \bar x_j \bar y_j.$$
Interpret.
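The GLS formula of question 2 is weighted least squares on the group means with weight $n_j$. In the special case where the regressors are constant within each group, it coincides exactly with OLS on the individual data, since then $\sum_i x_i x_i' = \sum_j n_j \bar x_j \bar x_j'$ and $\sum_i x_i y_i = \sum_j n_j \bar x_j \bar y_j$. The Python sketch below checks this on made-up numbers with a constant and one regressor ($K = 2$); the helper `wls_two_regressors` is hypothetical.

```python
def wls_two_regressors(xs, ys, weights):
    """Weighted least squares of y on (1, x): solves
    (sum w z z') b = sum w z y with z = (1, x)', using Cramer's rule."""
    s_w = sum(weights)
    s_x = sum(w * x for w, x in zip(weights, xs))
    s_xx = sum(w * x * x for w, x in zip(weights, xs))
    s_y = sum(w * y for w, y in zip(weights, ys))
    s_xy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = s_w * s_xx - s_x * s_x
    return (s_xx * s_y - s_x * s_xy) / det, (s_w * s_xy - s_x * s_y) / det


# Hypothetical individual data: x is constant within each group (x value -> y's).
groups = {1.0: [1.0, 2.0], 2.0: [2.5, 3.5, 4.0], 4.0: [5.0]}
xs_ind = [x for x, ys in groups.items() for _ in ys]
ys_ind = [y for ys in groups.values() for y in ys]

# Grouped data: means and group sizes.
xbar = list(groups.keys())
ybar = [sum(ys) / len(ys) for ys in groups.values()]
sizes = [len(ys) for ys in groups.values()]

b_grouped = wls_two_regressors(xbar, ybar, sizes)  # GLS on means, weights n_j
b_individual = wls_two_regressors(xs_ind, ys_ind, [1.0] * len(xs_ind))  # individual OLS
print(b_grouped, b_individual)
```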
3. If we estimate $\beta$ by OLS from the grouped data, how do we have to correct the standard errors?
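When group sizes differ, $Var(\bar y \mid \bar X) = D_N$ is not proportional to the identity, so the usual OLS formula $s^2(\bar X'\bar X)^{-1}$ is in general not valid; the variance of OLS on the group means is the sandwich $(\bar X'\bar X)^{-1}\bar X' D_N \bar X(\bar X'\bar X)^{-1}$. The Python sketch below computes it for a constant and one regressor, assuming $\sigma^2$ is known; in practice $\sigma^2$ would be replaced by an estimate, or heteroskedasticity-robust (White) standard errors could be used. The function name is hypothetical.

```python
def grouped_ols_variance(xbar, group_sizes, sigma2):
    """Sandwich variance of OLS of ybar_j on (1, xbar_j) when Var(ybar_j) = sigma2 / n_j:
        V = (Z'Z)^{-1} (Z' D Z) (Z'Z)^{-1},  D = diag(sigma2 / n_j).
    sigma2 is assumed known. Returns V as a 2x2 nested list."""
    def inv2(m):
        det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
        return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

    def mul2(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

    zz = [[0.0, 0.0], [0.0, 0.0]]   # Z'Z
    zdz = [[0.0, 0.0], [0.0, 0.0]]  # Z' D Z
    for x, n in zip(xbar, group_sizes):
        z = (1.0, x)
        for r in range(2):
            for c in range(2):
                zz[r][c] += z[r] * z[c]
                zdz[r][c] += z[r] * z[c] * sigma2 / n
    bread = inv2(zz)
    return mul2(mul2(bread, zdz), bread)


# Hypothetical group means and unequal group sizes.
V = grouped_ols_variance([1.0, 2.0, 3.0, 4.0], [10, 2, 50, 5], sigma2=1.0)
print("corrected standard errors:", [V[k][k] ** 0.5 for k in range(2)])
```

With equal group sizes $n$, the sandwich collapses to $(\sigma^2/n)(\bar X'\bar X)^{-1}$, which the usual formula does estimate consistently.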
Exercise 3. Household data
We want to estimate $\beta$ in the classical regression model:

$$E(y \mid X) = X\beta, \qquad Var(y \mid X) = \sigma^2 I_{2N},$$

where $i = 1, \dots, 2N$ are individual observations.


However, we do not have access to individual data. Instead, we observe data at the household level. It is assumed that each household comprises two individuals. We observe $\bar x_j$ and $\bar y_j$, $j = 1, \dots, N$, which are the average values in each household. The sample size is $N = 1000$.
We regress $\bar y_j$ on $\bar x_j$ by OLS, and use the standard formula to compute the standard errors.

1. Give the value of $Var(\bar y \mid \bar X)$, where $\bar y$ is the $N \times 1$ vector of $\bar y_j$ and $\bar X$ is the $N \times K$ matrix of $\bar x_j'$, as a function of $\sigma^2$.
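A quick simulation for question 1, assuming normal individual errors (any iid errors with variance $\sigma^2$ would do): the sample variance of the average of two independent individual errors should be close to $\sigma^2/2$, in line with Exercise 2 with $n_j = 2$. Sample sizes and seeds are arbitrary choices.

```python
import random
import statistics


def household_mean_variance(sigma2=1.0, n_households=50_000, size=2, seed=1):
    """Sample variance of household averages of `size` iid N(0, sigma2) individual errors."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    means = [sum(rng.gauss(0.0, sd) for _ in range(size)) / size for _ in range(n_households)]
    return statistics.variance(means)


print(household_mean_variance())  # should be close to sigma2 / 2 = 0.5
```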
2. Is the way we have computed the standard error correct?
3. What is the (infeasible) GLS estimate of the education coefficient in the regression?
4. In fact, half of the households in the sample comprise a single person. Does this finding modify the previous results?
Explain how you would compute the GLS estimator in this case.
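For question 4, Exercise 2 applies with $n_j \in \{1, 2\}$: $Var(\bar y_j \mid \bar X) = \sigma^2/n_j$, so GLS amounts to weighted least squares with weight $n_j$ (a couple counts twice as much as a single). The Python sketch below simulates a hypothetical sample (500 singles, 500 couples, $y_i = 1 + 2x_i + u_i$ with standard normal errors; all values are assumptions) and applies that weighting.

```python
import random


def wls_two_regressors(xs, ys, weights):
    """Weighted least squares of y on (1, x) via the 2x2 normal equations."""
    s_w = sum(weights)
    s_x = sum(w * x for w, x in zip(weights, xs))
    s_xx = sum(w * x * x for w, x in zip(weights, xs))
    s_y = sum(w * y for w, y in zip(weights, ys))
    s_xy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    det = s_w * s_xx - s_x * s_x
    return (s_xx * s_y - s_x * s_xy) / det, (s_w * s_xy - s_x * s_y) / det


def simulate_households(n_households=1000, beta0=1.0, beta1=2.0, seed=2):
    """Hypothetical sample: even-indexed households are singles, odd-indexed ones
    are couples; individual model y = beta0 + beta1 * x + u, u ~ N(0, 1)."""
    rng = random.Random(seed)
    xbar, ybar, sizes = [], [], []
    for j in range(n_households):
        size = 1 if j % 2 == 0 else 2
        xs = [rng.uniform(0.0, 10.0) for _ in range(size)]
        ys = [beta0 + beta1 * x + rng.gauss(0.0, 1.0) for x in xs]
        xbar.append(sum(xs) / size)
        ybar.append(sum(ys) / size)
        sizes.append(size)
    return xbar, ybar, sizes


xbar, ybar, sizes = simulate_households()
# GLS: Var(ybar_j | X) = sigma^2 / n_j, so each household is weighted by its size n_j.
print("GLS (weights n_j):  ", wls_two_regressors(xbar, ybar, sizes))
print("OLS (equal weights):", wls_two_regressors(xbar, ybar, [1.0] * len(sizes)))
```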
Exercise 4. Estimation with parameterized conditional heteroskedasticity
A researcher is interested in the following model:
$$y_i = x_i'\beta + u_i,$$

where $x_i$ is a vector of $K$ regressors, observations are iid, $E(u_i \mid x_i) = 0$, and $E(u_i^2 \mid x_i) = x_i'\gamma$.
1. Assume first that $\gamma$ is known. Show that the GLS estimator of $\beta$ writes:
$$\hat\beta_{GLS} = \left(\sum_{i=1}^N \frac{1}{x_i'\gamma}\, x_i x_i'\right)^{-1} \sum_{i=1}^N \frac{1}{x_i'\gamma}\, x_i y_i.$$
2. Give the expression of the asymptotic variance of $\hat\beta_{GLS}$.
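A minimal Python sketch of questions 1 and 2 with a constant and one regressor, $x_i = (1, z_i)'$, and a known $\gamma = (0.5, 2)'$; both are assumptions, chosen so that $x_i'\gamma > 0$ for $z_i \in [0, 1]$. It computes $\hat\beta_{GLS}$ as weighted least squares with weights $1/(x_i'\gamma)$ and estimates its variance by $\left(\sum_i x_i x_i'/(x_i'\gamma)\right)^{-1}$, the sample analogue of $\left(E[x_i x_i'/(x_i'\gamma)]\right)^{-1}/N$. Helper names are hypothetical.

```python
import math
import random


def solve_weighted(zs, ys, weights):
    """Solve (sum_i w_i x_i x_i') b = sum_i w_i x_i y_i for x_i = (1, z_i)'.
    Returns b and the 2x2 matrix A = sum_i w_i x_i x_i'."""
    a00 = sum(weights)
    a01 = sum(w * z for w, z in zip(weights, zs))
    a11 = sum(w * z * z for w, z in zip(weights, zs))
    c0 = sum(w * y for w, y in zip(weights, ys))
    c1 = sum(w * z * y for w, z, y in zip(weights, zs, ys))
    det = a00 * a11 - a01 * a01
    b = ((a11 * c0 - a01 * c1) / det, (a00 * c1 - a01 * c0) / det)
    return b, [[a00, a01], [a01, a11]]


def gls_known_gamma(zs, ys, gamma):
    """Infeasible GLS when E(u_i^2 | x_i) = x_i'gamma = gamma[0] + gamma[1] * z_i.
    Returns beta_hat_GLS and the estimated variance (sum_i x_i x_i' / x_i'gamma)^{-1}."""
    weights = [1.0 / (gamma[0] + gamma[1] * z) for z in zs]
    b, a = solve_weighted(zs, ys, weights)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    var_b = [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]
    return b, var_b


def simulate(n, beta=(1.0, 2.0), gamma=(0.5, 2.0), seed=3):
    """Hypothetical data with conditional variance gamma[0] + gamma[1] * z."""
    rng = random.Random(seed)
    zs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [beta[0] + beta[1] * z + math.sqrt(gamma[0] + gamma[1] * z) * rng.gauss(0.0, 1.0)
          for z in zs]
    return zs, ys


zs, ys = simulate(20_000)
b_gls, var_gls = gls_known_gamma(zs, ys, (0.5, 2.0))
print("beta_hat_GLS:   ", b_gls)
print("standard errors:", [math.sqrt(var_gls[k][k]) for k in range(2)])
```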
From now on, we assume that $\gamma$ is not known. The researcher then proposes to estimate the parameters in two steps:
Step 1: Regress $y_i$ on $x_i$ by OLS, and compute the prediction errors $\hat u_i$. Then regress $\hat u_i^2$ on $x_i$, again by OLS. This yields an estimate of $\gamma$, say $\hat\gamma$.
Step 2: Estimate $\beta$ by weighted least squares, proceeding as if $\hat\gamma$ were the true $\gamma$. This yields $\tilde\beta$.
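The two-step procedure can be sketched in Python as follows, under an illustrative design ($x_i = (1, z_i)'$, $\beta = (1, 2)'$, $\gamma = (0.5, 2)'$, all assumptions). Nothing guarantees that the fitted variances $x_i'\hat\gamma$ are positive in a finite sample, so the sketch raises an error in that case rather than using negative weights. Comparing $\tilde\beta$ with the infeasible $\hat\beta_{GLS}$ on the same sample illustrates the remark on asymptotic equivalence.

```python
import math
import random


def solve_weighted(zs, ys, weights):
    """Solve (sum_i w_i x_i x_i') b = sum_i w_i x_i y_i for x_i = (1, z_i)'."""
    a00 = sum(weights)
    a01 = sum(w * z for w, z in zip(weights, zs))
    a11 = sum(w * z * z for w, z in zip(weights, zs))
    c0 = sum(w * y for w, y in zip(weights, ys))
    c1 = sum(w * z * y for w, z, y in zip(weights, zs, ys))
    det = a00 * a11 - a01 * a01
    return (a11 * c0 - a01 * c1) / det, (a00 * c1 - a01 * c0) / det


def two_step_fgls(zs, ys):
    """Step 1: OLS of y on x, then OLS of u_hat^2 on x -> gamma_hat.
    Step 2: WLS with weights 1 / (x_i'gamma_hat) -> beta_tilde."""
    ones = [1.0] * len(zs)
    b_ols = solve_weighted(zs, ys, ones)
    u2 = [(y - b_ols[0] - b_ols[1] * z) ** 2 for z, y in zip(zs, ys)]
    gamma_hat = solve_weighted(zs, u2, ones)
    fitted = [gamma_hat[0] + gamma_hat[1] * z for z in zs]
    if min(fitted) <= 0.0:
        raise ValueError("some fitted variances x_i'gamma_hat are not positive")
    beta_tilde = solve_weighted(zs, ys, [1.0 / h for h in fitted])
    return beta_tilde, gamma_hat


rng = random.Random(3)
zs = [rng.uniform(0.0, 1.0) for _ in range(20_000)]
# Hypothetical design: beta = (1, 2), gamma = (0.5, 2), so Var(u | z) = 0.5 + 2 z.
ys = [1.0 + 2.0 * z + math.sqrt(0.5 + 2.0 * z) * rng.gauss(0.0, 1.0) for z in zs]

beta_tilde, gamma_hat = two_step_fgls(zs, ys)
beta_gls = solve_weighted(zs, ys, [1.0 / (0.5 + 2.0 * z) for z in zs])  # infeasible GLS
print("gamma_hat:   ", gamma_hat)
print("beta_tilde:  ", beta_tilde)
print("beta_hat_GLS:", beta_gls)
```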
3. Show that:
$$\operatorname*{plim}_{N \to \infty} \left(\sum_{i=1}^N x_i x_i'\right)^{-1} \sum_{i=1}^N x_i u_i^2 = \gamma.$$
4. Show that $\hat\gamma$ is a consistent estimator of $\gamma$, under the condition $E(x_i^3) < \infty$. In this question, you will assume $K = 1$ to simplify the notation.
5. Show that $\tilde\beta$ is a consistent estimator of $\beta$. It is sufficient to give an intuition of the proof.
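Remark: it can be shown that
$$\operatorname*{plim}_{N \to \infty} \sqrt{N}\left(\tilde\beta - \hat\beta_{GLS}\right) = 0,$$
so that $\tilde\beta$ and $\hat\beta_{GLS}$ are asymptotically equivalent.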


6. The researcher then changes her mind, and considers the following model:
$$y_i = x_i'\beta + u_i,$$
where $E(u_i \mid x_i) = 0$ and $E(u_i^2 \mid x_i) = \exp(x_i'\gamma)$.
Why might this specification be preferable to the previous one? Propose a way to estimate $\beta$ efficiently.
Hint: Recall the Nonlinear Least Squares estimation method.
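Following the hint, one possible approach (a sketch, not the only valid answer) mirrors the earlier two steps: compute OLS residuals $\hat u_i$, estimate $\gamma$ by nonlinear least squares of $\hat u_i^2$ on $\exp(x_i'\gamma)$, then run weighted least squares with weights $\exp(-x_i'\hat\gamma)$. The exponential form keeps every fitted variance positive. The Python sketch below runs NLS by Gauss-Newton with step halving, for $x_i = (1, z_i)'$ and hypothetical true values $\beta = (1, 2)'$, $\gamma = (0, 1.5)'$; function names and the starting value are assumptions.

```python
import math
import random


def solve_weighted(zs, ys, weights):
    """Solve (sum_i w_i x_i x_i') b = sum_i w_i x_i y_i for x_i = (1, z_i)'."""
    a00 = sum(weights)
    a01 = sum(w * z for w, z in zip(weights, zs))
    a11 = sum(w * z * z for w, z in zip(weights, zs))
    c0 = sum(w * y for w, y in zip(weights, ys))
    c1 = sum(w * z * y for w, z, y in zip(weights, zs, ys))
    det = a00 * a11 - a01 * a01
    return (a11 * c0 - a01 * c1) / det, (a00 * c1 - a01 * c0) / det


def nls_exp_variance(zs, u2, max_iter=100, tol=1e-10):
    """NLS of u2_i on exp(g0 + g1 * z_i) by Gauss-Newton with step halving."""
    def ssr(g):
        try:
            return sum((u - math.exp(g[0] + g[1] * z)) ** 2 for z, u in zip(zs, u2))
        except OverflowError:
            return math.inf

    gamma = (math.log(sum(u2) / len(u2)), 0.0)  # start from a constant variance
    current = ssr(gamma)
    for _ in range(max_iter):
        m = [math.exp(gamma[0] + gamma[1] * z) for z in zs]
        # Gauss-Newton step: solve (sum m_i^2 x_i x_i') d = sum m_i (u2_i - m_i) x_i.
        d = solve_weighted(zs, [(u - mi) / mi for u, mi in zip(u2, m)], [mi * mi for mi in m])
        step = 1.0
        while step > 1e-8:
            candidate = (gamma[0] + step * d[0], gamma[1] + step * d[1])
            value = ssr(candidate)
            if value < current:
                break
            step /= 2.0
        else:
            break  # no step reduces the SSR: stop
        gamma, current = candidate, value
        if max(abs(step * d[0]), abs(step * d[1])) < tol:
            break
    return gamma


def fgls_exp(zs, ys):
    """OLS residuals -> NLS estimate of gamma -> WLS with weights exp(-x_i'gamma_hat)."""
    b_ols = solve_weighted(zs, ys, [1.0] * len(zs))
    u2 = [(y - b_ols[0] - b_ols[1] * z) ** 2 for z, y in zip(zs, ys)]
    g = nls_exp_variance(zs, u2)
    weights = [math.exp(-(g[0] + g[1] * z)) for z in zs]
    return solve_weighted(zs, ys, weights), g


rng = random.Random(4)
zs = [rng.uniform(0.0, 1.0) for _ in range(20_000)]
# Hypothetical design: beta = (1, 2), gamma = (0, 1.5), so sd(u | z) = exp(0.75 * z).
ys = [1.0 + 2.0 * z + math.exp(0.75 * z) * rng.gauss(0.0, 1.0) for z in zs]
beta_hat, gamma_hat = fgls_exp(zs, ys)
print("beta_hat:", beta_hat, "gamma_hat:", gamma_hat)
```

Gauss-Newton is convenient here because each step is itself a weighted least-squares problem: with $m_i = \exp(x_i'\gamma)$, the step $\delta$ solves $\left(\sum_i m_i^2 x_i x_i'\right)\delta = \sum_i m_i(\hat u_i^2 - m_i)x_i$.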
