You are on page 1of 54

KAN-COECO1054U - 2014 1

Endogeneity
2014/2015, Term 1
Ralf A. Wilke
Copenhagen Business School
Main references:
Wooldridge, 2010, Chapter 5, parts of Chapter 6
Several (sub) chapters of Wooldridge, 2009, mainly Chapter 15
KAN-COECO1054U - 2014 2
KAN-COECO1054U - 2014 3
Suppose we have a linear regression model:
Definition: Exogeneity and Endogeneity of Independent
Variables.
is exogenous if it is uncorrelated with u.
is endogenous if it is correlated with u.
OLS estimation of the linear regression model requires
exogeneity of for consistency.
x
j
x
j
x
j
y =
0
+
1
x
1
+ ... +
k
x
k
+ u
Endogeneity can be caused by many things.
An important variable that is not observed and omitted
Functional form misspecification
Simultaneity
Measurement error in the regressors
...
Endogeneity is present in most applications in applied
economic research.
KAN-COECO1054U - 2014 4
Omitted Variables
Omission of a relevant variable .
Suppose
with and .
Now, if and ,
we do not have and we would have an
omitted variable bias.
Solution: Instrumental Variable, Proxy Variable
KAN-COECO1054U - 2014 5
KAN-COECO1054U - 2014 6
Remark: Omitted Variable Bias
What happens if we omit a variable that actually belongs
to the true model?
Let k=2:
and remark:
Then:
y =

0
+

1
x
1
+

2
x
2
y =

0
+

1
x
1
x
2
=

0
+

1
x
1

1
=

1
+

1
Bias(

1
) = E(

1
)
1
= E(

1
+

1
)
1
=
1
+
2

1
=
2

1
non-random
unbiased

i
KAN-COECO1054U - 2014 7
The case of k independent variables
Similar to k=2
if the true parameter for the omitted variable is zero, then
estimates are still unbiased.
This implies that if we include irrelevant variables in the model, the
estimators are still unbiased.
If we omit irrelevant variables this typically decreases the variance
of the estimators.
If the omitted variable is uncorrelated with all other independent
variables, the estimators are still unbiased
If the omitted variable is correlated with at least one independent
variable, this can cause a bias for all estimates.
It does not happen if the remaining independent variables are
uncorrelated
KAN-COECO1054U - 2014 8
Using a Proxy Variable for Unobserved Explanatory
Variables
A more difficult problem arises when a model excludes a
key variable, usually because of data unavailability.
Example: Return to Education
the population model is:
Suppose we do not observe the ability. Ignoring abil would
generally give biased and inconsistent estimates of the return to
education.
We expect an upward bias for the estimated return to education.
Why?
How can we solve or at least mitigate this omitted variable
problem?
log(wage) =
0
+
1
educ +
2
exper +
3
abil + u
KAN-COECO1054U - 2014 9
One possibility is to use a proxy variable for the omitted
variable.
Something that is related to the unobserved variable.
In the wage equation one could use the intelligence
quotient, or IQ as a proxy for ability. IQ and ability do not
need to be the same, but they need to be correlated.
Suppose we have the model
with being unobserved. We have a proxy variable, which we call
What do we require of ?
When we would run the regression , we
should obtain . Otherwise the proxy is not good.
y =
0
+
1
x
1
+
2
x
2
+
3
x

3
+ u
x

3
x
3
x
3
x

3
=
0
+
3
x
3
+
3

3
> 0
KAN-COECO1054U - 2014 10
The proposal is just to regress y on as and
would be the same. This is called the plug-in solution
to the omitted variables problem.
Since and are not the same: when this procedure
does in fact give consistent estimators for and ?
The assumptions are with respect to u and :
1. In addition to assuming that u and are uncorrelated,
we need that u and are uncorrelated. This means that is
irrelevant in the population model once are included.
2. The error is uncorrelated with and . This means
that is a good proxy for :
x
1
, x
2
, x
3
x
3
x

1

2

3
x
3
x
1
, x
2
, x

3
x
3
x
1
, x
2
, x

3
x
3
x

3
x
3
x
1
, x
2
x
3
x

3
E(x

3
|x
1
, x
2
, x
3
) = E(x

3
|x
3
)
KAN-COECO1054U - 2014 11
From the latter assumption follows, that
.
In terms of our wage equation this means:
thus the average value of ability only changes with IQ.
More formally, what are the implications of the two
assumptions?
E(x

3
|x
3
) =
0
+
3
x
3
E(abil|educ, exper, IQ) = E(abil|IQ) =
0
+
3
IQ
KAN-COECO1054U - 2014 12
By combining
we obtain:
now let us denote as the composite error.
And note that u and have both zero mean and each is
uncorrelated with and . Then e has also zero mean and
is uncorrelated with and .
For this reason, we can write
This gives unbiased (or consistent) estimates of and
We do not get unbiased estimates for .
In an application may even be of more interest than .
y =
0
+
1
x
1
+
2
x
2
+
3
x

3
+ u
x

3
=
0
+
3
x
3
+
3
y = (
0
+
3

0
) +
1
x
1
+
2
x
2
+
3

3
x
3
+ u +
3

3
e = u +
3

3
x
3
x
1
, x
2
x
3
x
1
, x
2

3
y =
0
+
1
x
1
+
2
x
2
+
3
x
3
+ e

0
,
1
,
2

3

0
,
3

3
Functional form misspecification
Special case: omission of a relevant variable .
Suppose
with and .
Now, since and if ,
we do not have and we would have a bias
due to functional form misspecification.
Solution: Test for functional form (RESET), Non- and
semiparametric methods.
KAN-COECO1054U - 2014 13
E(u|x
1
, x
2
1
) = 0
RESET test for model specification
How do we know whether we have assumed the correct
functional form?
For example: have we included all relevant quadratics and
interaction terms?
By noting that and are highly nonlinear functions of
all regressors and their interactions, we could use the
fitted values of the model above to compute and .
Then we estimate
and perform an F-test for joint significance of and :
KAN-COECO1054U - 2014 14
y =
0
+
1
x
1
+ ... +
k
x
k
+ u
y
2
y
3
y
2
y
3
y
2
y
3
y =
0
+
1
x
1
+ ... +
k
x
k
+
1
y
2
+
2
y
3
+ u
H
0
:
1
=
2
= 0
KAN-COECO1054U - 2014 15
Simultaneity
If an explanatory variable is determined simultaneously
with the dependent variable, it is generally correlated
with the error terms.
In this case OLS is biased and inconsistent.
As an example we consider two equations (structural
equations) without an intercept:
the variables and are exogenous. We focus on
estimation of the first equation.
y
1
=
1
y
2
+
1
z
1
+ u
1
y
2
=
2
y
1
+
2
z
2
+ u
2
z
2
z
1
KAN-COECO1054U - 2014 16
To show that the dependent variables are generally
correlated with the error terms (e.g. with ), we plug
in the first equation for into the second:
or
In order to solve for we have to assume
It depends on the application whether this is restrictive.
y
1
=
1
y
2
+
1
z
1
+ u
1
y
2
=
2
y
1
+
2
z
2
+ u
2
y
2
y
2
=
2
(
1
y
2
+
1
z
1
+ u
1
) +
2
z
2
+ u
2
(1
2

1
)y
2
=
2

1
z
1
+
2
z
2
+
2
u
1
+ u
2
y
2

1

2
6= 1.
y
2
u
1
KAN-COECO1054U - 2014 17
Can be rewritten as:
where
This is a reduced form equation for .
are reduced form parameters.
is linear in . For this reason, it is uncorrelated
with . We can apply OLS to estimate
There is an equivalent reduced form equation for .
y
2
=
21
z
1
+
22
z
2
+
2

21
=
2

1
/(1
2

1
)

22
=
2
/(1
2

1
)

2
= (
2
u
1
+ u
2
)/(1
2

1
)
y
2

21
,
22

2
u
1
and u
2
z
1
and z
2

21
,
22
.
y
1
(1
2

1
)y
2
=
2

1
z
1
+
2
z
2
+
2
u
1
+ u
2
KAN-COECO1054U - 2014 18
We can use this equation to show that OLS estimation of
the structural equations will generally result in biased
and inconsistent estimates for and :
We need that and are uncorrelated.
From the reduced form equation, we see that and
are correlated if are correlated. Since is linear
in , it is generally correlated with .
When is it not correlated?
If and if and are uncorrelated.
In this case is not simultaneously determined with .
Moreover, the error in determining is not correlated with .
y
2
=
21
z
1
+
22
z
2
+
2
y
2
=
21
z
1
+
22
z
2
+ (
2
u
1
+ u
2
)/(1
2

1
)

j

j
y
1
=
1
y
2
+
1
z
1
+ u
1
y
2
u
1
u
1
y
2

2
and u
1

2
u
1
u
1

2
= 0 u
1
u
2
y
2
y
1
y
2
y
1
KAN-COECO1054U - 2014 19
When is correlated with because of simultaneity, we
say OLS suffers from simultaneity bias and it is
inconsistent.
Obtaining the direction of the bias is generally
complicated. Simple expressions of the bias can be
derived under additional assumptions but this is not
covered here.
Solution: Instrumental Variable estimation
y
2
u
1
KAN-COECO1054U - 2014 20
Measurement error in an explanatory variable
We consider the simple regression model:
and assume that it satisfies the Gauss Markov
assumptions.
We do not observe but (e.g. actual and reported
income).
The measurement error in the population is:
We assume:
Moreover, we assume that u is uncorrelated with and
y =
0
+
1
x

1
+ u
x

1
x
1
e
1
= x
1
x

1
E(e
1
) = 0
x
1
x

1
:
E(y|x
1
, x

1
) = E(y|x

1
)
KAN-COECO1054U - 2014 21
The model can be written as:
The classical errors-in-variables (CEV) assumption is
that the measurement error is uncorrelated with the
unobserved explanatory variable:
This has the meaning that the observed measure consists of
two uncorrelated components:
(We still assume that u is uncorrelated with and .)
The above assumption implies that and must be correlated:
This correlation causes problems for the OLS estimation.
cov(x

1
, e
1
) = 0
x
1
x

1
x
1
e
1
cov(x
1
, e
1
) = E(x
1
e
1
) = E(x

1
e
1
) + E(e
2
1
) = 0 +
2
e
1
=
2
e
1
x
1
x
1
= x

1
+ e
1
y =
0
+
1
x
1
+ (u
1
e
1
)
KAN-COECO1054U - 2014 22
This implies for our model
that since u and are uncorrelated, the covariance between
and the composite error is:
Note also that
Then one can show:
This equation is very interesting: is always closer to
zero than : attenuation bias
y =
0
+
1
x
1
+ (u
1
e
1
)
cov(x
1
, u
1
e
1
) =
1
cov(x
1
, e
1
) =
1

2
e
1
x
1
u
1
e
1
var(x
1
) = var(x

1
) + var(e
1
)
plim(

1
)

1
plim(

1
) =
1
+
cov(x
1
, u
1
e
1
)
var(x
1
)
=
1

2
e
1

2
x

1
+
2
e
1
=
1

1

2
e
1

2
x

1
+
2
e
1
!
=
1


2
x

2
x

1
+
2
e
1
!
x
1
KAN-COECO1054U - 2014 23
OLS is biased in the classical error in variables model:
If is positive, it will underestimate and vice versa.
Things are more complicated if we look at the multiple
regression model but again OLS will be biased and inconsistent.
Solution: Instrumental Variable estimation, ....

1

1
KAN-COECO1054U - 2014 24
Instrumental Variable Estimation
Suppose we have an endogenous independent variable.
How can we obtain a good estimate for the coefficient on the
endogenous variable?
This can be achieved if there is an instrumental variable available.
KAN-COECO1054U - 2014 25
First, we look at the simple regression model:
Now take another variable z with . Then,
Under the additional assumption , we have
y =
0
+
1
x + u
cov(x, z) 6= 0
Cov(z, y) =
1
Cov(z, x) + Cov(z, u)
or
Cov(z, y) Cov(z, u)
Cov(z, x)
=
1
cov(u, z) = 0

1
=
Cov(z, y)
Cov(z, x)
KAN-COECO1054U - 2014 26
A natural estimator for is therefore to replace the
population covariances by their sample analogues:
This estimator is consistent for but it is biased if
. For this reason dont use it in small
samples.
If x is exogenous, it can be used as an instrument and
then the IV estimator is identical to OLS.
The natural estimator for is simply:

1
=
P
i
(z
i
z)(y
i
y)
P
i
(z
i
z)(x
i
x)

0
= y

1
x
cov(u, z) 6= 0
KAN-COECO1054U - 2014 27
A variable z is a candidate for an instrument for a variable x if it
satisfies:
and
Some remarks on the choice of an instrument:
It is often difficult to find a good instrument.
A proxy variable does not make a good instrument as it is supposed to
be correlated with the error term.
Example: Ability is not observed and IQ is highly correlated with ability.
Then it is a candidate for a proxy but clearly violates the condition
.
One may use family background variables as instruments for education
such as the number of siblings: negatively correlated with education but
maybe uncorrelated with ability. The latter, however, is unclear as ability
is not observed.
If one is not sure about the quality of an instrument, it may be better to
use a proxy variable (if available).
cov(u, z) = 0 cov(x, z) 6= 0
cov(u, z) = 0
KAN-COECO1054U - 2014 28
Inference with the IV estimator
The IV estimator has an asymptotic normal distribution.
When we impose a homoscedasticity assumption
conditional to the instrument , one
can derive the asymptotic variance of which is
This provides us a standard error for the IV estimator.
As with the OLS estimator, the asymptotic variance of
the IV estimator decreases to zero at the rate 1/n.
If is small, the variance of the IV estimator is large.
All population quantities can be consistently estimated
with a random sample .
(E(u
2
|z) =
2
= V ar(u))

1
Avar(

1
) =

2
n
2
x

2
x,z

2
x,z
KAN-COECO1054U - 2014 29
The population variance of the error term can be
estimated just like in the case of the OLS regression:
where are now the residuals from the IV regression.
The population variance of x can be estimated by the
sample variance of :
The square of the population correlation between x and z
can be estimated by the R-squared of the regression of
on : .
Then a consistent estimator is:
The asymptotic variance of the IV estimator is always
larger and sometimes much larger than the asymptotic
variance of the OLS estimator.

2

2
=
1
n2
P
i
u
2
i
u
i
x
i
SST
x
/n
x
i
z
i
R
2
x,z
d
Avar(

1
) =

2
SST
x
R
2
x,z
KAN-COECO1054U - 2014 30
Example: Return to Education for Married Women
Data: MROZ.dta
Simple log level regression model:
We obtain OLS estimates:
We use fathers education as an instrument for education.
We cannot empirically check whether ability and fathers
education are uncorrelated. However, we can test, whether
education and fathers education are correlated.
log(wage) =
0
+
1
educ + u
d
log(wage) = 0.185 + 0.109educ
(0.185) (0.014)
n = 428, R
2
= 0.118
KAN-COECO1054U - 2014 31
Example: cont.
When we regress educ on fatheduc, we obtain
This suggests that there is significant positive correlation and
about 17% of the variation in educ is explained by fathers educ.
When we use fathers educ as instrument for educ, we obtain:
The IV estimate of the return of education is about one half of the
OLS estimate, suggesting that there is omitted ability bias.
The IV standard error is much larger than the OLS standard error
and the IV 95% Conf. Interval contains the OLS estimate.
d
educ = 10.24 + 0.269fatheduc
(0.185) (0.014)
n = 428, R
2
= 0.173
d
log(wage) = 0.441 + 0.059educ
(0.446) (0.035)
n = 428, R
2
= 0.093
KAN-COECO1054U - 2014 32
While the previous example suggests that differences in
estimates are practically large, they are not statistically
significant.
There are similar applications with other data sets which
yield larger IV estimates than OLS estimates. From a
practical point of view it is therefore unclear whether the
instruments are uncorrelated with the error terms.
Since just a little correlation between z and u can already
cause serious problems for the IV estimator, this is an
important issue.
IV estimation can be also applied in case of a binary
endogenous regressor or a binary instrumental variable.
KAN-COECO1054U - 2014 33
IV Estimation with a poor Instrumental Variable
IV estimates can have large standard errors if x and z are only
weakly correlated. (Dont use IV in this case.)
IV estimates can have a large asymptotic bias if z and u are only
weakly correlated:
This implies that the bias can be large if the population correlation
between z and x is small even if the population correlation between
z and u is small.
For this reason IV can be worse in terms of consistency than OLS
even if Corr(z,u) is small (provided that Corr(z,x) is also small).
One can show that IV is only superior in terms of asymptotic bias if
plim

1
=
1
+
Corr(z, u)
Corr(z, x)

u

x
Corr(z, u)/Corr(z, x) < Corr(x, u)
KAN-COECO1054U - 2014 34
R-Squared and IV Estimation
SSR (sum of squared IV residuals) can be larger than
SST. For this reason the R-squared can be become
negative and it is smaller than for OLS.
It is not clear whether the IV R-squared should be
reported after IV estimation.
If you try to maximise the R-squared, use OLS as IV tries
to improve the quality of ceteris paribus effects.
R
2
= 1 SSR/SST
KAN-COECO1054U - 2014 35
IV Estimation of the Multiple Regression Model
The idea of IV estimation can be easily extended to the multiple
regression case.
For this purpose we change a bit the notation.
The model is now: with ,
is Kx1 and .
and is endogenous with .
We call the above equation structural equation as we are
interested in the coefficients.
We need an instrument for to obtain consistent estimates.
We need another exogenous variable with .
Then with .
z
1
The reduced form equation is
with and is uncorrelated with all exogenous
variables.
Rank condition: We require a non-zero partial correlation
between and : . (use t-test to check this)
This implies which makes it invertible.
We obtain:
and can be consistently estimated.
KAN-COECO1054U - 2014 36
z
1
Given a random sample i=1,....,N the instrumental
variables estimator of is
with Z and X are N x K and Y is N x 1.
and are the ith row of X and Z respectively.
KAN-COECO1054U - 2014 37
KAN-COECO1054U - 2014 38
Example: College Proximity as IV for Education
Data: Card.dta
Log(wage) is dependent variable, several controls
Instrument for education: dummy if someone grew up near a four
year old college (nearc4).
We assume that nearc4 is uncorrelated with the error. Moreover,
to be a valid instrument it has to be partially correlated with educ.
We can test this by estimating the reduced form equation:
The t-statistic is 3.64 and therefore if nearc4 is uncorrelated with
the error term, we can use it as IV for educ.
d
educ = 16.64 + 0.320nearc4 + ...
(0.24) (0.088)
n = 3, 010, R
2
= 0.477
KAN-COECO1054U - 2014 39
The following table reports OLS and IV estimates.
IV estimate is almost twice as large as the OLS estimate.
SE of the IV estimate is 18 times larger. This is the price we have to
pay if we use an instrument to obtain a consistent estimator.
Dependent Variable: log(wage)
Independent
Variable
(1) OLS (2) IV
Educ 0.075
(0.003)
0.132
(0.055)
Exper 0.085
(0.007)
0.108
(0.024)
Exper^2 -0.0023
(0.0003)
-0.0023
(0.0003)
other controls
Observations
R-squared
3,010
0.300
3,010
0.238
KAN-COECO1054U - 2014 40
Two Stage Least Squares (2SLS)
Sometimes there are multiple valid IVs for an
endogenous explanatory variable.
Suppose the variables satisfy
We could simply use both of them as instruments and
obtain multiple IV estimators.
The idea is to use both together to obtain a more
efficient estimator:
Let be 1x L with L=K+M.
As each element of z is uncorrelated with u, any linear
combination is also uncorrelated with u.
The linear combination of z which is most highly
correlated with is the linear projection of on z.
The reduced form equation is
where and is uncorrelated with all elements
of z.
is correlated with u if is endogenous.
is not correlated with u with
Consistently estimate the parameters by OLS and use
fitted values as an estimator for :
We require that at least one is non-zero. Use F-test.
KAN-COECO1054U - 2014 41
Now, let and use it as the
instruments for :
It can be shown that and thus
This estimator is consistent under the conditions:
, , ,
The last condition suggests that we need at least as
many instruments as explanatory variables in the model
(order condition)
KAN-COECO1054U - 2014 42
Under homoscedasticity it is possible to show
that asymptotically
The variance matrix can be estimated by with
Econometric packages have 2SLS implemented. There
is no need to perform the two stages manually. If you
compute it manually, OLS standard errors and statistics
for the second stage are not valid (as there is a
composite error in the second regression).
KAN-COECO1054U - 2014 43
KAN-COECO1054U - 2014 44
The IV estimator with multiple instruments is also called
two stage least squares (2SLS) estimator:
One can show that the IV estimates are identical to OLS
estimates from the regression of
This is the second stage.
The first stage is the regression of
Multicollinearity is a bigger problem for 2SLS than for
OLS . This is for two reasons:
has less variation than .
has more correlation with than .
KAN-COECO1054U - 2014 45
Some Remarks:
If the R-squared of on the exogenous variables
appearing in the structural equation (without instruments)
is very large, the standard error of 2SLS explodes. Can
be verified with data at hand.
2SLS can be also used in models with more than one
endogenous variable.
We need more candidates for instruments to achieve
identification.
The sufficient condition for identification is the rank condition.
Since R-squared after 2SLS cannot be compared to OLS
R-squared we must be careful when using the F-test.
It is possible to derive a statistic with an approximate F-
distribution in large samples. Use econometric packages
to test multiple hypothesis after 2SLS as commands are
available.
KAN-COECO1054U - 2014 46
Testing for Endogeneity
2SLS is less efficient than OLS and can have large
standard errors.
It is therefore useful to have a test for endogeneity of
explanatory variables to show whether 2SLS is even
necessary:
1. Regression based test (Hausman, 1978)
2. Durbin, Wu and Hausman suggest a test which directly
compares OLS and 2SLS estimates and determines
whether differences are statistically significant (DWH Test):
H0: all regressors are exogeneous
Regression based test (Hausman, 1978)
Suppose we have the structural equation
with being endogenous and there are two exogenous
variables which are not included in the model.
KAN-COECO1054U - 2014 47
y
1
=
0
+
1
y
2
+
2
z
1
+
3
z
2
+ u
1
z
3
, z
4
y
2
KAN-COECO1054U - 2014 48
The idea behind the test is as follows:
We have:
and
Each is uncorrelated with .
is uncorrelated with if and only if, is uncorrelated with
and has zero mean. This is what we want to test.
Write , where is uncorrelated with and
Then and are uncorrelated if and only if .
Simply plug this into the structural equation and do a t test.
Since is not observed, use instead the residuals from the
reduced form equation as a regressor:
y
1
=
0
+
1
y
2
+
2
z
1
+
3
z
2
+ u
1
y
2
=
0
+
1
z
1
+
2
z
2
+
3
z
3
+
4
z
4
+
2
z
j
u
1
y
2

2
u
1
u
1
u
1
=
1

2
+ e
1
e
1

2
E(e
1
) = 0.
u
1

2

1
= 0

2
y
1
=
0
+
1
y
2
+
2
z
1
+
3
z
2
+
1

2
+ error
KAN-COECO1054U - 2014 49
We then test: using a t-test.
if we reject it at a small significance level, we conclude is
endogenous because and are correlated.
Practical guideline for the Hausman test:
1. Estimate the reduced form for and obtain .
2. Add to the structural regression and estimate it by
OLS. You may want to use a heteroscedasticity robust
version of the t-test for testing whether the coefficient
on is significant. If it is statistically significant from
zero, we conclude that is indeed endogenous.
y
1
=
0
+
1
y
2
+
2
z
1
+
3
z
2
+
1

2
+ error
H
0
:
1
= 0
y
2
u
1

2
y
2

2
y
2

2

2
This test requires the availability of valid instruments.
It can be easily extended to the case of multiple
endogenous variables:
The reduced form of step 1 is then estimated for each
endogenous regressor.
The regression in step 2 then includes the residuals obtained by
all regressions of step 1. The test is then to test for joint
significance of all residuals using F- or LM test in this regression.
KAN-COECO1054U - 2014 50
Durbin-Wu-Hausman (DWH) test
If under the H0 all regressors are exogenous but some
are endogenous under H1, we can base a test directly
on the difference between 2SLS and OLS estimators.
The DWH statistics
is asymptotically distributed with is the number of
endogenous regressors and is a generalised
inverse of H.
Statistic may be cumbersome to compute.
Stata command requires module: IVENDOG
KAN-COECO1054U - 2014 51
(

2SLS

OLS
)
0
[(

X
0

X)
1
(X
0
X)
1
]

2SLS

OLS
)/
2
OLS
H

2
(G
1
)
G
1
Testing Overidentifying restrictions
If we have more instruments for one endogenous
explanatory variable, we can test whether at least some
of them are not correlated with (validity of the
instrument). We need that at least one of the IVs is
exogenous and we need to know which one.
Then we can test the overidentifying restrictions that are
used in 2SLS:
E(zu)=0 is Lx1
Suppose we estimate the same model by 2SLS but with a
different number of instruments. Say model 1 is just identified
and model 2 is overidentified. Then L in the second model is
bigger than in the first and thus we have imposed additional
moment conditions. These conditions can be tested.
Under H0: E(zu)=0 is true for model 2.
Regression based version: Sargan Test
KAN-COECO1054U - 2014 52
u
1
KAN-COECO1054U - 2014 53
More remarks on IV estimation:
Heteroscedasticity in the context of 2SLS raises the same issues
as with OLS.
There are standard errors and test statistics available which are
robust with respect to heteroscedasticity.
There are also tests for heteroscedasticity available.
KAN-COECO1054U - 2014 54
Summary
We have seen the method of instrumental variables as a
way to consistently estimate the parameters in a linear
model when there are endogenous explanatory
variables.
When instruments are poor, IV estimates can be worse
than OLS.
2SLS is routinely used in economics and social sciences
alike.
Tests for endogeneity.

You might also like