
Imbens/Wooldridge, IRP Lecture Notes 3&4, August 08

IRP Lectures Madison, WI, August 2008


Lectures 3 & 4, Monday, August 4, 11:15-12:30 and 1:30-2:30
Linear Panel Data Models
These notes cover some recent topics in linear panel data models. They begin with a
modern treatment of the basic linear model, and then consider some embellishments, such as
random slopes and time-varying factor loads. In addition, fully robust tests for correlated
random effects, lack of strict exogeneity, and contemporaneous endogeneity are presented.
Section 4 discusses methods for estimating dynamic panel data models without strictly
exogenous regressors. Recent methods for estimating production functions using firm-level
panel data are summarized in Section 5, and Section 6 provides a unified treatment of
estimation with pseudo-panel data.
1. Overview of the Basic Model
Most of these notes are concerned with an unobserved effects model defined for a large
population. Therefore, we assume random sampling in the cross section dimension. Unless
stated otherwise, the asymptotic results are for a fixed number of time periods, $T$, with the
number of cross section observations, $N$, getting large.
For some of what we do, it is critical to distinguish between the underlying population model of
interest and the sampling scheme that generates data that we can use to estimate the population
parameters. The standard model can be written, for a generic $i$ in the population, as
$$y_{it} = \eta_t + x_{it}\beta + c_i + u_{it}, \quad t = 1,\ldots,T, \tag{1.1}$$
where $\eta_t$ is a separate time period intercept (almost always a good idea), $x_{it}$ is a $1 \times K$ vector of
explanatory variables, $c_i$ is the time-constant unobserved effect, and the $\{u_{it}: t = 1,\ldots,T\}$ are
idiosyncratic errors. Thanks to Mundlak (1978) and Chamberlain (1982), we now know that, in
the small $T$ case, viewing the $c_i$ as random draws along with the observed variables is the
appropriate posture. Then, one of the key issues is whether $c_i$ is correlated with elements of $x_{it}$.
It probably makes more sense to drop the $i$ subscript in (1.1), which would emphasize that
the equation holds for an entire population. But (1.1) is useful for emphasizing which factors
change only across $t$, which change only across $i$, and which change across both $i$ and $t$. It is
sometimes convenient to subsume the time dummies in $x_{it}$.
Ruling out correlation (for now) between $u_{it}$ and $x_{it}$, a sensible assumption is
contemporaneous exogeneity conditional on $c_i$:
$$E(u_{it} \mid x_{it}, c_i) = 0, \quad t = 1,\ldots,T. \tag{1.2}$$
This equation really defines $\beta$ in the sense that, under (1.1) and (1.2),
$$E(y_{it} \mid x_{it}, c_i) = \eta_t + x_{it}\beta + c_i, \tag{1.3}$$
so the $\beta_j$ are partial effects holding fixed the unobserved heterogeneity (and covariates other
than $x_{tj}$).
As is now well known, $\beta$ is not identified under (1.3) alone. Of course, if we add
$\mathrm{Cov}(x_{it}, c_i) = 0$ for any $t$, then $\beta$ is identified and can be consistently estimated by a cross
section regression using a single time period $t$, or by pooling across $t$. But usually the whole
point in having panel data is to allow the unobserved effect to be correlated with time-varying
$x_{it}$.
We can allow general correlation between $c_i$ and $x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})$ if we add the
assumption of strict exogeneity conditional on $c_i$:
$$E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0, \quad t = 1,\ldots,T, \tag{1.4}$$
which can be expressed as
$$E(y_{it} \mid x_{i1}, \ldots, x_{iT}, c_i) = E(y_{it} \mid x_{it}, c_i) = \eta_t + x_{it}\beta + c_i. \tag{1.5}$$
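If the elements of $\{x_{it}: t = 1,\ldots,T\}$ have suitable time variation, $\beta$ can be consistently
estimated by fixed effects (FE) or first differencing (FD), or generalized least squares (GLS) or
generalized method of moments (GMM) versions of them. The fixed effects, or within,
estimator is the pooled OLS estimator in the equation
$$\ddot{y}_{it} = \ddot{\eta}_t + \ddot{x}_{it}\beta + \ddot{u}_{it}, \quad t = 1,\ldots,T,$$
where $\ddot{y}_{it} = y_{it} - T^{-1}\sum_{r=1}^{T} y_{ir} = y_{it} - \bar{y}_i$ is the deviation of $y_{it}$ from the time average, $\bar{y}_i$,
and similarly for $\ddot{x}_{it}$. Consistency of pooled OLS (for fixed $T$ and $N \to \infty$) essentially rests on
$$\sum_{t=1}^{T} E(\ddot{x}_{it}'\ddot{u}_{it}) = \sum_{t=1}^{T} E(\ddot{x}_{it}'u_{it}) = 0,$$
which means the error $u_{it}$ should be uncorrelated with $x_{ir}$ for all $r$ and $t$. The FD estimator is
pooled OLS on
$$\Delta y_{it} = \delta_t + \Delta x_{it}\beta + \Delta u_{it}, \quad t = 2,\ldots,T,$$
where $\delta_t = \eta_t - \eta_{t-1}$. Sufficient for consistency is $E(\Delta x_{it}'\Delta u_{it}) = 0$. See Wooldridge (2002,
Chapter 10) for further discussion.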
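The FE and FD transformations above can be sketched on simulated data. This is only an illustration under assumed values: the sample sizes, the period intercepts in `eta`, and the data-generating process are hypothetical and not taken from the notes. With a balanced panel, removing unit averages and then period means is one way to absorb the time intercepts.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 2000, 4, 1.5                   # hypothetical design

# c_i is correlated with x_it; u_it is strictly exogenous by construction.
c = rng.normal(size=(N, 1))
x = 0.8 * c + rng.normal(size=(N, T))
eta = np.array([0.0, 0.3, -0.2, 0.5])       # illustrative period intercepts
y = eta + beta * x + c + rng.normal(size=(N, T))

def within(a):
    """Remove unit time averages, then period means (absorbs time dummies)."""
    a = a - a.mean(axis=1, keepdims=True)
    return a - a.mean(axis=0, keepdims=True)

def first_diff(a):
    """First difference over t, then remove period means (absorbs delta_t)."""
    d = np.diff(a, axis=1)
    return d - d.mean(axis=0, keepdims=True)

def slope(yt, xt):
    """Pooled OLS slope for one regressor, no intercept."""
    return float((xt * yt).sum() / (xt * xt).sum())

beta_fe = slope(within(y), within(x))
beta_fd = slope(first_diff(y), first_diff(x))
# Pooled OLS in levels ignores c_i and is inconsistent in this design.
beta_pols = slope(y - y.mean(axis=0), x - x.mean(axis=0))
print(beta_fe, beta_fd, beta_pols)
```

In this design FE and FD should both land near the assumed slope, while pooled OLS in levels picks up the correlation between $c_i$ and $x_{it}$.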
If FE or FD are used, standard inference can and should be made fully robust to
heteroskedasticity and serial dependence that could depend on the regressors (or not). These
are the now well-known cluster standard errors (which we discuss in detail in the notes on
cluster sampling). With large $N$ and small $T$, there is little excuse not to compute them. Even if
GLS is used with an unrestricted variance matrix for the $(T-1) \times 1$ vector $\Delta u_i$ (in the FD case) or
the $(T-1) \times 1$ vector $\ddot{u}_i$ (where we drop one time period), the system homoskedasticity assumption,
for example, in the FE case, $E(\ddot{u}_i\ddot{u}_i' \mid \ddot{x}_i) = E(\ddot{u}_i\ddot{u}_i')$, need not hold, and so a case can be made
for robust inference.
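A minimal sketch of cluster-robust ("sandwich") standard errors for the within estimator follows. The function name `cluster_robust_cov`, the simulated design, and the omission of any finite-sample correction are assumptions made for illustration; packaged routines typically apply degrees-of-freedom adjustments that are not shown here.

```python
import numpy as np

def cluster_robust_cov(X, resid, n_units):
    """Sandwich variance for pooled OLS, clustering by cross-section unit.

    X is (n_units*T, K) and resid is (n_units*T,), both stacked unit by unit.
    No finite-sample adjustment is applied (an assumption of this sketch).
    """
    K = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X)
    # One K-vector of summed scores X_i' u_i per unit.
    scores = np.einsum("ntk,nt->nk",
                       X.reshape(n_units, -1, K),
                       resid.reshape(n_units, -1))
    return A_inv @ (scores.T @ scores) @ A_inv

rng = np.random.default_rng(10)
N, T = 1000, 5
c = rng.normal(size=(N, 1))
x = 0.5 * c + rng.normal(size=(N, T))
# Errors are serially correlated (partial sums) and heteroskedastic in x_i1.
u = np.cumsum(rng.normal(size=(N, T)), axis=1) * (0.5 + 0.5 * np.abs(x[:, :1]))
y = 1.0 * x + c + u

# Within transformation, then pooled OLS on the demeaned data.
xd = (x - x.mean(axis=1, keepdims=True)).reshape(-1, 1)
yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
b = np.linalg.lstsq(xd, yd, rcond=None)[0]
resid = yd - xd @ b

se_robust = np.sqrt(np.diag(cluster_robust_cov(xd, resid, N)))
# Usual (nonrobust) FE standard error for comparison.
sigma2 = resid @ resid / (N * (T - 1) - 1)
se_usual = np.sqrt(sigma2 * np.diag(np.linalg.inv(xd.T @ xd)))
print(b, se_robust, se_usual)
```

The two standard errors generally differ when the idiosyncratic errors are serially correlated or heteroskedastic, which is the situation the text has in mind.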
(As an aside, some call (1.4) or (1.5) strong exogeneity. But in the Engle, Hendry, and
Richard (1983) work, strong exogeneity incorporates assumptions on parameters in different
conditional distributions being variation free, and that is not needed here.)
The strict exogeneity assumption is always violated if $x_{it}$ contains lagged dependent
variables, but it can be violated in other cases where $x_{i,t+1}$ is correlated with $u_{it}$ (a feedback
effect). An assumption more natural than strict exogeneity is sequential exogeneity conditional
on $c_i$:
$$E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{it}, c_i) = 0, \quad t = 1,\ldots,T \tag{1.6}$$
or
$$E(y_{it} \mid x_{i1}, \ldots, x_{it}, c_i) = E(y_{it} \mid x_{it}, c_i) = \eta_t + x_{it}\beta + c_i. \tag{1.7}$$
This allows for lagged dependent variables (in which case it implies that the dynamics in the
mean have been completely specified) and, generally, is more natural when we take the view
that $x_{it}$ might react to shocks that affect $y_{it}$. Generally, $\beta$ is identified under sequential
exogeneity. First differencing and using lags of $x_{it}$ as instruments, or forward filtering, can be
used in simple IV procedures or GMM procedures. (More later.)
If we are willing to assume $c_i$ and $x_i$ are uncorrelated, then many more possibilities arise
(including, of course, identifying coefficients on time-constant explanatory variables). The
most convenient way of stating the random effects (RE) assumption is
$$E(c_i \mid x_i) = E(c_i), \tag{1.8}$$
although using the linear projection in place of $E(c_i \mid x_i)$ suffices for consistency (but usual
inference would not generally be valid). Under (1.8), we can use pooled OLS or any GLS
procedure, including the usual RE estimator. Fully robust inference is available and should
generally be used. (Note: The usual RE variance matrix, which depends only on $\sigma_c^2$ and $\sigma_u^2$,
need not be correctly specified! It still makes sense to use it in estimation but make inference
robust.)
It is useful to define two correlated random effects assumptions:
$$L(c_i \mid x_i) = \psi + x_i\xi, \tag{1.9}$$
which actually is not an assumption but a definition. For nonlinear models, we will have to
actually make assumptions about $D(c_i \mid x_i)$, the conditional distribution. Methods based on (1.9)
are often said to implement the Chamberlain device, after Chamberlain (1982).
Mundlak (1978) used a restricted version, and used a conditional expectation:
$$E(c_i \mid x_i) = \psi + \bar{x}_i\xi, \tag{1.10}$$
where $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$. This formulation conserves on degrees of freedom, and extensions are
useful for nonlinear models.
If we write $c_i = \psi + x_i\xi + a_i$ or $c_i = \psi + \bar{x}_i\xi + a_i$ and plug into the original equation, for
example
$$y_{it} = \eta_t + x_{it}\beta + \bar{x}_i\xi + a_i + u_{it} \tag{1.11}$$
(absorbing $\psi$ into the time intercepts), then we are tempted to use pooled OLS, or RE
estimation, because $E(a_i + u_{it} \mid x_i) = 0$. Either of these leads to the FE estimator of $\beta$, and to a
simple test of $H_0: \xi = 0$. Later, when we discuss control function methods, it will be handy to
run regressions directly that include the time averages. (Somewhat surprisingly, we obtain the
same algebraic equivalence using Chamberlain's more flexible device. That is, if we apply
pooled OLS to the equation $y_{it} = \eta_t + x_{it}\beta + x_{i1}\lambda_1 + \ldots + x_{iT}\lambda_T + a_i + u_{it}$, the estimate of $\beta$ is
still the FE estimator, even though the $\lambda_t$ might change substantially across $t$. Of course, this
estimator is not generally efficient, and Chamberlain shows how to obtain the efficient
minimum distance estimator. See also Wooldridge (2002, Chapter 11).)
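The algebraic equivalence between pooled OLS on (1.11) and the FE estimator can be checked numerically. The sketch below assumes a balanced panel and a full set of time dummies; the dimensions and the simulated design are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 500, 5, 2                          # hypothetical balanced panel
c = rng.normal(size=(N, 1, 1))
x = 0.5 * c + rng.normal(size=(N, T, K))     # x_it correlated with c_i
beta = np.array([1.0, -2.0])
y = x @ beta + c[:, :, 0] + rng.normal(size=(N, T))

# Mundlak regression: y_it on time dummies, x_it, and the time averages xbar_i.
D = np.tile(np.eye(T), (N, 1))               # rows stacked unit by unit
X = x.reshape(N * T, K)
Xbar = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape).reshape(N * T, K)
Z = np.hstack([D, X, Xbar])
coef = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)[0]
beta_mundlak = coef[T:T + K]                 # coefficients on x_it

def two_way_demean(a):
    """Within transformation that also removes period effects (balanced panel)."""
    return (a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True)
            + a.mean(axis=(0, 1), keepdims=True))

# FE with time dummies: pooled OLS on the two-way demeaned data.
beta_fe = np.linalg.lstsq(two_way_demean(x).reshape(N * T, K),
                          two_way_demean(y).reshape(-1), rcond=None)[0]
print(beta_mundlak, beta_fe)
```

The two coefficient vectors agree up to floating-point error, which is the algebraic (not merely asymptotic) equivalence described in the text.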
Some of us have been pushing for several years the notion that specification tests should be
made robust to assumptions that are not directly being tested. That is, if a test has no
asymptotic power for detecting violation of certain assumptions, the test should be modified to
have proper asymptotic size if those assumptions are violated. Much progress has been made in
the theoretical literature, but one still sees routine use of Hausman (1978) statistics that
maintain a full set of assumptions under the null hypothesis. (Ironically, this often happens in
studies where traditional inference about parameters is made fully robust.) Take a leading case,
comparing random effects to fixed effects. Once we maintain (1.4), which is used by FE and
RE, the key assumption is (1.8), that is, we are interested in finding evidence of whether $c_i$ is
correlated with $x_i$. Of course, the FE estimator is consistent (for the coefficients on
time-varying covariates) whether or not $c_i$ is correlated with $x_i$. And, of course, we need make
no assumptions about $\mathrm{Var}(u_i \mid x_i, c_i)$ for consistency of FE. Further, RE is consistent under
(1.8), whether or not $\mathrm{Var}(v_i \mid x_i)$ has the random effects structure, where $v_{it} = c_i + u_{it}$. (In
addition to (1.4) and (1.8), sufficient for the random effects structure are
$\mathrm{Var}(u_i \mid x_i, c_i) = \sigma_u^2 I_T$ and $\mathrm{Var}(c_i \mid x_i) = \mathrm{Var}(c_i)$.) In
fact, we might be perfectly happy using RE under (1.8) even though it might not be the
asymptotically efficient estimator. Therefore, for testing the key assumption (1.8), we should
not add the auxiliary assumptions that imply RE is asymptotically efficient. Moreover, as
should be clear from the structure of the statistic (and can be shown formally), the usual form
of the Hausman statistic has no systematic power for detecting violations of the second
moment assumptions on $\mathrm{Var}(v_i \mid x_i)$. In particular, if (1.4) and (1.8) hold, the usual statistic
converges in distribution to some random variable (not chi-square in general), regardless of the
structure of $\mathrm{Var}(v_i \mid x_i)$.
To summarize, it makes no sense to report fully robust variance matrices for FE and RE but
then to compute a Hausman test that maintains the full set of RE assumptions. The
regression-based Hausman test from (1.11) is very handy for obtaining a fully robust test, as
well as for using the proper degrees of freedom in the limiting distribution. Specifically,
suppose the model contains a full set of year intercepts as well as time-constant and
time-varying explanatory variables:
$$y_{it} = g_t\theta + z_i\gamma + w_{it}\delta + c_i + u_{it}, \quad t = 1,\ldots,T.$$
Now, it is clear that, because we cannot estimate $\gamma$ by FE, it is not part of the Hausman test
comparing the RE and FE estimates. What is less clear, but also true, is that the coefficients on
the aggregate time variables, $\theta$, cannot be included, either. (RE and FE estimation with
variables that change only across $t$ are identical.) In fact, we can only compare the $M \times 1$ estimates
of $\delta$, say $\hat{\delta}_{FE}$ and $\hat{\delta}_{RE}$. If we include $\hat{\theta}_{FE}$ and $\hat{\theta}_{RE}$ we introduce a singularity in the
asymptotic variance matrix. The regression-based test, from the pooled regression
$$y_{it} \text{ on } g_t,\ z_i,\ w_{it},\ \bar{w}_i, \quad t = 1,\ldots,T;\ i = 1,\ldots,N,$$
makes this clear (and also makes it clear that there are only $M$ restrictions to test). Mundlak
(1978) suggested this test and Arellano (1993) described the robust version. Unfortunately, the
usual form of the Hausman test does not make it easy to obtain a nonnegative test statistic, and
it is easy to get confused about the appropriate degrees of freedom in the chi-square
distribution. For example, the Hausman command in Stata includes year dummies in the
comparison between RE and FE; in addition, the test maintains the full set of RE assumptions
under the null. The most important problem is that unwarranted degrees of freedom are added
to the chi-square distribution, often many extra df, which can produce seriously misleading
p-values.
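The regression-based, fully robust version of the test can be sketched as a cluster-robust Wald test of the coefficients on $\bar{w}_i$. Everything specific below is an assumption for illustration: the simulated design (with $c_i$ correlated with $\bar{w}_i$, so the null is false), no time-constant $z_i$, $M = 2$, and no finite-sample correction in the variance estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, M = 1000, 4, 2                          # hypothetical design
f = rng.normal(size=(N, 1, M))
w = f + rng.normal(size=(N, T, M))            # time-varying regressors w_it
c = f[:, 0, :].sum(axis=1, keepdims=True) + rng.normal(size=(N, 1))
delta = np.array([1.0, 0.5])
y = w @ delta + c + rng.normal(size=(N, T))   # c_i correlated with wbar_i

# Pooled regression of y_it on time dummies, w_it, and wbar_i.
D = np.tile(np.eye(T), (N, 1))
W_it = w.reshape(N * T, M)
W_bar = np.broadcast_to(w.mean(axis=1, keepdims=True), w.shape).reshape(N * T, M)
Z = np.hstack([D, W_it, W_bar])
yv = y.reshape(-1)
coef = np.linalg.lstsq(Z, yv, rcond=None)[0]
resid = yv - Z @ coef

# Cluster-robust variance (clusters are the cross-section units).
P = Z.shape[1]
A_inv = np.linalg.inv(Z.T @ Z)
scores = np.einsum("ntp,nt->np", Z.reshape(N, T, P), resid.reshape(N, T))
V = A_inv @ (scores.T @ scores) @ A_inv

# Wald test that the M coefficients on wbar_i are zero: exactly M restrictions.
xi_hat = coef[-M:]
wald = float(xi_hat @ np.linalg.solve(V[-M:, -M:], xi_hat))
df = M
p_value = float(np.exp(-wald / 2))  # chi-square upper tail; this closed form holds only for df == 2
print(wald, df, p_value)
```

The statistic is a quadratic form in a positive semidefinite matrix, so it cannot be negative, and the degrees of freedom equal the number of time-varying regressors rather than including the time dummies.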
2. New Insights Into Old Estimators
In the past several years, the properties of traditional estimators used for linear models,
particularly fixed effects and its instrumental variable counterparts, have been studied under
weaker assumptions. We review some of those results here. In these notes, we focus on models
without lagged dependent variables or other non-strictly exogenous explanatory variables,
although the instrumental variables methods applied to linear models can, in some cases, be
applied to models with lagged dependent variables.
2.1. Fixed Effects Estimation in the Correlated Random Slopes Model
The fixed effects (FE) estimator is still the workhorse in empirical studies that employ
panel data methods to estimate the effects of time-varying explanatory variables. The
attractiveness of the FE estimator is that it allows arbitrary correlation between the additive,
unobserved heterogeneity and the explanatory variables. (Pooled methods that do not remove
time averages, as well as the random effects (RE) estimator, essentially assume that the
unobserved heterogeneity is uncorrelated with the covariates.) Nevertheless, the framework in
which the FE estimator is typically analyzed is somewhat restrictive: the heterogeneity is
assumed to be additive and is assumed to have constant coefficients (factor loads) over time.
Recently, Wooldridge (2005) has shown that the FE estimator, and extensions that sweep away
unit-specific trends, have robustness properties for estimating the population average effect
(PAE) or average partial effect (APE).
We begin with an extension of the usual model to allow for unit-specific slopes,
$$y_{it} = c_i + x_{it}b_i + u_{it} \tag{2.1}$$
$$E(u_{it} \mid x_i, c_i, b_i) = 0, \quad t = 1,\ldots,T, \tag{2.2}$$
where $b_i$ is $K \times 1$. Rather than acknowledge that $b_i$ is unit-specific, we ignore the
heterogeneity in the slopes and act as if $b_i$ is constant for all $i$. We think $c_i$ might be correlated
with at least some elements of $x_{it}$, and therefore we apply the usual fixed effects estimator. The
question we address here is: when does the usual FE estimator consistently estimate the
population average effect, $\beta = E(b_i)$?
In addition to assumption (2.2), we naturally need the usual FE rank condition,
$$\mathrm{rank}\sum_{t=1}^{T} E(\ddot{x}_{it}'\ddot{x}_{it}) = K. \tag{2.3}$$
Write $b_i = \beta + d_i$ where the unit-specific deviation from the average, $d_i$, necessarily has a zero
mean. Then
$$y_{it} = c_i + x_{it}\beta + x_{it}d_i + u_{it} \equiv c_i + x_{it}\beta + v_{it} \tag{2.4}$$
where $v_{it} \equiv x_{it}d_i + u_{it}$. A sufficient condition for consistency of the FE estimator along with
(2.3) is
$$E(\ddot{x}_{it}'\ddot{v}_{it}) = 0, \quad t = 1,\ldots,T. \tag{2.5}$$
Along with (2.2), it suffices that $E(\ddot{x}_{it}'\ddot{x}_{it}d_i) = 0$ for all $t$. A sufficient condition, and one that is
easier to interpret, is
$$E(b_i \mid \ddot{x}_{it}) = E(b_i) = \beta, \quad t = 1,\ldots,T. \tag{2.6}$$
Importantly, condition (2.6) allows the slopes, $b_i$, to be correlated with the regressors $x_{it}$
through permanent components. What it rules out is correlation between $b_i$ and idiosyncratic
movements in $x_{it}$. We can formalize this statement by writing $x_{it} = f_i + r_{it}$, $t = 1,\ldots,T$. Then
(2.6) holds if $E(b_i \mid r_{i1}, r_{i2}, \ldots, r_{iT}) = E(b_i)$. So $b_i$ is allowed to be arbitrarily correlated with the
permanent component, $f_i$. (Of course, $x_{it} = f_i + r_{it}$ is a special representation of the covariates,
but it helps to illustrate condition (2.6).) Condition (2.6) is similar in spirit to the Mundlak
(1978) assumption applied to the slopes (rather than to the intercept):
$$E(b_i \mid x_{i1}, x_{i2}, \ldots, x_{iT}) = E(b_i \mid \bar{x}_i).$$
One implication of these results is that it is a good idea to use a fully robust variance matrix
estimator with FE even if one thinks idiosyncratic errors are serially uncorrelated: the term
$\ddot{x}_{it}d_i$ is left in the error term and causes heteroskedasticity and serial correlation, in general.
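A small simulation can illustrate condition (2.6): the slopes below are correlated with the permanent component $f_i$ of $x_{it} = f_i + r_{it}$ but not with the idiosyncratic part $r_{it}$. The design (scalar regressor, normal draws, the particular coefficients) is entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5000, 4                                 # hypothetical design
f = rng.normal(size=(N, 1))                    # permanent component of x_it
r = rng.normal(size=(N, T))                    # idiosyncratic movements
x = f + r

# Unit-specific slope with E(b_i) = 2, correlated with f_i but not with r_it.
b = 2.0 + 0.7 * f + rng.normal(scale=0.5, size=(N, 1))
c = f + rng.normal(size=(N, 1))                # c_i also correlated with x_it
y = c + x * b + rng.normal(size=(N, T))

# Usual FE estimator that ignores the slope heterogeneity.
xdd = x - x.mean(axis=1, keepdims=True)
ydd = y - y.mean(axis=1, keepdims=True)
beta_fe = float((xdd * ydd).sum() / (xdd ** 2).sum())
print(beta_fe)
```

Because the time-demeaned regressor depends only on $r_{it}$, the FE estimate should be close to the assumed population average slope of 2 even though $b_i$ and $x_{it}$ are correlated.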
These results extend to a more general class of estimators that includes the usual fixed
effects and random trend estimator. Write
$$y_{it} = w_t a_i + x_{it}b_i + u_{it}, \quad t = 1,\ldots,T \tag{2.7}$$
where $w_t$ is a set of deterministic functions of time. We maintain the standard assumption (2.2)
but with $a_i$ in place of $c_i$. Now, the "fixed effects" estimator sweeps away $a_i$ by netting out $w_t$
from $x_{it}$. In particular, now let $\ddot{x}_{it}$ denote the residuals from the regression $x_{it}$ on
$w_t$, $t = 1,\ldots,T$.
In the random trend model, $w_t = (1, t)$, and so the elements of $x_{it}$ have unit-specific linear
trends removed in addition to a level effect. Removing even more of the heterogeneity from
$x_{it}$ makes it even more likely that (2.6) holds. For example, if $x_{it} = f_i + h_i t + r_{it}$, then $b_i$ can
be arbitrarily correlated with $(f_i, h_i)$. Of course, individually detrending the $x_{it}$ requires at least
three time periods, and it decreases the variation in $\ddot{x}_{it}$ compared to the usual FE estimator. Not
surprisingly, increasing the dimension of $w_t$ (subject to the restriction that $\dim(w_t)$ is less than $T$) generally
leads to less precision of the estimator. See Wooldridge (2005) for further discussion.
Of course, the first differencing transformation can be used in place of, or in conjunction
with, unit-specific detrending. For example, if we first difference followed by the within
transformation, it is easily seen that a condition sufficient for consistency of the resulting
estimator for $\beta$ is
$$E(b_i \mid \Delta\ddot{x}_{it}) = E(b_i), \quad t = 2,\ldots,T, \tag{2.8}$$
where $\Delta\ddot{x}_{it} = \Delta x_{it} - \overline{\Delta x}_i$ are the demeaned first differences.
Now consider an important special case of the previous setup, where the regressors that
have unit-specific coefficients are time dummies. We can write the model as
$$y_{it} = x_{it}\beta + \eta_t c_i + u_{it}, \quad t = 1,\ldots,T, \tag{2.9}$$
where, with small $T$ and large $N$, it makes sense to treat $\{\eta_t: t = 1,\ldots,T\}$ as parameters, like
$\beta$. Model (2.9) is attractive because it allows, say, the return to unobserved "talent" to change
over time. Those who estimate, say, firm-level production functions like to allow the
importance of unobserved factors, such as managerial skill, to change over time. Estimation of
$\beta$, along with the $\eta_t$, is a nonlinear problem. What if we just estimate $\beta$ by fixed effects? Let
$\mu_c = E(c_i)$ and write (2.9) as
$$y_{it} = \alpha_t + x_{it}\beta + \eta_t d_i + u_{it}, \quad t = 1,\ldots,T, \tag{2.10}$$
where $\alpha_t = \eta_t\mu_c$ and $d_i = c_i - \mu_c$ has zero mean. In addition, the composite error,
$v_{it} \equiv \eta_t d_i + u_{it}$, is uncorrelated with $(x_{i1}, x_{i2}, \ldots, x_{iT})$ (as well as having a zero mean). It is easy
to see that consistency of the usual FE estimator, which allows for different time period
intercepts, is ensured if
$$\mathrm{Cov}(\ddot{x}_{it}, c_i) = 0, \quad t = 1,\ldots,T. \tag{2.11}$$
In other words, the unobserved effect is uncorrelated with the deviations $\ddot{x}_{it} = x_{it} - \bar{x}_i$.
If we use the extended FE estimators for random trend models, as above, then we can
replace $\ddot{x}_{it}$ with detrended covariates. Then, $c_i$ can be correlated with underlying levels and
trends in $x_{it}$ (provided we have a sufficient number of time periods).
Using usual FE (with full time period dummies) does not allow us to estimate the $\eta_t$, or
even determine whether the $\eta_t$ change over time. Even if we are interested only in $\beta$ when $c_i$
and $x_{it}$ are allowed to be correlated, being able to detect time-varying factor loads is important
because (2.11) is not completely general. It is useful to have a simple test of
$H_0: \eta_2 = \eta_3 = \ldots = \eta_T$ with some power against the alternative of time-varying coefficients.
Then, we can determine whether a more sophisticated estimation method might be needed.
We can obtain a simple variable addition test that can be computed using linear estimation
methods if we specify a particular relationship between $c_i$ and $x_i$. We use the Mundlak (1978)
assumption
$$c_i = \psi + \bar{x}_i\xi + a_i. \tag{2.12}$$
Then
$$y_{it} = \eta_t\psi + x_{it}\beta + \eta_t\bar{x}_i\xi + \eta_t a_i + u_{it}
= \alpha_t + x_{it}\beta + \bar{x}_i\xi + \lambda_t\bar{x}_i\xi + a_i + \lambda_t a_i + u_{it}, \tag{2.13}$$
where $\alpha_t = \eta_t\psi$ and $\lambda_t = \eta_t - 1$ for all $t$. Under the null hypothesis, $\lambda_t = 0$, $t = 2,\ldots,T$. If we impose the
null hypothesis, the resulting model is linear, and we can estimate it by pooled OLS of $y_{it}$ on
$1, d2_t, \ldots, dT_t, x_{it}, \bar{x}_i$ across $t$ and $i$, where the $dr_t$ are time dummies. A variable addition test
that all $\lambda_t$ are zero can be obtained by applying FE to the equation
$$y_{it} = \alpha_1 + \alpha_2 d2_t + \ldots + \alpha_T dT_t + x_{it}\beta + \lambda_2 d2_t(\bar{x}_i\hat{\xi}) + \ldots + \lambda_T dT_t(\bar{x}_i\hat{\xi}) + error_{it}, \tag{2.14}$$
and testing the joint significance of the $T-1$ terms $d2_t(\bar{x}_i\hat{\xi}), \ldots, dT_t(\bar{x}_i\hat{\xi})$. (The term $\bar{x}_i\hat{\xi}$ would
drop out of an FE estimation, and so we just omit it.) Note that $\bar{x}_i\hat{\xi}$ is a scalar and so the test has
$T-1$ degrees of freedom. As always, it is prudent to use a fully robust test (even though, under
the null, $\lambda_t a_i$ disappears from the error term).
A few comments about this test are in order. First, although we used the Mundlak device to
obtain the test, it does not have to represent the actual linear projection because we are simply
adding terms to an FE estimation. Under the null, we do not need to restrict the relationship
between $c_i$ and $x_i$. Of course, the power of the test may be affected by this choice. Second, the
test only makes sense if $\xi \neq 0$; in particular, it cannot be used in a pure random effects
environment. Third, a rejection of the null does not necessarily mean that the usual FE
estimator is inconsistent for $\beta$: assumption (2.11) could still hold. In fact, the change in the
estimate of $\beta$ when the interaction terms are added can be indicative of whether accounting for
time-varying $\eta_t$ is likely to be important. But, because $\hat{\xi}$ has been estimated under the null, the
estimated $\beta$ from (2.14) is not generally consistent.
If we want to estimate the $\eta_t$ along with $\beta$, we can impose the Mundlak assumption and
estimate all parameters, including $\xi$, by pooled nonlinear regression or some GMM version.
Or, we can use Chamberlain's (1982) less restrictive assumption. But, typically, when we want
to allow arbitrary correlation between $c_i$ and $x_i$, we work directly from (2.9) and eliminate the
$c_i$. There are several ways to do this. If we maintain that all $\eta_t$ are different from zero then we
can use a quasi-differencing method to eliminate $c_i$. In particular, for $t \ge 2$ we can multiply the
$t-1$ equation by $\eta_t/\eta_{t-1}$ and subtract the result from the time $t$ equation:
$$y_{it} - (\eta_t/\eta_{t-1})y_{i,t-1} = [x_{it} - (\eta_t/\eta_{t-1})x_{i,t-1}]\beta + [\eta_t c_i - (\eta_t/\eta_{t-1})\eta_{t-1}c_i] + [u_{it} - (\eta_t/\eta_{t-1})u_{i,t-1}]$$
$$= [x_{it} - (\eta_t/\eta_{t-1})x_{i,t-1}]\beta + [u_{it} - (\eta_t/\eta_{t-1})u_{i,t-1}], \quad t \ge 2.$$
We define $\theta_t = \eta_t/\eta_{t-1}$ and write
$$y_{it} - \theta_t y_{i,t-1} = (x_{it} - \theta_t x_{i,t-1})\beta + e_{it}, \quad t = 2,\ldots,T, \tag{2.15}$$
where $e_{it} \equiv u_{it} - \theta_t u_{i,t-1}$. Under the strict exogeneity assumption, $e_{it}$ is uncorrelated with every
element of $x_i$, and so we can apply GMM to (2.15) to estimate $\beta$ and $(\theta_2, \ldots, \theta_T)$. Again, this
requires using nonlinear GMM methods, and the $e_{it}$ would typically be serially correlated. If
we do not impose restrictions on the second moment matrix of $u_i$, then we would not use any
information on the second moments of $e_i$; we would (eventually) use an unrestricted weighting
matrix after an initial estimation.
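The algebra behind (2.15) can be verified numerically: quasi-differencing with $\theta_t = \eta_t/\eta_{t-1}$ removes $c_i$ exactly. The factor loads in `eta` and the other simulated quantities are hypothetical, and the check uses the true parameter values rather than estimating anything.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, beta = 5, 4, 2.0
eta = np.array([1.0, 1.3, 0.8, 1.1])       # hypothetical nonzero factor loads
c = rng.normal(size=(N, 1))
x = rng.normal(size=(N, T))
u = rng.normal(size=(N, T))
y = beta * x + eta * c + u                 # model (2.9) with a scalar regressor

theta = eta[1:] / eta[:-1]                 # theta_t = eta_t / eta_{t-1}, t = 2..T

# Left side of (2.15): quasi-differenced outcome.
lhs = y[:, 1:] - theta * y[:, :-1]
# Right side of (2.15): quasi-differenced regressors and errors, no c_i term.
e = u[:, 1:] - theta * u[:, :-1]
rhs = beta * (x[:, 1:] - theta * x[:, :-1]) + e

print(np.max(np.abs(lhs - rhs)))
```

In practice $\beta$ and the $\theta_t$ are unknown and enter (2.15) nonlinearly, which is why the text turns to nonlinear GMM.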
Using all of $x_i$ in each time period can result in too many overidentifying restrictions. At
time $t$ we might use, say, $z_{it} = (x_{it}, x_{i,t-1})$, and then the instrument matrix $Z_i$ (with $T-1$ rows)
would be $\mathrm{diag}(z_{i2}, \ldots, z_{iT})$. An initial consistent estimator can be obtained by choosing weighting
matrix $\left(N^{-1}\sum_{i=1}^{N} Z_i'Z_i\right)^{-1}$. Then the optimal weighting matrix can be estimated. Ahn, Lee, and
Schmidt (2001) provide further discussion.
If $x_{it}$ contains sequentially but not strictly exogenous explanatory variables (such as a
lagged dependent variable), the instruments at time $t$ can only be chosen from $(x_{i,t-1}, \ldots, x_{i1})$.
Holtz-Eakin, Newey, and Rosen (1988) explicitly consider models with lagged dependent
variables; more on these models later.
Other transformations can be used. For example, at time $t \ge 2$ we can use the equation
$$\eta_{t-1}y_{it} - \eta_t y_{i,t-1} = (\eta_{t-1}x_{it} - \eta_t x_{i,t-1})\beta + e_{it}, \quad t = 2,\ldots,T,$$
where now $e_{it} = \eta_{t-1}u_{it} - \eta_t u_{i,t-1}$. This equation has the advantage of allowing $\eta_t = 0$ for some
$t$. The same choices of instruments are available depending on whether $x_{it}$ are strictly or
sequentially exogenous.
2.2. Fixed Effects IV Estimation with Random Slopes
The results for the fixed effects estimator (in the generalized sense of removing
unit-specific means and possibly trends) extend to fixed effects IV methods, provided we add
a constant conditional covariance assumption. Murtazashvili and Wooldridge (2007) derive a
simple set of sufficient conditions. In the model with general trends, we assume the natural
extension of Assumption FEIV.1, that is, $E(u_{it} \mid z_i, a_i, b_i) = 0$ for all $t$, along with Assumption
FEIV.2. We modify assumption (2.6) in the obvious way: replace $\ddot{x}_{it}$ with $\ddot{z}_{it}$, the
individual-specific detrended instruments:
$$E(b_i \mid \ddot{z}_{it}) = E(b_i) = \beta, \quad t = 1,\ldots,T. \tag{2.16}$$
But something more is needed. Murtazashvili and Wooldridge (2007) show that, along with the
previous assumptions, a sufficient condition is
$$\mathrm{Cov}(\ddot{x}_{it}, b_i \mid \ddot{z}_{it}) = \mathrm{Cov}(\ddot{x}_{it}, b_i), \quad t = 1,\ldots,T. \tag{2.17}$$
Note that the covariance $\mathrm{Cov}(\ddot{x}_{it}, b_i)$, a $K \times K$ matrix, need not be zero, or even constant across
time. In other words, we can allow the detrended covariates to be arbitrarily correlated with the
heterogeneous slopes, and that correlation can change in any way across time. But the
conditional covariance cannot depend on the time-demeaned instruments. (This is an example
of how it is important to distinguish between a conditional expectation and an unconditional
one: the implicit error in the equation generally has an unconditional mean that changes with $t$,
but its conditional mean does not depend on $\ddot{z}_{it}$, and so using $\ddot{z}_{it}$ as IVs is valid provided we
allow for a full set of dummies.) Condition (2.17) extends to the panel data case the
assumption used by Wooldridge (2003) in the cross section case.
We can easily show why (2.17) suffices with the previous assumptions. First, if
$E(d_i \mid \ddot{z}_{it}) = 0$, which follows from $E(b_i \mid \ddot{z}_{it}) = E(b_i)$, then $\mathrm{Cov}(\ddot{x}_{it}, d_i \mid \ddot{z}_{it}) = E(\ddot{x}_{it}'d_i' \mid \ddot{z}_{it})$, and
so $E(\ddot{x}_{it}d_i \mid \ddot{z}_{it}) = E(\ddot{x}_{it}d_i) \equiv \gamma_t$ under the previous assumptions. Write $\ddot{x}_{it}d_i = \gamma_t + r_{it}$ where
$E(r_{it} \mid \ddot{z}_{it}) = 0$, $t = 1,\ldots,T$. Then we can write the transformed equation as
$$\ddot{y}_{it} = \ddot{x}_{it}\beta + \ddot{x}_{it}d_i + \ddot{u}_{it} = \ddot{x}_{it}\beta + \gamma_t + r_{it} + \ddot{u}_{it}. \tag{2.18}$$
Now, if $x_{it}$ contains a full set of time period dummies, then we can absorb $\gamma_t$ into $\ddot{x}_{it}$, and we
assume that here. Then the sufficient condition for consistency of IV estimators applied to the
transformed equations is $E[\ddot{z}_{it}'(r_{it} + \ddot{u}_{it})] = 0$, and this condition is met under the maintained
assumptions. In other words, under (2.16) and (2.17), the fixed effects 2SLS estimator is
consistent for the average population effect, $\beta$. (Remember, we use "fixed effects" here in the
general sense of eliminating the unit-specific trends, $a_i$.) We must remember to include a full
set of time period dummies if we want to apply this robustness result, something that should be
done in any case. Naturally, we can also use GMM to obtain a more efficient estimator. If $b_i$
truly depends on $i$, then the composite error $r_{it} + \ddot{u}_{it}$ is likely serially correlated and
heteroskedastic. See Murtazashvili and Wooldridge (2007) for further discussion and
simulation results on the performance of the FE2SLS estimator. They also provide examples
where the key assumptions cannot be expected to hold, such as when endogenous elements of
$x_{it}$ are discrete.
3. Behavior of Estimators without Strict Exogeneity
As is well known, both the FE and FD estimators are inconsistent (with fixed $T$, $N \to \infty$)
without the conditional strict exogeneity assumption. But it is also pretty well known that, at
least under certain assumptions, the FE estimator can be expected to have less "bias" (actually,
inconsistency) for larger $T$. One assumption is contemporaneous exogeneity, (1.2). If we
maintain this assumption, and assume that the data series $\{(x_{it}, u_{it}): t = 1,\ldots,T\}$ is weakly
dependent (in time series parlance, integrated of order zero, or I(0)), then we can show that
$$\mathrm{plim}\,\hat{\beta}_{FE} = \beta + O(T^{-1}) \tag{3.1}$$
$$\mathrm{plim}\,\hat{\beta}_{FD} = \beta + O(1). \tag{3.2}$$
In some special cases (the AR(1) model without extra covariates) the "bias" terms can be
calculated. But not generally. The FE (within) estimator averages across $T$, and this tends to
reduce the bias.
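The contrast between (3.1) and (3.2) can be illustrated by simulation in the AR(1) special case, $y_{it} = \rho y_{i,t-1} + c_i + u_{it}$. The sketch below assumes $\rho = 0.5$, stationary initial conditions, and unit-variance normal shocks; these choices and the helper names are illustrative, not taken from the notes.

```python
import numpy as np

def simulate_ar1(N, T, rho, rng):
    """y_it = rho*y_{i,t-1} + c_i + u_it, started from the stationary distribution."""
    c = rng.normal(size=N)
    y = np.empty((N, T + 1))
    y[:, 0] = c / (1 - rho) + rng.normal(size=N) / np.sqrt(1 - rho ** 2)
    for t in range(1, T + 1):
        y[:, t] = rho * y[:, t - 1] + c + rng.normal(size=N)
    return y

def fe_estimate(y):
    """Within estimator of rho: demeaned y_t on demeaned y_{t-1}."""
    ycur, ylag = y[:, 1:], y[:, :-1]
    yc = ycur - ycur.mean(axis=1, keepdims=True)
    yl = ylag - ylag.mean(axis=1, keepdims=True)
    return float((yl * yc).sum() / (yl ** 2).sum())

def fd_estimate(y):
    """Pooled OLS of the first difference on its own lag (no instruments)."""
    dy = np.diff(y, axis=1)
    return float((dy[:, :-1] * dy[:, 1:]).sum() / (dy[:, :-1] ** 2).sum())

rng = np.random.default_rng(4)
rho, N = 0.5, 2000
bias_fe, bias_fd = {}, {}
for T in (5, 20):
    y = simulate_ar1(N, T, rho, rng)
    bias_fe[T] = fe_estimate(y) - rho
    bias_fd[T] = fd_estimate(y) - rho
print(bias_fe, bias_fd)
```

Under these assumptions the FE inconsistency shrinks as $T$ grows, while the FD inconsistency stays near $-(1+\rho)/2$ regardless of $T$.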
Interestingly, the same results can be shown if $\{x_{it}: t = 1,\ldots,T\}$ has unit roots as long as
$\{u_{it}\}$ is I(0) and contemporaneous exogeneity holds. But there is a catch: if $\{u_{it}\}$ is I(1), so
that the time series version of the "model" would be a spurious regression ($y_{it}$ and $x_{it}$ are not
cointegrated), then (3.1) is no longer true. And, of course, the first differencing means any unit
roots are eliminated. So, once we start appealing to "large $T$" to prefer FE over FD, we must
start being aware of the time series properties of the series.
The same comments hold for IV versions of the estimators. Provided the instruments are
contemporaneously exogenous, the FEIV estimator has bias of order $T^{-1}$, while the bias in the
FDIV estimator does not shrink with $T$. The same caveats about applications to unit root
processes also apply.
Because failure of strict exogeneity causes inconsistency in both FE and FD estimation, it
is useful to have simple tests. One possibility is to obtain a Hausman test directly comparing
the FE and FD estimators. This is a bit cumbersome because, when aggregate time effects are
included, the difference in the estimators has a singular asymptotic variance. Plus, it is
somewhat difficult to make the test fully robust.
Instead, simple regression-based strategies are available. Let $w_{it}$ be the $1 \times Q$ vector, a
subset of $x_{it}$ suspected of failing strict exogeneity. A simple test of strict exogeneity,
specifically looking for feedback problems, is based on
$$y_{it} = \eta_t + x_{it}\beta + w_{i,t+1}\delta + c_i + e_{it}, \quad t = 1,\ldots,T-1. \tag{3.3}$$
Estimate the equation by fixed effects and test $H_0: \delta = 0$ (using a fully robust test). Of course,
the test may have little power for detecting contemporaneous endogeneity.
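The test based on (3.3) can be sketched as an FE regression that adds the lead $w_{i,t+1}$ and uses a cluster-robust $t$ statistic. In this hypothetical design $x_{i,t+1}$ responds to $u_{it}$, so strict exogeneity fails by construction; time dummies are omitted only because every simulated series has mean zero, and `fe_cluster` is an illustrative helper, not a library function.

```python
import numpy as np

def fe_cluster(y, X):
    """Within estimator with cluster-robust standard errors.

    y is (N, T) and X is (N, T, K); no finite-sample correction is applied.
    """
    N, T, K = X.shape
    yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
    Xd = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
    A_inv = np.linalg.inv(Xd.T @ Xd)
    coef = A_inv @ Xd.T @ yd
    resid = (yd - Xd @ coef).reshape(N, T)
    scores = np.einsum("ntk,nt->nk", Xd.reshape(N, T, K), resid)
    V = A_inv @ (scores.T @ scores) @ A_inv
    return coef, np.sqrt(np.diag(V))

rng = np.random.default_rng(5)
N, T = 2000, 6
c = rng.normal(size=N)
u = rng.normal(size=(N, T))
x = np.empty((N, T))
x[:, 0] = 0.5 * c + rng.normal(size=N)
for t in range(1, T):
    # Feedback: next period's regressor reacts to the current shock.
    x[:, t] = 0.5 * c + 0.5 * u[:, t - 1] + rng.normal(size=N)
y = 1.0 * x + c[:, None] + u

# Use periods 1..T-1 so the lead x_{i,t+1} exists for every observation.
X = np.stack([x[:, :-1], x[:, 1:]], axis=2)    # columns: x_it and x_{i,t+1}
coef, se = fe_cluster(y[:, :-1], X)
t_lead = float(coef[1] / se[1])                # robust t statistic for delta = 0
print(coef, se, t_lead)
```

With feedback of this size the robust $t$ statistic on the lead is far from zero; setting the feedback coefficient to zero would make the null hypothesis true.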
In the context of FEIV we can test whether a subset of instruments fails strict exogeneity
by writing
$$y_{it} = \eta_t + x_{it}\beta + h_{i,t+1}\delta + c_i + e_{it}, \quad t = 1,\ldots,T-1, \tag{3.4}$$
where $h_{it}$ is a subset of the instruments, $z_{it}$. Now, estimate the equation by FEIV using
instruments $(z_{it}, h_{i,t+1})$ and test coefficients on the latter.
It is also easy to test for contemporaneous endogeneity of certain regressors, even if we
allow some regressors to be endogenous under the null. Write the model now as
$$y_{it1} = z_{it1}\delta_1 + y_{it2}\alpha_1 + y_{it3}\gamma_1 + c_{i1} + u_{it1}, \tag{3.5}$$
where, in an FE environment, we want to test $H_0: E(y_{it3}'u_{it1}) = 0$. Actually, because we are
using the within transformation, we are really testing strict exogeneity of $y_{it3}$, but we allow all
variables to be correlated with $c_{i1}$. The variables $y_{it2}$ are allowed to be endogenous under the
null provided, of course, that we have sufficient instruments excluded from the structural
equation that are uncorrelated with $u_{it1}$ in every time period. We can write a set of reduced
forms for elements of $y_{it3}$ as
$$y_{it3} = z_{it}\Pi_3 + c_{i3} + v_{it3}, \tag{3.6}$$
and obtain the FE residuals, $\hat{\ddot{v}}_{it3} = \ddot{y}_{it3} - \ddot{z}_{it}\hat{\Pi}_3$, where the columns of $\hat{\Pi}_3$ are the FE estimates
of the reduced forms, and the double dots denote time-demeaning, as usual. Then, estimate
the equation
$$\ddot{y}_{it1} = \ddot{z}_{it1}\delta_1 + \ddot{y}_{it2}\alpha_1 + \ddot{y}_{it3}\gamma_1 + \hat{\ddot{v}}_{it3}\rho_1 + error_{it1} \tag{3.7}$$
by pooled IV, using instruments $(\ddot{z}_{it}, \ddot{y}_{it3}, \hat{\ddot{v}}_{it3})$. The test of the null that $y_{it3}$ is exogenous is just
the (robust) test that $\rho_1 = 0$, and the usual robust test is valid without adjusting for the
first-step estimation.
An equivalent approach is to define $\hat{v}_{it3} = y_{it3} - z_{it}\hat{\Pi}_3$, where $\hat{\Pi}_3$ is still the matrix of FE
coefficients, add these to equation (3.5), and apply FE-IV, using a fully robust test. Using a
built-in command can lead to problems because the test is rarely made robust and the degrees
of freedom are often incorrectly counted.
4. Instrumental Variables Estimation under Sequential Exogeneity
We now consider IV estimation of the model
$$y_{it} = x_{it}\beta + c_i + u_{it}, \quad t = 1,\ldots,T, \tag{4.1}$$
under sequential exogeneity assumptions. Some authors simply use
$$E(x_{is}'u_{it}) = 0, \quad s = 1,\ldots,t;\ t = 1,\ldots,T. \tag{4.2}$$
As always, $x_{it}$ probably includes a full set of time period dummies. This leads to simple
moment conditions after first differencing:
$$E(x_{is}'\Delta u_{it}) = 0, \quad s = 1,\ldots,t-1;\ t = 2,\ldots,T. \tag{4.3}$$
Therefore, at time $t$, the available instruments in the FD equation are in the vector $x_{i,t-1}^o$, where
$$x_{it}^o \equiv (x_{i1}, x_{i2}, \ldots, x_{it}). \tag{4.4}$$
Therefore, the matrix of instruments is simply
$$W_i = \mathrm{diag}(x_{i1}^o, x_{i2}^o, \ldots, x_{i,T-1}^o), \tag{4.5}$$
which has $T-1$ rows. Because of sequential exogeneity, the number of valid instruments
increases with $t$.
Given $W_i$, it is routine to apply GMM estimation. But some simpler strategies are available
that can be used for comparison or as the first-stage estimator in computing the optimal
weighting matrix. One useful one is to estimate a reduced form for $\Delta x_{it}$ separately for each $t$.
So, at time $t$, run the regression $\Delta x_{it}$ on $x_{i,t-1}^o$, $i = 1,\ldots,N$, and obtain the fitted values, $\widehat{\Delta x}_{it}$. Of
course, the fitted values are all $1 \times K$ vectors for each $t$, even though the number of available
instruments grows with $t$. Then, estimate the FD equation
$$\Delta y_{it} = \Delta x_{it}\beta + \Delta u_{it}, \quad t = 2,\ldots,T \tag{4.6}$$
by pooled IV using instruments (not regressors) $\widehat{\Delta x}_{it}$. It is simple to obtain robust standard
errors and test statistics from such a procedure because the first stage estimation to obtain the
instruments can be ignored (asymptotically, of course).
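The period-by-period reduced form followed by pooled IV can be sketched as below. The design is hypothetical: a scalar, sequentially exogenous regressor with feedback from $u_{i,t-1}$ and enough persistence for lagged levels to predict changes. A constant is added to each first stage, and time dummies are omitted because the simulated series have mean zero.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, beta = 5000, 5, 1.0                    # hypothetical design
c = rng.normal(size=N)
u = rng.normal(size=(N, T))
x = np.empty((N, T))
x[:, 0] = c + rng.normal(size=N)
for t in range(1, T):
    # Sequentially (not strictly) exogenous: x_it reacts to the lagged shock.
    x[:, t] = 0.5 * x[:, t - 1] + 0.5 * c + 0.5 * u[:, t - 1] + rng.normal(size=N)
y = beta * x + c[:, None] + u

dy = np.diff(y, axis=1)                      # column j holds period t = j + 2
dx = np.diff(x, axis=1)

# First stage, separately for each t: regress dx_it on (1, x_i1, ..., x_{i,t-1}).
dx_hat = np.empty_like(dx)
for j in range(T - 1):
    Z = np.column_stack([np.ones(N), x[:, : j + 1]])
    g = np.linalg.lstsq(Z, dx[:, j], rcond=None)[0]
    dx_hat[:, j] = Z @ g

# Pooled IV on the FD equation, using the fitted values as instruments.
beta_iv = float((dx_hat * dy).sum() / (dx_hat * dx).sum())
# FD by OLS is inconsistent here because dx_it is correlated with du_it.
beta_fd_ols = float((dx * dy).sum() / (dx * dx).sum())
print(beta_iv, beta_fd_ols)
```

Note that the fitted values are used as instruments for $\Delta x_{it}$, not as replacement regressors, which matches the procedure described in the text.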
One potential problem with estimating the FD equation by IVs that are simply lags of $x_{it}$ is
that changes in variables over time are often difficult to predict. In other words, $\Delta x_{it}$ might
have little correlation with $x_{i,t-1}^o$, in which case we face a problem of weak instruments. In one
case, we even lose identification: if $x_{it} = \lambda_t + x_{i,t-1} + e_{it}$ where $E(e_{it} \mid x_{i,t-1}, \ldots, x_{i1}) = 0$ (that is,
the elements of $x_{it}$ are random walks with drift), then $E(\Delta x_{it} \mid x_{i,t-1}, \ldots, x_{i1})$ does not depend on
the lagged values (it is zero net of the period effect $\lambda_t$), and the rank
condition for IV estimation fails.
If we impose what is generally a stronger assumption, dynamic completeness in the conditional mean,
$$E(u_{it}|x_{it},y_{i,t-1},x_{i,t-1},\ldots,y_{i1},x_{i1},c_i) = 0, \quad t = 1,\ldots,T, \tag{4.7}$$
then more moment conditions are available. While (4.7) implies that virtually any nonlinear function of the $x_{it}$ can be used as instruments, the focus has been only on zero covariance assumptions (or (4.7) is stated as a linear projection). The key is that (4.7) implies that $\{u_{it}: t = 1,\ldots,T\}$ is a serially uncorrelated sequence and $u_{it}$ is uncorrelated with $c_i$ for all $t$. If we use these facts, we obtain moment conditions first proposed by Ahn and Schmidt (1995) in the context of the AR(1) unobserved effects model; see also Arellano and Honoré (2001). They can be written generally as
$$E[(\Delta y_{i,t-1} - \Delta x_{i,t-1}\beta)(y_{it} - x_{it}\beta)] = 0, \quad t = 3,\ldots,T. \tag{4.8}$$
Why do these hold? Because all $u_{it}$ are uncorrelated with $c_i$, and $u_{i,t-1},\ldots,u_{i1}$ are uncorrelated with $c_i + u_{it}$. So $u_{i,t-1} - u_{i,t-2}$ is uncorrelated with $c_i + u_{it}$, and the resulting moment conditions can be written in terms of the parameters as (4.8). Therefore, under (4.7), we can add the conditions (4.8) to (4.3) to improve efficiency, in some cases quite substantially with persistent data.
Of course, we do not always intend for models to be dynamically complete in the sense of (4.7). Often, we estimate static models or finite distributed lag models (that is, models without lagged dependent variables) that have serially correlated idiosyncratic errors, and the explanatory variables are not strictly exogenous and so GLS procedures are inconsistent. Plus, the conditions in (4.8) are nonlinear in parameters.
Arellano and Bover (1995) suggested instead the restrictions
$$\mathrm{Cov}(\Delta x_{it}', c_i) = 0, \quad t = 2,\ldots,T. \tag{4.9}$$
Interestingly, this is the zero correlation, FD version of the conditions from Section 2 that imply we can ignore heterogeneous coefficients in estimation under strict exogeneity. Under (4.9), we have the moment conditions from the levels equation:
$$E[\Delta x_{it}'(y_{it} - \alpha - x_{it}\beta)] = 0, \quad t = 2,\ldots,T, \tag{4.10}$$
because $y_{it} - x_{it}\beta = c_i + u_{it}$ and $u_{it}$ is uncorrelated with $x_{it}$ and $x_{i,t-1}$. We add an intercept, $\alpha$, explicitly to the equation to allow a nonzero mean for $c_i$. Blundell and Bond (1999) apply these moment conditions, along with the usual conditions in (4.3), to estimate firm-level production functions. Because of persistence in the data, they find the moments in (4.3) are not especially informative for estimating the parameters. Of course, (4.9) is an extra set of assumptions.
The previous discussion can be applied to the AR(1) model, which has received much attention. In its simplest form we have
$$y_{it} = \rho y_{i,t-1} + c_i + u_{it}, \quad t = 1,\ldots,T, \tag{4.11}$$
so that, by convention, our first observation on $y$ is at $t = 0$. Typically the minimal assumptions imposed are
$$E(y_{is}u_{it}) = 0, \quad s = 0,\ldots,t-1,\ t = 1,\ldots,T, \tag{4.12}$$
in which case the available instruments at time $t$ are $w_{it} = (y_{i0},\ldots,y_{i,t-2})$ in the FD equation
$$\Delta y_{it} = \rho\Delta y_{i,t-1} + \Delta u_{it}, \quad t = 2,\ldots,T. \tag{4.13}$$
In other words, we can use
$$E[y_{is}(\Delta y_{it} - \rho\Delta y_{i,t-1})] = 0, \quad s = 0,\ldots,t-2,\ t = 2,\ldots,T. \tag{4.14}$$
Anderson and Hsiao (1982) proposed pooled IV estimation of the FD equation with the single instrument $y_{i,t-2}$ (in which case all $T-1$ periods can be used) or $\Delta y_{i,t-2}$ (in which case only $T-2$ periods can be used). We can use pooled IV where $T-1$ separate reduced forms are estimated for $\Delta y_{i,t-1}$ as a linear function of $(y_{i0},\ldots,y_{i,t-2})$. The fitted values, $\widehat{\Delta y}_{i,t-1}$, can be used as the instruments in (4.13) in a pooled IV estimation. Of course, standard errors and inference should be made robust to the MA(1) serial correlation in $\Delta u_{it}$. Arellano and Bond (1991) suggested full GMM estimation using all of the available instruments $(y_{i0},\ldots,y_{i,t-2})$, and this estimator uses the conditions in (4.12) efficiently.
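As a rough illustration of the Anderson and Hsiao idea, the sketch below simulates an AR(1) panel and estimates $\rho$ from the FD equation (4.13), once by pooled OLS and once by pooled IV with the single instrument $y_{i,t-2}$. The stationary initial condition, the unit error variances, the sample sizes, and the function names are assumptions chosen for the example; they are not taken from the notes.

```python
import numpy as np


def simulate_ar1_panel(n, t_periods, rho, rng):
    """Simulate y_it = rho*y_{i,t-1} + c_i + u_it for t = 1..T, with y_i0 observed."""
    c = rng.normal(size=n)
    y = np.empty((n, t_periods + 1))  # columns are t = 0, 1, ..., T
    # Assumed initial condition: steady state c_i/(1-rho) plus a stationary deviation.
    y[:, 0] = c / (1.0 - rho) + rng.normal(size=n) / np.sqrt(1.0 - rho**2)
    for t in range(1, t_periods + 1):
        y[:, t] = rho * y[:, t - 1] + c + rng.normal(size=n)
    return y


def fd_estimates(y):
    """Return (pooled OLS, Anderson-Hsiao IV) estimates of rho from the FD equation."""
    dy = np.diff(y, axis=1)   # dy[:, j] is Delta y_{i,j+1}
    dep = dy[:, 1:]           # Delta y_it,      t = 2..T
    lag = dy[:, :-1]          # Delta y_{i,t-1}, t = 2..T
    inst = y[:, :-2]          # y_{i,t-2},       t = 2..T
    rho_ols = (lag * dep).sum() / (lag * lag).sum()
    rho_iv = (inst * dep).sum() / (inst * lag).sum()
    return rho_ols, rho_iv


rng = np.random.default_rng(0)
y = simulate_ar1_panel(n=5000, t_periods=6, rho=0.5, rng=rng)
rho_ols, rho_ah = fd_estimates(y)
print(f"FD pooled OLS: {rho_ols:.3f}, Anderson-Hsiao IV: {rho_ah:.3f}")
```

Under this design, pooled OLS on the differenced equation is badly biased toward negative values because $\Delta y_{i,t-1}$ is correlated with $\Delta u_{it}$, while the IV estimate should be near 0.5. The full Arellano and Bond GMM estimator would use all of $(y_{i0},\ldots,y_{i,t-2})$ and an optimal weighting matrix, which this sketch does not attempt.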
Under the dynamic completeness assumption
$$E(u_{it}|y_{i,t-1},y_{i,t-2},\ldots,y_{i0},c_i) = 0, \tag{4.15}$$
the Ahn-Schmidt extra moment conditions in (4.8) become
$$E[(\Delta y_{i,t-1} - \rho\Delta y_{i,t-2})(y_{it} - \rho y_{i,t-1})] = 0, \quad t = 3,\ldots,T. \tag{4.16}$$
Blundell and Bond (1998) noted that if the condition
$$\mathrm{Cov}(\Delta y_{i1}, c_i) = \mathrm{Cov}(y_{i1} - y_{i0}, c_i) = 0 \tag{4.17}$$
is added to (4.15) then the combined set of moment conditions becomes
$$E[\Delta y_{i,t-1}(y_{it} - \alpha - \rho y_{i,t-1})] = 0, \quad t = 2,\ldots,T, \tag{4.18}$$
which can be added to the usual moment conditions (4.14). Therefore, we have two sets of moments linear in the parameters. The first, (4.14), uses the differenced equation while the second, (4.18), uses the levels. Arellano and Bover (1995) analyzed GMM estimators from these equations generally.
As discussed by Blundell and Bond (1998), condition (4.17) can be interpreted as a restriction on the initial condition, $y_{i0}$. To see why, write $\Delta y_{i1} = y_{i1} - y_{i0} = \rho y_{i0} + c_i + u_{i1} - y_{i0} = -(1-\rho)y_{i0} + c_i + u_{i1}$. Because $u_{i1}$ is uncorrelated with $c_i$, (4.17) becomes
$$\mathrm{Cov}(-(1-\rho)y_{i0} + c_i,\ c_i) = 0. \tag{4.19}$$
Write $y_{i0}$ as a deviation from its steady state, $c_i/(1-\rho)$ (obtained for $|\rho| < 1$ by recursive substitution and then taking the limit), as
$$y_{i0} = c_i/(1-\rho) + r_{i0}. \tag{4.20}$$
Then $-(1-\rho)y_{i0} + c_i = -(1-\rho)r_{i0}$, and so (4.17) reduces to
$$\mathrm{Cov}(r_{i0}, c_i) = 0. \tag{4.21}$$
In other words, the deviation of $y_{i0}$ from its steady state is uncorrelated with the steady state. Blundell and Bond (1998) contains discussion of when this condition is reasonable. Of course, it is not for $\rho = 1$, and it may not be for $\rho$ "close" to one. On the other hand, as shown by Blundell and Bond (1998), this restriction, along with the Ahn-Schmidt conditions, is very informative for $\rho$ close to one. Hahn (1999) shows theoretically that such restrictions can greatly increase the information about $\rho$.
The Ahn-Schmidt conditions (4.16) are attractive in that they are implied by the most natural statement of the model, but they are nonlinear in the parameters and therefore more difficult to use. Adding the restriction on the initial condition makes the full set of moment conditions linear. Plus, this approach extends to general models with only sequentially exogenous variables, as in (4.10). Extra moment assumptions based on homoskedasticity assumptions (either conditional or unconditional) have not been used nearly as much, probably because they impose conditions that have little if anything to do with the economic hypotheses being tested.
Other approaches to dynamic models are based on maximum likelihood estimation or generalized least squares estimation of a particular set of conditional means. Approaches that condition on the initial condition $y_{i0}$, suggested by Chamberlain (1980), Blundell and Smith (1991), and Blundell and Bond (1998), seem especially attractive. For example, suppose we assume that
$$D(y_{it}|y_{i,t-1},y_{i,t-2},\ldots,y_{i1},y_{i0},c_i) = \mathrm{Normal}(\rho y_{i,t-1} + c_i,\ \sigma_u^2), \quad t = 1,2,\ldots,T.$$
Then the distribution of $(y_{i1},\ldots,y_{iT})$ given $(y_{i0} = y_0, c_i = c)$ is just the product of the normal distributions:
$$\prod_{t=1}^{T} \sigma_u^{-1}\phi[(y_t - \rho y_{t-1} - c)/\sigma_u].$$
We can obtain a usable density for (conditional) MLE by assuming
$$c_i|y_{i0} \sim \mathrm{Normal}(\psi_0 + \xi_0 y_{i0},\ \sigma_a^2).$$
The log-likelihood function for a random draw $i$ is
$$\log\left(\int_{-\infty}^{\infty}\left[\prod_{t=1}^{T} \sigma_u^{-1}\phi[(y_{it} - \rho y_{i,t-1} - c)/\sigma_u]\right]\cdot \sigma_a^{-1}\phi[(c - \psi_0 - \xi_0 y_{i0})/\sigma_a]\,dc\right).$$
Of course, if the log likelihood represents the correct density of $(y_{i1},\ldots,y_{iT})$ given $y_{i0}$, the MLE is consistent and $\sqrt{N}$-asymptotically normal (and efficient among estimators that condition on $y_{i0}$).
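The integral over $c$ in the log likelihood has no special structure beyond normality, so one common way to evaluate it numerically is Gauss-Hermite quadrature. The sketch below is an illustration under that assumption, not the authors' implementation; the function names, the number of nodes, and the parameter values are hypothetical. Because the model is jointly normal, the same quantity also has a closed form, which is used here only as a check.

```python
import numpy as np


def normal_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)


def loglik_quadrature(y, y0, rho, sigma_u, psi0, xi0, sigma_a, n_nodes=40):
    """Log likelihood for one unit, integrating out c_i by Gauss-Hermite quadrature."""
    y = np.asarray(y, dtype=float)
    y_lag = np.concatenate(([y0], y[:-1]))
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables: c = mean + sqrt(2)*sigma_a*node for the weight exp(-node^2).
    c = psi0 + xi0 * y0 + np.sqrt(2.0) * sigma_a * nodes
    resid = (y[:, None] - rho * y_lag[:, None] - c[None, :]) / sigma_u
    dens_given_c = np.prod(normal_pdf(resid) / sigma_u, axis=0)
    return np.log(np.sum(weights * dens_given_c) / np.sqrt(np.pi))


def loglik_closed_form(y, y0, rho, sigma_u, psi0, xi0, sigma_a):
    """Same log likelihood via the implied multivariate normal (used as a check)."""
    y = np.asarray(y, dtype=float)
    t_periods = y.size
    y_lag = np.concatenate(([y0], y[:-1]))
    v = y - rho * y_lag - (psi0 + xi0 * y0)  # equals a_i + u_it
    cov = sigma_u**2 * np.eye(t_periods) + sigma_a**2 * np.ones((t_periods, t_periods))
    _, logdet = np.linalg.slogdet(cov)
    quad_form = v @ np.linalg.solve(cov, v)
    return -0.5 * (t_periods * np.log(2.0 * np.pi) + logdet + quad_form)


params = dict(rho=0.5, sigma_u=1.0, psi0=0.1, xi0=0.3, sigma_a=0.5)
ll_quad = loglik_quadrature([0.5, 0.1, 0.4], 0.2, **params)
ll_exact = loglik_closed_form([0.5, 0.1, 0.4], 0.2, **params)
print(ll_quad, ll_exact)
```

Summing such terms over $i$ and maximizing over $(\rho, \sigma_u, \psi_0, \xi_0, \sigma_a)$ would give the conditional MLE. Quadrature becomes the practical route once the normality of $u_{it}$ is replaced by a distribution without a closed form.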
A more robust approach is to use a generalized least squares approach, where $E(y_i|y_{i0})$ and $\mathrm{Var}(y_i|y_{i0})$ are obtained, and where the latter could even be misspecified. As with the MLE approach, this results in estimation that is highly nonlinear in the parameters and is used less often than the GMM procedures with linear moment conditions. See Blundell and Bond (1998) for further discussion.
The same kinds of moment conditions can be used in extensions of the AR(1) model, such as
$$y_{it} = \rho y_{i,t-1} + z_{it}\gamma + c_i + u_{it}, \quad t = 1,\ldots,T.$$
If we difference to remove $c_i$, we can then use exogeneity assumptions to choose instruments. The FD equation is
$$\Delta y_{it} = \rho\Delta y_{i,t-1} + \Delta z_{it}\gamma + \Delta u_{it}, \quad t = 2,\ldots,T,$$
and if the $z_{it}$ are strictly exogenous with respect to $(u_{i1},\ldots,u_{iT})$ then the available instruments (in addition to time period dummies) are $(z_i, y_{i,t-2},\ldots,y_{i0})$. We might not want to use all of $z_i$ for every time period. Certainly we would use $\Delta z_{it}$, and perhaps a lag, $\Delta z_{i,t-1}$. If we add sequentially exogenous variables, say $h_{it}$, to this model then $(h_{i,t-1},\ldots,h_{i1})$ would be added to the list of instruments (and $\Delta h_{it}$ would appear in the equation). We might also add the Arellano and Bover conditions (4.10), or at least the Ahn and Schmidt conditions (4.8).
As a simple example of methods for dynamic models, consider a dynamic air fare equation for routes in the United States:
$$lfare_{it} = \theta_t + \rho\, lfare_{i,t-1} + \gamma\, concen_{it} + c_i + u_{it},$$
where we include a full set of year dummies. We assume the concentration ratio, $concen_{it}$, is strictly exogenous and that at most one lag of $lfare$ is needed to capture the dynamics. The data are for 1997 through 2000, so the equation is specified for three years. After differencing, we have only two years of data:
$$\Delta lfare_{it} = \eta_t + \rho\Delta lfare_{i,t-1} + \gamma\Delta concen_{it} + \Delta u_{it}, \quad t = 1999, 2000.$$
If we estimate this equation by pooled OLS, the estimators are inconsistent because $\Delta lfare_{i,t-1}$ is correlated with $\Delta u_{it}$; we include the OLS estimates for comparison. We apply the simple pooled IV procedure, where separate reduced forms are estimated for $\Delta lfare_{i,t-1}$: one for 1999, with $lfare_{i,t-2}$ and $\Delta concen_{it}$ in the reduced form, and one for 2000, with $lfare_{i,t-2}$, $lfare_{i,t-3}$ and $\Delta concen_{it}$ in the reduced form. The fitted values are used in the pooled IV estimation, with robust standard errors. (We only use $\Delta concen_{it}$ in the IV list at time $t$.) Finally, we apply the Arellano and Bond (1991) GMM procedure. The data set can be obtained from the web site for Wooldridge (2002), and is called AIRFARE.RAW.
Dependent Variable: lfare

Explanatory Variable    (1) Pooled OLS    (2) Pooled IV    (3) Arellano-Bond
lfare_{-1}                  -.126             .219              .333
                           (.027)            (.062)            (.055)
concen                       .076             .126              .152
                           (.053)            (.056)            (.040)
N                           1,149            1,149             1,149

(Standard errors in parentheses.)
As is seen from column (1), the pooled OLS estimate of $\rho$ is actually negative and statistically different from zero. By contrast, the two IV methods give positive and statistically significant estimates. The GMM estimate of $\rho$ is larger, and it also has a smaller standard error (as we would hope for GMM).
The previous example has small $T$, but some panel data applications have reasonably large $T$. Alvarez and Arellano (2003) show that the GMM estimator that accounts for the MA(1) serial correlation in the FD errors has desirable properties when $T$ and $N$ are both large, while the pooled IV estimator is actually inconsistent under asymptotics where $T/N \to a > 0$. See Arellano (2003, Chapter 6) for discussion.
5. Estimating Production Functions Using Proxy Variables
We have already covered two common methods for estimating production functions from firm-level panel data: fixed effects and first differencing. Typically, one assumes a Cobb-Douglas production function with additive firm heterogeneity. Unfortunately, the FE and FD estimators assume strict exogeneity of the inputs, conditional on firm heterogeneity; see, for example, Wooldridge (2002). The economic assumption is that inputs cannot be chosen in response to productivity shocks, a severe restriction on firm behavior.
Instrumental variables methods can be used to relax the strict exogeneity assumption. In particular, after differencing or quasi-differencing, lagged inputs can be used as instruments for changes in the inputs. Holtz-Eakin, Newey, and Rosen (1988), Arellano and Bover (1995), and Blundell and Bond (2000) are examples of this approach. Unfortunately, differencing removes much of the variation in the explanatory variables and can exacerbate measurement error in the inputs. Often, the instruments available after differencing are only weakly correlated with the differenced explanatory variables.
Olley and Pakes (1996) (OP for short) suggest a different approach. Rather than allow for time-constant firm heterogeneity, OP show how investment can be used as a proxy variable for unobserved, time-varying productivity. Specifically, productivity can be expressed as an unknown function of capital and investment (when investment is strictly positive). OP present a two-step estimation method where, in the first stage, semiparametric methods are used to estimate the coefficients on the variable inputs. In a second step, the parameters on capital inputs can be identified under assumptions on the dynamics of the productivity process.
Levinsohn and Petrin (2003) (LP for short) propose a modification of the OP approach to address the problem of lumpy investment. LP suggest using intermediate inputs to proxy for unobserved productivity. Their paper contains assumptions under which productivity can be written as a function of capital inputs and intermediate inputs (such as materials and electricity). As with OP, LP propose a two-step estimation method to consistently estimate the coefficients on the variable inputs and the capital inputs.
In implementing the OP or LP approaches, it is convenient to assume that unknown functions are well approximated by low-order polynomials. Petrin, Poi, and Levinsohn (2004) (PPL for short) suggest using third-degree polynomials, and LP note that such a choice leads to estimated parameters that are very similar to locally weighted estimation.
Because of the complicated two-step nature of the LP estimation method, the authors suggest using bootstrapping methods to obtain standard errors and test statistics. Here we show how the general problem can be set up as a two-equation system for panel data with the same dependent variable, but where the set of instruments differs across equations, as in Wooldridge (1996). The treatment here follows Wooldridge (2007).
Write a production function for firm $i$ in time period $t$ as
$$y_{it} = \alpha + w_{it}\beta + x_{it}\gamma + v_{it} + e_{it}, \quad t = 1,\ldots,T, \tag{5.1}$$
where $y_{it}$ is typically the natural logarithm of the firm's output, $w_{it}$ is a $1 \times J$ vector of variable inputs (such as labor) and $x_{it}$ is a $1 \times K$ vector of observed state variables (such as capital), all in logarithmic form. The sequence $\{v_{it}: t = 1,\ldots,T\}$ is unobserved productivity, and $\{e_{it}: t = 1,2,\ldots,T\}$ is a sequence of shocks that, as we will see, are assumed to be conditional-mean independent of current and past inputs.
A key implication of the theory underlying OP and LP is that for some function $g(\cdot,\cdot)$,
$$v_{it} = g(x_{it}, m_{it}), \quad t = 1,\ldots,T, \tag{5.2}$$
where $m_{it}$ is a $1 \times M$ vector of proxy variables. In OP, $m_{it}$ consists of investment and in LP, $m_{it}$ contains intermediate inputs. Initially, we assume that $g(\cdot,\cdot)$ is time invariant.
Under the assumption
$$E(e_{it}|w_{it},x_{it},m_{it}) = 0, \quad t = 1,2,\ldots,T, \tag{5.3}$$
we have the following regression function:
$$E(y_{it}|w_{it},x_{it},m_{it}) = \alpha + w_{it}\beta + x_{it}\gamma + g(x_{it},m_{it}) \equiv w_{it}\beta + h(x_{it},m_{it}), \quad t = 1,\ldots,T, \tag{5.4}$$
where $h(x_{it},m_{it}) \equiv \alpha + x_{it}\gamma + g(x_{it},m_{it})$. Since $g(\cdot,\cdot)$ is allowed to be a general function (in particular, linearity in $x$ is a special case), $\gamma$ (and the intercept, $\alpha$) are clearly not identified from (5.4). Nevertheless, at least at first sight, equation (5.4) appears to identify $\beta$. However, this need not be true, especially if we believe the economics that leads to (5.2). Particularly problematic is the case where $m_{it}$ contains intermediate inputs, as in LP. As shown by Ackerberg, Caves, and Frazer (2006) (ACF for short), if labor inputs are chosen at the same time as intermediate inputs, there is a fundamental identification problem in (5.4): $w_{it}$ is a deterministic function of $(x_{it},m_{it})$, which means $\beta$ is nonparametrically unidentified. To make matters worse, ACF show that $w_{it}$ actually drops out of (5.4) when the production function is Cobb-Douglas.
As in OP and LP, assume that estimation of $\gamma$ is also important. In order to identify $\gamma$ along with $\beta$, follow OP and LP and strengthen (5.3) to
$$E(e_{it}|w_{it},x_{it},m_{it},w_{i,t-1},x_{i,t-1},m_{i,t-1},\ldots,w_{i1},x_{i1},m_{i1}) = 0, \quad t = 1,2,\ldots,T. \tag{5.5}$$
Assumption (5.5) can be weakened somewhat (in particular, identification could hold with just current values and one lag in the conditioning set) but assuming conditional mean independence given outcomes at $t$ and $t-1$, without also assuming (5.5), is ad hoc. Assumption (5.5) does allow for serial dependence in the idiosyncratic shocks $\{e_{it}: t = 1,2,\ldots,T\}$ because neither past values of $y_{it}$ nor $e_{it}$ appear in the conditioning set.
Finally, we use an assumption restricting the dynamics in the productivity process, $\{v_{it}: t = 1,2,\ldots\}$. LP state the assumption as
$$E(v_{it}|v_{i,t-1},\ldots,v_{i1}) = E(v_{it}|v_{i,t-1}), \quad t = 2,3,\ldots,T, \tag{5.6}$$
along with an assumption that $x_{it}$ is uncorrelated with the innovation
$$a_{it} \equiv v_{it} - E(v_{it}|v_{i,t-1}). \tag{5.7}$$
These assumptions are not quite enough. In the second stage of the LP procedure, the conditional expectation used to identify $\gamma$ depends on $(x_{i,t-1},m_{i,t-1})$. Thus, consistency requires that $a_{it}$ is additionally uncorrelated with $(x_{i,t-1},m_{i,t-1})$. A sufficient condition that meshes well with (5.5) is
$$E(v_{it}|x_{it},w_{i,t-1},x_{i,t-1},m_{i,t-1},\ldots,w_{i1},x_{i1},m_{i1}) = E(v_{it}|v_{i,t-1}) \equiv f[g(x_{i,t-1},m_{i,t-1})], \tag{5.8}$$
where the latter equivalence holds for some $f(\cdot)$ because $v_{i,t-1} = g(x_{i,t-1},m_{i,t-1})$. An important point is that the variable inputs in $w_{it}$ are allowed to be correlated with the innovations $a_{it}$, but (5.8) means that $x_{it}$, past outcomes on $(w_{it},x_{it},m_{it})$, and all functions of these are uncorrelated with $a_{it}$.
Plugging $v_{it} = f[g(x_{i,t-1},m_{i,t-1})] + a_{it}$ into (5.1) gives
$$y_{it} = \alpha + w_{it}\beta + x_{it}\gamma + f[g(x_{i,t-1},m_{i,t-1})] + a_{it} + e_{it}. \tag{5.9}$$
Now, we can specify the two equations that identify $(\beta,\gamma)$:
$$y_{it} = \alpha + w_{it}\beta + x_{it}\gamma + g(x_{it},m_{it}) + e_{it}, \quad t = 1,\ldots,T \tag{5.10}$$
and
$$y_{it} = \alpha + w_{it}\beta + x_{it}\gamma + f[g(x_{i,t-1},m_{i,t-1})] + u_{it}, \quad t = 2,\ldots,T, \tag{5.11}$$
where $u_{it} \equiv a_{it} + e_{it}$. Importantly, the available orthogonality conditions differ across these two equations. In (5.10), the orthogonality condition on the error is given by (5.5). The orthogonality conditions for (5.11) are
$$E(u_{it}|x_{it},w_{i,t-1},x_{i,t-1},m_{i,t-1},\ldots,w_{i1},x_{i1},m_{i1}) = 0, \quad t = 2,\ldots,T. \tag{5.12}$$
In other words, in (5.10) and (5.11) we can use the contemporaneous state (capital) variables, $x_{it}$, any lagged inputs, and functions of these, as instrumental variables. In (5.10) we can further add the elements of $m_{it}$ (investment or intermediate inputs).
In the ACF setting, where (5.10) does not identify $\beta$, (5.11) would still generally identify $\beta$ and $\gamma$ provided we have the orthogonality conditions in (5.12). Effectively, $x_{it}$, $x_{i,t-1}$, and $m_{i,t-1}$ act as their own instruments and $w_{i,t-1}$ acts as an instrument for $w_{it}$. Then, (5.11) can be estimated by an instrumental variables version of Robinson's (1988) estimator to allow $f$ and $g$ to be completely unspecified. Semykina (2006) proposed such a method in the context of sample selection corrections in panel data with endogenous explanatory variables.
A simpler approach is to approximate $g(\cdot,\cdot)$ and $f(\cdot)$ in (5.10) and (5.11) by low-order polynomials. In implementing their two-step modification of OP, LP find third-degree polynomials work as well as local smoothing. So, if $x_{it}$ and $m_{it}$ are both scalars, $g(x,m)$ is linear in terms of the form $x^p m^q$, where $p$ and $q$ are nonnegative integers with $p + q \le 3$. More generally, $g(x,m)$ contains all polynomials of order three or less. In any case, assume that we can write
$$g(x_{it},m_{it}) = \lambda_0 + c(x_{it},m_{it})\lambda \tag{5.13}$$
for a $1 \times Q$ vector of functions $c(x_{it},m_{it})$. I assume that $c(x_{it},m_{it})$ contains at least $x_{it}$ and $m_{it}$ separately, since a linear version of $g(x_{it},m_{it})$ should always be an allowed special case. Further, assume that $f(\cdot)$ can be approximated by a polynomial in $v$:
$$f(v) = \rho_0 + \rho_1 v + \ldots + \rho_G v^G. \tag{5.14}$$
When we plug these choices into (5.10) and (5.11), it is evident that neither the original intercept $\alpha$ nor the intercepts $\lambda_0$ and $\rho_0$ are identified.
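For scalar $x_{it}$ and $m_{it}$, the vector $c(x_{it},m_{it})$ in (5.13) can be built mechanically from the terms $x^p m^q$ with $1 \le p + q \le 3$. The helper below is a small sketch of that construction; the function name `poly_terms` and the column ordering are choices made for this example only. The constant term is left out because it is absorbed by $\lambda_0$.

```python
import numpy as np


def poly_terms(x, m, degree=3):
    """Return (C, exponents): columns x**p * m**q for 1 <= p + q <= degree."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    columns, exponents = [], []
    for total in range(1, degree + 1):
        for p in range(total, -1, -1):
            q = total - p
            columns.append(x**p * m**q)
            exponents.append((p, q))
    return np.column_stack(columns), exponents


# Hypothetical values of log capital (x) and log materials (m) for three firm-years.
x_demo = np.array([1.0, 2.0, 3.0])
m_demo = np.array([2.0, 0.5, 1.0])
c_demo, exps_demo = poly_terms(x_demo, m_demo)
print(c_demo.shape)   # Q = 9 columns for a third-degree polynomial
print(exps_demo)
```

The first two columns are $x$ and $m$ themselves, so the linear special case of $g$ is always nested, as the text requires. Dropping the first column gives the vector denoted $c_{it}^o$ when $x_{it}$ enters the equation separately.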
Given the functions in (5.13) and (5.14), we now have
$$y_{it} = \alpha_0 + w_{it}\beta + x_{it}\gamma + c_{it}\lambda + e_{it}, \quad t = 1,\ldots,T \tag{5.15}$$
and
$$y_{it} = \eta_0 + w_{it}\beta + x_{it}\gamma + \rho_1(c_{i,t-1}\lambda) + \ldots + \rho_G(c_{i,t-1}\lambda)^G + u_{it}, \quad t = 2,\ldots,T, \tag{5.16}$$
where $\alpha_0$ and $\eta_0$ are the new intercepts and we use the notation $c_{it} \equiv c(x_{it},m_{it})$. Given (5.5) and (5.12), we can easily specify instrumental variables (IVs) for each of these two equations. The most straightforward choice of IVs for (5.15) is simply
$$z_{it1} \equiv (1, w_{it}, x_{it}, c_{it}^o), \tag{5.17}$$
where $c_{it}^o$ is $c_{it}$ but without $x_{it}$. The choice in (5.17) corresponds to the regression analysis in OP and LP for estimating $\beta$ in a first stage. Of course, under (5.5), any nonlinear function of $(w_{it},x_{it},c_{it}^o)$ is also a valid IV, as are all lags and all functions of these lags. Adding a lag could be useful for generating overidentifying restrictions to test the model assumptions, particularly (5.2).
Instruments for (5.16) would include $(x_{it},w_{i,t-1},c_{i,t-1})$ and, especially if $G > 1$, nonlinear functions of $c_{i,t-1}$ (probably low-order polynomials). Lags more than one period back are valid, too, but adding more lags can be costly in terms of lost initial time periods. So, write
$$z_{it2} \equiv (1, x_{it}, w_{i,t-1}, c_{i,t-1}, q_{i,t-1}), \tag{5.18}$$
where $q_{i,t-1}$ is a set of nonlinear functions of $c_{i,t-1}$, probably consisting of low-order polynomials.
We can easily verify that we have enough moment conditions to identify the $2 + J + K + Q + G$ parameters in (5.15) and (5.16). In fact, we can identify the parameters $\beta$ and $\gamma$ off equation (5.16). As remarked earlier, $(x_{it},w_{i,t-1},c_{i,t-1})$ would act as their own instruments, and then we would include enough nonlinear functions in $q_{i,t-1}$ to identify $\rho_1,\ldots,\rho_G$.
A key difference between (5.17) and (5.18) is that $z_{it2}$ does not contain contemporaneous values of $w_{it}$ and $m_{it}$. One possibility is to choose, for each $i$ and $t$, a matrix of instruments, with two rows, as
$$Z_{it} \equiv \begin{pmatrix} (w_{it}, c_{it}, z_{it2}) & 0 \\ 0 & z_{it2} \end{pmatrix}, \quad t = 2,\ldots,T. \tag{5.19}$$
This choice makes it clear that all instruments available for (5.16) are also valid for (5.15), and we have some additional moment restrictions in (5.15).
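GMM estimation of all parameters in (5.15) and (5.16) is now straightforward. For each $t > 1$, define a $2 \times 1$ residual function as
$$r_{it}(\theta) = \begin{pmatrix} r_{it1}(\theta) \\ r_{it2}(\theta) \end{pmatrix} = \begin{pmatrix} y_{it} - \alpha_0 - w_{it}\beta - x_{it}\gamma - c_{it}\lambda \\ y_{it} - \eta_0 - w_{it}\beta - x_{it}\gamma - \rho_1(c_{i,t-1}\lambda) - \ldots - \rho_G(c_{i,t-1}\lambda)^G \end{pmatrix}, \tag{5.20}$$
so that
$$E[Z_{it}' r_{it}(\theta)] = 0, \quad t = 2,\ldots,T. \tag{5.21}$$
Then, these $T-1$ conditions can be stacked for each $i$, and standard GMM estimation can be used; see, for example, Wooldridge (1996, 2002, Chapter 14).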
Interestingly, in one leading case (namely, that productivity follows a random walk with drift) the moment conditions are linear in the parameters. Using $G = 1$ and $\rho_1 = 1$, the residual functions become $r_{it1}(\theta) = y_{it} - \alpha_0 - w_{it}\beta - x_{it}\gamma - c_{it}\lambda$ and $r_{it2}(\theta) = y_{it} - \eta_0 - w_{it}\beta - x_{it}\gamma - c_{i,t-1}\lambda$, which results in a particularly straightforward GMM estimation problem. In fact, we can write the system as $\mathbf{y}_{it} = X_{it}\theta + r_{it}$, where $\mathbf{y}_{it}$ is the $2 \times 1$ vector with $y_{it}$ in both elements,
$$X_{it} = \begin{pmatrix} 1 & 0 & w_{it} & x_{it} & c_{it} \\ 0 & 1 & w_{it} & x_{it} & c_{i,t-1} \end{pmatrix}, \tag{5.22}$$
and $\theta = (\alpha_0, \eta_0, \beta', \gamma', \lambda')'$. We can choose $Z_{it}$ as in (5.19). Identification does not require including $q_{i,t-1}$ in $z_{it2}$, but we might include $q_{i,t-1}$ among the instruments and test the several overidentifying restrictions.
6. Pseudo Panels from Pooled Cross Sections
In cases where panel data sets are not available, we can still estimate parameters in an underlying panel population model if we can obtain random samples in different periods. Many surveys are done annually by obtaining a different random (or stratified) sample for each year. Deaton (1985) showed how to identify and estimate parameters in panel data models from pooled cross sections. As we will see, however, identification of the parameters can be tenuous.
Deaton (1985) was careful about distinguishing between the population model on the one hand and the sampling scheme on the other. This distinction is critical for understanding the nature of the identification problem, and in deciding the appropriate asymptotic analysis. The recent literature has tended to write "models" at the cohort or group level, which is not in the spirit of Deaton's original work. (Angrist (1991) actually has panel data, but uses averages in each time period to estimate parameters of a labor supply function.)
In what follows, we are interested in estimating the parameters of the population model
$$y_t = \eta_t + x_t\beta + f + u_t, \quad t = 1,\ldots,T, \tag{6.1}$$
which is best viewed as representing a population defined over $T$ time periods. For this setup to make sense, it must be the case that we can think of a stationary population, so that the same units are represented in each time period. Because we allow a full set of period intercepts, $E(f)$ is never separately identified, and so we might as well set it to zero.
The random quantities in (6.1) are the response variable, $y_t$, the covariates, $x_t$ (a $1 \times K$ vector), the unobserved effect, $f$, and the unobserved idiosyncratic errors, $\{u_t: t = 1,\ldots,T\}$. Like our previous analysis, we are thinking of applications with a small number of time periods, and so we view the intercepts, $\eta_t$, as parameters to estimate, along with the $K \times 1$ parameter vector $\beta$, which is ultimately of interest. We consider the case where all elements of $x_t$ have some time variation.
As it turns out, to use the standard analysis, we do not even have to assume contemporaneous exogeneity conditional on $f$, that is,
$$E(u_t|x_t,f) = 0, \quad t = 1,\ldots,T, \tag{6.2}$$
although this is a good starting point to determine reasonable population assumptions. Naturally, iterated expectations implies
$$E(u_t|f) = 0, \quad t = 1,\ldots,T, \tag{6.3}$$
and (6.3) is sensible in the context of (6.1). From here on, we take it to be true. Because $f$ aggregates all time-constant unobservables, we should think of (6.3) as implying that $E(u_t|g) = 0$ for any time-constant variable $g$, whether unobserved or observed. In other words, in the leading case we should think of (6.1) as representing $E(y_t|x_t,f)$ where any time-constant factors are lumped into $f$.
With a (balanced) panel data set, we would have a random sample in the cross section. Therefore, for a random draw $i$, $\{(x_{it},y_{it}): t = 1,\ldots,T\}$, we would then write the model as
$$y_{it} = \eta_t + x_{it}\beta + f_i + u_{it}, \quad t = 1,\ldots,T. \tag{6.4}$$
While this notation can cause confusion later when we sample from each cross section, it has the benefit of explicitly labelling quantities as changing only across $t$, changing only across $i$, or changing across both.
The idea of using independent cross sections to estimate parameters from panel data models is based on a simple insight of Deaton's. Assume that the population for which (6.1) holds is divided into $G$ groups (or cohorts). This designation cannot depend on time. For example, it is common to use birth year to define the groups, or even ranges of birth year. For a random draw $i$ satisfying (6.4), let $g_i$ be the group indicator, taking on a value in $\{1,2,\ldots,G\}$. Then, by our earlier discussion,
$$E(u_{it}|g_i) = 0, \quad t = 1,\ldots,T, \tag{6.5}$$
essentially by definition. In other words, the $\eta_t$ account for any change in the average unobservables over time and $f_i$ accounts for any time-constant factors.
Taking the expected value of (6.4) conditional on group membership and using only (6.5), we have
$$E(y_{it}|g_i = g) = \eta_t + E(x_{it}|g_i = g)\beta + E(f_i|g_i = g), \quad t = 1,\ldots,T. \tag{6.6}$$
Again, this expression represents an underlying population, but where we have partitioned the population into $G$ groups.
Several authors after Deaton, including Collado (1998) and Verbeek and Vella (2005), have left $E(u_{it}|g_i = g)$ as part of the error term, with the notation $u_{gt}^* = E(u_{it}|g_i = g)$. In fact, these authors have criticized previous work by Moffitt (1993) for making the assumption that $u_{gt}^* = 0$. But, as Deaton showed, if we start with the underlying population model (6.1), then $E(u_{it}|g_i = g) = 0$ for all $g$ follows directly. Nevertheless, as we will discuss later, the key assumption is that the structural model (6.1) does not require a full set of group/time effects. If such effects are required, then one way to think about the resulting misspecification is that $E(u_{it}|g_i = g)$ is not zero.
If we define the population means
$$\alpha_g = E(f_i|g_i = g), \qquad \mu_{gt}^y = E(y_{it}|g_i = g), \qquad \mu_{gt}^x = E(x_{it}|g_i = g) \tag{6.7}$$
for $g = 1,\ldots,G$ and $t = 1,\ldots,T$ we have
$$\mu_{gt}^y = \eta_t + \mu_{gt}^x\beta + \alpha_g, \quad g = 1,\ldots,G,\ t = 1,\ldots,T. \tag{6.8}$$
(Many authors use the notation $y_{gt}^*$ in place of $\mu_{gt}^y$, and similarly for $\mu_{gt}^x$, but, at this point, such a notation gives the wrong impression that the means defined in (6.7) are random variables. They are not. They are group/time means defined on the underlying population.)
Equation (6.8) is remarkable in that it holds without any assumptions restricting the dependence between $x_{it}$ and $u_{ir}$ across $t$ and $r$. In fact, $x_{it}$ can contain lagged dependent variables, most commonly $y_{i,t-1}$, or explanatory variables that are contemporaneously endogenous (as occurs under measurement error in the original population model, an issue that was important to Angrist (1991)). This probably should make us a little suspicious, as the problems of lagged dependent variables, measurement error, and other violations of strict exogeneity are tricky to handle with true panel data.
(In estimation, we will deal with the fact that there are not really $T + G$ parameters in $\eta_t$ and $\alpha_g$ to estimate; there are only $T + G - 1$. The lost degree of freedom comes from $E(f) = 0$, which puts a restriction on the $\alpha_g$. With the groups of the same size in the population, the restriction is that the $\alpha_g$ sum to zero.)
If we take (6.8) as the starting point for estimating $\beta$ (along with $\eta_t$ and $\alpha_g$), then the issues become fairly clear. If we have sufficient observations in the group/time cells, then the means $\mu_{gt}^y$ and $\mu_{gt}^x$ can be estimated fairly precisely, and these can be used in a minimum distance estimation framework to estimate $\theta$, where $\theta$ consists of $\beta$, $\eta$, and $\alpha$ (where, say, we set $\eta_1 = 0$ as the normalization).
Before discussing estimation details, it is useful to study (6.8) in more detail to determine some simple, and common, strategies. Because (6.8) looks itself like a panel data regression equation, methods such as "OLS," "fixed effects," and "first differencing" have been applied to sample averages. It is informative to apply these to the population. First suppose that we set each $\alpha_g$ to zero and set all of the time intercepts, $\eta_t$, to zero. For notational simplicity, we also drop an overall intercept, but that would be included at a minimum. Then $\mu_{gt}^y = \mu_{gt}^x\beta$ and if we premultiply by $\mu_{gt}^{x\prime}$, average across $g$ and $t$, and then assume we can invert $\sum_{g=1}^{G}\sum_{t=1}^{T}\mu_{gt}^{x\prime}\mu_{gt}^x$, we have
$$\beta = \left(\sum_{g=1}^{G}\sum_{t=1}^{T}\mu_{gt}^{x\prime}\mu_{gt}^x\right)^{-1}\left(\sum_{g=1}^{G}\sum_{t=1}^{T}\mu_{gt}^{x\prime}\mu_{gt}^y\right). \tag{6.9}$$
This means that the population parameter, $\beta$, can be written as a pooled OLS regression of the population group/time means $\mu_{gt}^y$ on the group/time means $\mu_{gt}^x$. Naturally, if we have good estimates of these means, then it will make sense to estimate $\beta$ by using the same regression on the sample means. But, so far, this is all in the population. We can think of (6.9) as the basis for a method of moments procedure. It is important that we treat $\mu_{gt}^x$ and $\mu_{gt}^y$ symmetrically, that is, as population means to be estimated, whether the $x_{it}$ are strictly, sequentially, or contemporaneously exogenous (or none of these) in the original model.
When we allow different group means for $f_i$, as seems critical, and different time period intercepts, which also is necessary for a convincing analysis, we can easily write $\beta$ as an "OLS" estimator by subtracting off time and group averages. While we cannot claim that these expressions will result in efficient estimators, they can shed light on whether we can expect (6.8) to lead to precise estimation of $\beta$. First, without separate time intercepts we have
$$\mu_{gt}^y - \bar{\mu}_g^y = (\mu_{gt}^x - \bar{\mu}_g^x)\beta, \quad g = 1,\ldots,G;\ t = 1,\ldots,T, \tag{6.10}$$
where the notation should be clear, and then one expression for $\beta$ is (6.9) but with $\mu_{gt}^x - \bar{\mu}_g^x$ in place of $\mu_{gt}^x$. Of course, this makes it clear that identification of $\beta$ is more difficult when the $\alpha_g$ are allowed to differ. Further, if we add in the year intercepts, we have
$$\beta = \left(\sum_{g=1}^{G}\sum_{t=1}^{T}\ddot{\mu}_{gt}^{x\prime}\ddot{\mu}_{gt}^x\right)^{-1}\left(\sum_{g=1}^{G}\sum_{t=1}^{T}\ddot{\mu}_{gt}^{x\prime}\mu_{gt}^y\right) \tag{6.11}$$
where $\ddot{\mu}_{gt}^x$ is the vector of residuals from the pooled regression
$$\mu_{gt}^x \text{ on } 1, d2, \ldots, dT, c2, \ldots, cG, \tag{6.12}$$
where $dt$ denotes a dummy for period $t$ and $cg$ is a dummy variable for group $g$.
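Expressions (6.11) and (6.12) can be checked numerically on made-up population cell means. In the sketch below, the values of $\mu_{gt}^x$, $\eta_t$, $\alpha_g$, and $\beta$ are arbitrary assumptions; $\mu_{gt}^y$ is built from (6.8), and $\beta$ is then recovered by residualizing $\mu_{gt}^x$ on the time and group dummies. The function names are hypothetical.

```python
import numpy as np


def two_way_residuals(mu):
    """Residuals from regressing each column of mu (shape G x T x K) on
    an intercept, period dummies d2..dT, and group dummies c2..cG."""
    n_groups, n_periods, k = mu.shape
    g_idx, t_idx = np.meshgrid(np.arange(n_groups), np.arange(n_periods), indexing="ij")
    g_flat, t_flat = g_idx.ravel(), t_idx.ravel()
    dummies = [np.ones(n_groups * n_periods)]
    dummies += [(t_flat == t).astype(float) for t in range(1, n_periods)]
    dummies += [(g_flat == g).astype(float) for g in range(1, n_groups)]
    d = np.column_stack(dummies)
    m = mu.reshape(n_groups * n_periods, k)   # row index is g*T + t, matching g_flat/t_flat
    coef, *_ = np.linalg.lstsq(d, m, rcond=None)
    return m - d @ coef


def beta_from_cell_means(mu_x, mu_y):
    """Equation (6.11): OLS of mu_y on the two-way residualized mu_x."""
    r = two_way_residuals(mu_x)
    return np.linalg.solve(r.T @ r, r.T @ mu_y.reshape(-1))


rng = np.random.default_rng(0)
n_groups, n_periods = 4, 3
beta_true = np.array([1.0, -0.5])
eta = np.array([0.0, 0.3, -0.2])              # period intercepts, eta_1 = 0
alpha = rng.normal(size=n_groups)             # group means of f_i
mu_x = rng.normal(size=(n_groups, n_periods, 2))
mu_y = eta[None, :] + mu_x @ beta_true + alpha[:, None]   # equation (6.8)
beta_hat = beta_from_cell_means(mu_x, mu_y)
print(beta_hat)
```

Because these are exact population means, the recovery is exact up to rounding. With sample means in place of population means, the same calculation gives a simple (generally inefficient) estimator.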
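There are other expressions for $\beta$, too. (Because $\beta$ is generally overidentified, there are many ways to write it in terms of the population moments.) For example, if we difference and then take away group averages, we have
$$\beta = \left(\sum_{g=1}^{G}\sum_{t=2}^{T}\widetilde{\Delta\mu}_{gt}^{x\prime}\,\widetilde{\Delta\mu}_{gt}^x\right)^{-1}\left(\sum_{g=1}^{G}\sum_{t=2}^{T}\widetilde{\Delta\mu}_{gt}^{x\prime}\,\Delta\mu_{gt}^y\right) \tag{6.13}$$
where $\Delta\mu_{gt}^x = \mu_{gt}^x - \mu_{g,t-1}^x$ and $\widetilde{\Delta\mu}_{gt}^x = \Delta\mu_{gt}^x - G^{-1}\sum_{h=1}^{G}\Delta\mu_{ht}^x$.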
Equations (6.11) and (6.13) make it clear that the underlying model in the population cannot contain a full set of group/time interactions. So, for example, if the groups (cohorts) are defined by birth year, there cannot be a full set of birth year/time period interactions. We could allow this feature with individual-level data because we would typically have variation in the covariates within each group/period cell. Thus, the absence of full cohort/time effects in the population model is the key identifying restriction.
Even if we exclude full group/time effects, $\beta$ may not be precisely estimable. Clearly $\beta$ is not identified if we can write $\mu_{gt}^x = \lambda_t + \omega_g$ for vectors $\lambda_t$ and $\omega_g$, $t = 1,\ldots,T$, $g = 1,\ldots,G$. In other words, while we must exclude a full set of group/time effects in the structural model, we need some interaction between them in the distribution of the covariates. One might be worried about this way of identifying $\beta$. But even if we accept this identification strategy, the variation in $\{\ddot{\mu}_{gt}^x: t = 1,\ldots,T,\ g = 1,\ldots,G\}$ or $\{\widetilde{\Delta\mu}_{gt}^x: t = 2,\ldots,T,\ g = 1,\ldots,G\}$ might not be sufficient to learn much about $\beta$ even if we have pretty good estimates of the population means.
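The failure of identification under additive cell means is easy to see numerically with the transformation used in (6.13): difference over time, then subtract the cross-group average in each period. The sketch below uses made-up values for the period and group components (labelled `lam` and `omega` here, names chosen for this example) and shows that the transformed means are identically zero in the additive case but not once a group/time interaction is present.

```python
import numpy as np


def differenced_demeaned(mu):
    """Difference cell means (shape G x T x K) over t, then remove the
    average across groups within each period, as in (6.13)."""
    d = np.diff(mu, axis=1)                      # shape G x (T-1) x K
    return d - d.mean(axis=0, keepdims=True)


n_groups, n_periods = 5, 4
lam = np.array([0.0, 0.4, 0.9, 1.5])             # hypothetical period components
omega = np.array([-1.0, -0.3, 0.2, 0.8, 1.1])    # hypothetical group components

# Additive case: mu_gt^x = lam_t + omega_g (K = 1).
mu_additive = (omega[:, None] + lam[None, :])[:, :, None]
resid_additive = differenced_demeaned(mu_additive)

# Add a group/time interaction in the covariate means.
rng = np.random.default_rng(0)
mu_interact = mu_additive + 0.3 * rng.normal(size=(n_groups, n_periods, 1))
resid_interact = differenced_demeaned(mu_interact)

print(np.abs(resid_additive).max(), np.abs(resid_interact).max())
```

In the additive case the matrix to be inverted in (6.13) is zero, so $\beta$ cannot be recovered. When the interaction component is small relative to the sampling error in the estimated means, the same matrix is nearly singular, which is the practical concern raised in the text.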
We are now ready to formally discuss estimation of $\beta$. We have two formulas (and there are many more) that can be used directly, once we estimate the group/time means for $y_t$ and $x_t$. We can use either true panel data or repeated cross sections. Angrist (1991) used panel data and grouped the data by time period (after differencing). Our focus here is on the case where we do not have panel data, but the general discussion applies to either case. One difference is that, with independent cross sections, we need not account for dependence in the sample averages across $g$ and $t$ (except in the case of dynamic models; more later).
Assume we have a random sample on $(x_t,y_t)$ of size $N_t$, and we have specified the $G$ groups or cohorts. Write $\{(x_{it},y_{it}): i = 1,\ldots,N_t\}$. Some authors, wanting to avoid confusion with a true panel data set, prefer to replace $i$ with $i(t)$ to emphasize that the cross section units are different in each time period. (Plus, several authors actually write the underlying model in terms of the pooled cross sections rather than using the underlying population model, which is a mistake, in my view.) As long as we understand that we have a random sample in each time period, and that random sample is used to estimate the group/time means, there should be no confusion.
For each random draw $i$, it is useful to let $r_i = (r_{it1}, r_{it2}, \dots, r_{itG})$ be a vector of group
indicators, so $r_{itg} = 1$ if observation $i$ is in group $g$. Then the sample average on the response
variable in group/time cell $(g,t)$ can be written as
$$\hat{\mu}_{gt}^{y} = N_{gt}^{-1}\sum_{i=1}^{N_t} r_{itg}\,y_{it} = (N_{gt}/N_t)^{-1}\,N_t^{-1}\sum_{i=1}^{N_t} r_{itg}\,y_{it}, \tag{6.14}$$
where $N_{gt} = \sum_{i=1}^{N_t} r_{itg}$ is properly treated as a random outcome. (This differs from standard
stratified sampling, where the groups are first chosen and then random samples are obtained
within each group (stratum). Here, we fix the groups and then randomly sample from the
population, keeping track of the group for each draw.) Of course, $\hat{\mu}_{gt}^{y}$ is generally consistent for
$\mu_{gt}^{y}$. First, $\hat{\rho}_{gt} \equiv N_{gt}/N_t$ converges in probability to $\rho_g \equiv P(r_{itg} = 1)$, the fraction of the
population in group or cohort $g$ (which is supposed to be constant across $t$). So
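$$\begin{aligned}
\hat{\rho}_{gt}^{-1}\,N_t^{-1}\sum_{i=1}^{N_t} r_{itg}\,y_{it} \overset{p}{\to} \rho_g^{-1}E(r_{itg}y_{it})
&= \rho_g^{-1}\left[P(r_{itg} = 0)\cdot 0 + P(r_{itg} = 1)E(y_{it}|r_{itg} = 1)\right] \\
&= E(y_{it}|r_{itg} = 1) = \mu_{gt}^{y}.
\end{aligned}$$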
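The cell averages in (6.14) are simple to compute from one cross section. The sketch below is illustrative only: the function name `cell_means` and the integer group coding 0, ..., G-1 are assumptions, and every group is assumed to be nonempty in the sample.

```python
import numpy as np

def cell_means(y, group, G):
    """Cell counts, cell fractions, and cell means of y for one cross section.

    y     : length-N_t array of responses
    group : length-N_t array of group labels coded 0, ..., G-1 (assumed coding)
    """
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    N_t = len(y)
    # N_t x G matrix of group indicators r_itg
    R = (group[:, None] == np.arange(G)[None, :]).astype(float)
    N_g = R.sum(axis=0)                 # random cell counts N_gt
    rho_hat = N_g / N_t                 # estimated cell fractions N_gt / N_t
    mu_hat = (R.T @ y / N_t) / rho_hat  # second expression in (6.14)
    return N_g, rho_hat, mu_hat

N_g, rho_hat, mu_hat = cell_means([1, 2, 3, 4, 5, 6], [0, 0, 1, 1, 1, 2], G=3)
print(N_g, rho_hat, mu_hat)
```

Writing the mean as a ratio of two sample averages mirrors the consistency argument: the numerator estimates $E(r_{itg}y_{it})$ and the denominator estimates $\rho_g$.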
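Naturally, the argument for other means is the same. Let $w_{it}$ denote the $(K+1) \times 1$ vector
$(y_{it}, x_{it})'$. Then the asymptotic distribution of the full set of means is easy to obtain:
$$\sqrt{N_t}\,(\hat{\mu}_{gt}^{w} - \mu_{gt}^{w}) \overset{d}{\to} \mathrm{Normal}(0, \rho_g^{-1}\Omega_{gt}^{w}),$$
where $\hat{\mu}_{gt}^{w}$ is the sample average for group/time cell $(g,t)$ and
$$\Omega_{gt}^{w} = \mathrm{Var}(w_t|g)$$
is the $(K+1) \times (K+1)$ variance matrix for group/time cell $(g,t)$. When we stack the means
across groups and time periods, it is helpful to have the result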
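$$\sqrt{N}\,(\hat{\mu}_{gt}^{w} - \mu_{gt}^{w}) \overset{d}{\to} \mathrm{Normal}(0, (\rho_g\kappa_t)^{-1}\Omega_{gt}^{w}), \tag{6.15}$$
where $N = \sum_{t=1}^{T} N_t$ and $\kappa_t = \lim_{N\to\infty}(N_t/N)$ is, essentially, the fraction of all observations
accounted for by cross section $t$. Of course, $\rho_g\kappa_t$ is consistently estimated by $N_{gt}/N$, and so the
implication of (6.15) is that the sample average for cell $(g,t)$ gets weighted by $N_{gt}/N$, the
fraction of all observations accounted for by cell $(g,t)$.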
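In implementing minimum distance estimation, we need a consistent estimator of $\Omega_{gt}^{w}$, and
the group/time sample variance serves that purpose:
$$\hat{\Omega}_{gt}^{w} = N_{gt}^{-1}\sum_{i=1}^{N_t} r_{itg}\,(w_{it} - \hat{\mu}_{gt}^{w})(w_{it} - \hat{\mu}_{gt}^{w})' \overset{p}{\to} \Omega_{gt}^{w}. \tag{6.16}$$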
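As a sketch of (6.16), the cell variance matrices can be computed by looping over groups within a cross section. The function name `cell_variances` and the 0, ..., G-1 group coding are assumptions for illustration; the divisor is the cell count $N_{gt}$, as in (6.16), not $N_{gt} - 1$.

```python
import numpy as np

def cell_variances(w, group, G):
    """Cell means and cell variance matrices for one cross section.

    w     : N_t x (K+1) array whose rows are w_it = (y_it, x_it)
    group : length-N_t array of group labels coded 0, ..., G-1 (assumed coding)
    """
    w = np.asarray(w, dtype=float)
    group = np.asarray(group)
    means, variances = [], []
    for g in range(G):
        w_g = w[group == g]              # observations with r_itg = 1
        mu_g = w_g.mean(axis=0)          # cell mean of w
        dev = w_g - mu_g
        means.append(mu_g)
        variances.append(dev.T @ dev / len(w_g))  # divide by N_gt
    return np.array(means), np.array(variances)

w = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0], [5.0, 7.0]]
means, variances = cell_variances(w, [0, 0, 1, 1], G=2)
print(means)
print(variances)
```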
Now let $\pi$ be the vector of all cell means. For each $(g,t)$, there are $K+1$ means, and so $\pi$ is
a $GT(K+1) \times 1$ vector. It makes sense to stack $\pi$ starting with the $K+1$ means for $g = 1$,
$t = 1$, $g = 1$, $t = 2$, ..., $g = 1$, $t = T$, ..., $g = G$, $t = 1$, ..., $g = G$, $t = T$. Now, the $\hat{\mu}_{gt}^{w}$ are always
independent across $g$ because we assume random sampling for each $t$. When $x_t$ does not
contain lags or leads, the $\hat{\mu}_{gt}^{w}$ are independent across $t$, too. (When we allow for lags of the
response variable or explanatory variables, we will adjust the definition of $\pi$ and the moment
conditions. Thus, we will always assume that the $\hat{\mu}_{gt}^{w}$ are independent across $g$ and $t$.) Then,
$$\sqrt{N}\,(\hat{\pi} - \pi) \overset{d}{\to} \mathrm{Normal}(0, \Omega), \tag{6.17}$$
where $\Omega$ is the $GT(K+1) \times GT(K+1)$ block diagonal matrix with $(g,t)$ block $\Omega_{gt}^{w}/(\rho_g\kappa_t)$.
Note that $\Omega$ incorporates both the different cell variance matrices and the different
frequencies of observations.
The set of equations in (6.8) constitutes the restrictions on $\beta$, $\eta$, and $\alpha$. Let $\theta$ be the
$(K+T+G-1) \times 1$ vector of these parameters, written as
$$\theta = (\beta', \eta', \alpha')'.$$
There are $GT$ restrictions in equations (6.8), one for each group/time cell, so, in general, there
are many overidentifying restrictions. We can write the set of equations in (6.8) as
$$h(\pi, \theta) = 0, \tag{6.18}$$
where $h(\cdot,\cdot)$ is a $GT \times 1$ vector. Because we have the $\sqrt{N}$-asymptotically normal estimator
$\hat{\pi}$, a minimum distance approach suggests itself. It is different from the usual MD problem
because the parameters do not appear in a separable way, but MD estimation is still possible.
In fact, for the current application, $h(\pi,\theta)$ is linear in each argument, which means MD
estimators of $\theta$ are in closed form.
Before obtaining the efficient MD estimator, we need, because of the nonseparability, an
initial consistent estimator of $\theta$. Probably the most straightforward is the "fixed effects"
estimator described above, but where we estimate all components of $\theta$. The estimator uses the
just identified set of equations.
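For notational simplicity, let $\mu_{gt}$ denote the $(K+1) \times 1$ vector of group/time means for
each $(g,t)$ cell. Then let $s_{gt}$ be the $(K+T+G-1) \times 1$ vector $(\mu_{gt}^{x}, d_t, c_g)'$, where $d_t$ is a
$1 \times (T-1)$ vector of time dummies and $c_g$ is a $1 \times G$ vector of group dummies. Then the
moment conditions are
$$\left(\sum_{g=1}^{G}\sum_{t=1}^{T} s_{gt}s_{gt}'\right)\theta - \left(\sum_{g=1}^{G}\sum_{t=1}^{T} s_{gt}\,\mu_{gt}^{y}\right) = 0. \tag{6.19}$$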
When we plug in $\hat{\pi}$, that is, the sample averages for all $(g,t)$, then $\check{\theta}$ is obtained as the
so-called "fixed effects" estimator with time and group effects. The equations can be written as
$$q(\hat{\pi}, \check{\theta}) = 0, \tag{6.20}$$
and this representation can be used to find the asymptotic variance of $\sqrt{N}(\check{\theta} - \theta)$; naturally, it
depends on $\Omega$ and is straightforward to estimate.
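The "fixed effects" estimator from (6.19) is just least squares applied to the stacked cell means. The sketch below assumes the covariate cell means are held in a G x T x K array and the response cell means in a G x T array; the function names and array layout are illustrative choices, not part of the notes. The first time period is the omitted time dummy, so the parameter vector has K + T + G - 1 elements.

```python
import numpy as np

def design_matrix(mu_x):
    """Stack rows (mu_x[g, t], time dummies for t >= 2, group dummies), g-major with t fastest."""
    G, T, K = mu_x.shape
    rows = []
    for g in range(G):
        for t in range(T):
            d = np.zeros(T - 1)
            if t > 0:
                d[t - 1] = 1.0          # dummy for periods 2, ..., T
            c = np.zeros(G)
            c[g] = 1.0                  # full set of group dummies
            rows.append(np.concatenate([mu_x[g, t], d, c]))
    return np.array(rows)

def fe_from_cell_means(mu_y, mu_x):
    """Least squares of the GT cell means of y on the rows built by design_matrix."""
    W = design_matrix(mu_x)
    theta, *_ = np.linalg.lstsq(W, mu_y.reshape(-1), rcond=None)
    return theta

# Check on simulated population cell means that satisfy the restrictions exactly.
rng = np.random.default_rng(0)
G, T, K = 5, 4, 2
mu_x = rng.normal(size=(G, T, K))
beta = np.array([1.0, -2.0])
eta = np.array([0.0, 0.5, 1.0, -1.0])   # first period normalized to zero
alpha = rng.normal(size=G)
mu_y = mu_x @ beta + eta[None, :] + alpha[:, None]
theta = fe_from_cell_means(mu_y, mu_x)
print(theta[:K])
```

With sample cell means in place of population means, the same call gives the initial estimator used to start the efficient procedure.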
But there is a practically important point: there is nothing nonstandard about the MD
problem, and bootstrapping is justified for obtaining asymptotic standard errors and test
statistics. (Inoue (2008) asserts that the "unconditional" limiting distribution of $\sqrt{N}(\check{\theta} - \theta)$ is
not standard, but that is because he treats the sample means of the covariates and of the
response variable differently; in effect, he conditions on the former.) The bootstrapping is
simple: resample each cross section separately, find the new groups for the bootstrap sample,
and obtain the "fixed effects" estimates. It makes no sense here to resample the groups.
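The resampling scheme can be sketched generically. In the code below, `bootstrap_se` and its arguments are hypothetical names: each cross section is an array whose rows are individual draws (including a group label column, so that group membership is recomputed from each bootstrap sample), and `estimator` is any user-supplied function that maps a list of resampled cross sections to a parameter estimate.

```python
import numpy as np

def bootstrap_se(cross_sections, estimator, n_boot=200, seed=0):
    """Bootstrap standard errors, resampling rows within each cross section separately."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        resampled = []
        for data in cross_sections:
            # Draw N_t rows with replacement from cross section t only.
            idx = rng.integers(0, len(data), size=len(data))
            resampled.append(data[idx])
        draws.append(np.atleast_1d(estimator(resampled)))
    return np.std(np.array(draws), axis=0, ddof=1)

# Toy usage with a placeholder estimator (the pooled mean of the first column).
rng = np.random.default_rng(123)
cross_sections = [rng.normal(size=(50, 3)) for _ in range(4)]
pooled_mean = lambda cs: np.mean(np.concatenate([c[:, 0] for c in cs]))
print(bootstrap_se(cross_sections, pooled_mean))
```

The groups themselves are never resampled; only individuals within each time period are.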
Because of the nonlinear way that the covariate means appear in the estimation, the
bootstrap may be preferred. The usual asymptotic normal approximation obtained from
first-order asymptotics may not be especially good in this case, especially if
$\sum_{g=1}^{G}\sum_{t=1}^{T} \ddot{\mu}_{gt}^{x\prime}\ddot{\mu}_{gt}^{x}$
is close to being singular, in which case $\beta$ is poorly identified. (Inoue (2008) provides evidence
that the distribution of the "FE" estimator, and what he calls a GMM estimator that accounts
for different cell sample sizes, do not appear to be normal even with fairly large cell sizes. But
his setup for generating the data is different; in particular, he specifies equations directly for
the repeated cross sections, and that is how he generates data. As mentioned above, his
asymptotic analysis differs from the MD framework, and implies nonnormal limiting
distributions. If random samples are drawn from each population, the cell sizes are reasonably
large, and there is sufficient variation in $\ddot{\mu}_{gt}^{x}$, the minimum distance estimators should have
reasonable finite-sample properties. But because the limiting distribution depends on the
$\sqrt{N_t}\,(\hat{\mu}_{gt}^{x} - \mu_{gt}^{x})$, which appear in a highly nonlinear way, the asymptotic normal approximation
might still be poor.)
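With the restrictions written as in (6.18), Chamberlain (lecture notes) shows that the
optimal weighting matrix is the inverse of
$$\nabla_{\pi}h(\pi,\theta)\,\Omega\,\nabla_{\pi}h(\pi,\theta)', \tag{6.21}$$
where $\nabla_{\pi}h(\pi,\theta)$ is the $GT \times GT(K+1)$ Jacobian of $h(\pi,\theta)$ with respect to $\pi$. (In the standard
case, $\nabla_{\pi}h(\pi,\theta)$ is the identity matrix.) We already have the consistent estimator of $\pi$ (the cell
averages), we showed how to consistently estimate $\Omega$ in equation (6.16), and we can use $\check{\theta}$ as
the initial consistent estimator of $\theta$.
$\nabla_{\pi}h(\pi,\theta) = \nabla_{\pi}h(\theta) = I_{GT} \otimes (-1, \beta')$. Therefore, $\nabla_{\pi}h(\pi,\theta)\,\Omega\,\nabla_{\pi}h(\pi,\theta)'$ is a block diagonal
matrix with blocks
$$(-1, \beta')\,(\rho_g\kappa_t)^{-1}\Omega_{gt}^{w}\,(-1, \beta')'. \tag{6.22}$$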
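But
$$\tau_{gt}^{2} \equiv (-1, \beta')\,\Omega_{gt}^{w}\,(-1, \beta')' = \mathrm{Var}(y_t - x_t\beta|g), \tag{6.23}$$
and a consistent estimator is simply the residual variance estimated within cell $(g,t)$:
$$\hat{\tau}_{gt}^{2} = N_{gt}^{-1}\sum_{i=1}^{N_t} r_{itg}\,(y_{it} - x_{it}\check{\beta} - \check{\eta}_t - \check{\alpha}_g)^{2}.$$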
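Now, $\nabla_{\theta}h(\pi,\theta) = W(\pi)$, the $GT \times (K+T+G-1)$ matrix of "regressors" in the FE
estimation, that is, the rows of $W(\pi)$ are $s_{gt}' = (\mu_{gt}^{x}, d_t, c_g)$. The FOC for the optimal MD
estimator is
$$\hat{W}'\hat{C}^{-1}(\hat{W}\hat{\theta} - \hat{\mu}^{y}) = 0,$$
where $\hat{W} = W(\hat{\pi})$, $\hat{\mu}^{y}$ stacks the $\hat{\mu}_{gt}^{y}$, and $\hat{C}$ is the estimate of the matrix in (6.21), and so
$$\hat{\theta} = (\hat{W}'\hat{C}^{-1}\hat{W})^{-1}\hat{W}'\hat{C}^{-1}\hat{\mu}^{y}. \tag{6.24}$$
So, as in the standard cases, the efficient MD estimator looks like a "weighted least squares"
estimator. The estimated asymptotic variance of $\hat{\theta}$, following Chamberlain, is just
$(\hat{W}'\hat{C}^{-1}\hat{W})^{-1}/N$. Because $\hat{C}^{-1}$ is the diagonal matrix with entries $(N_{gt}/N)/\hat{\tau}_{gt}^{2}$, it is easy to
weight each cell $(g,t)$ and then compute both $\hat{\theta}$ and its asymptotic standard errors via a
weighted regression; fully efficient inference is straightforward. But one must compute the $\hat{\tau}_{gt}^{2}$
using the individual-level data in each group/time cell.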
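The weighted regression can be sketched as follows, assuming the stacked regressor matrix, the stacked cell means of the response, the cell counts, and the within-cell residual variances have already been computed. The function name `efficient_md` and its argument names are hypothetical.

```python
import numpy as np

def efficient_md(W, mu_y, N_gt, tau2):
    """Weighted least squares on cell means with weights (N_gt / N) / tau2_gt.

    W    : GT x P matrix of stacked regressors (covariate cell means and dummies)
    mu_y : length-GT vector of cell means of the response
    N_gt : length-GT vector of cell counts
    tau2 : length-GT vector of within-cell residual variances
    """
    N = N_gt.sum()
    weights = (N_gt / N) / tau2          # diagonal of the inverse weighting matrix
    WtC = W.T * weights                  # scales each column of W' by its cell weight
    A = WtC @ W
    theta = np.linalg.solve(A, WtC @ mu_y)
    avar = np.linalg.inv(A) / N          # estimated asymptotic variance of theta
    return theta, np.sqrt(np.diag(avar))

# Toy usage with made-up inputs.
rng = np.random.default_rng(1)
W = rng.normal(size=(12, 3))
mu_y = W @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=12)
N_gt = rng.integers(50, 200, size=12).astype(float)
tau2 = rng.uniform(0.5, 2.0, size=12)
theta_hat, se = efficient_md(W, mu_y, N_gt, tau2)
print(theta_hat, se)
```

Cells with more observations or smaller residual variance receive more weight, which is the only difference from the unweighted "fixed effects" regression on cell means.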
It is easily seen that the so-called "fixed effects" estimator, $\check{\theta}$, is
$$\check{\theta} = (\hat{W}'\hat{W})^{-1}\hat{W}'\hat{\mu}^{y}, \tag{6.25}$$
that is, it uses the identity matrix as the weighting matrix. From Chamberlain (lecture notes),
the asymptotic variance of $\check{\theta}$ is estimated as $(\hat{W}'\hat{W})^{-1}\hat{W}'\tilde{C}\hat{W}(\hat{W}'\hat{W})^{-1}/N$, where $\tilde{C}$ is the matrix
described above but with $\check{\beta}$ used to estimate the cell variances. (Note: This matrix cannot be
computed by just using the heteroskedasticity-robust standard errors in the regression of $\hat{\mu}_{gt}^{y}$ on
$\hat{\mu}_{gt}^{x}$, $d_t$, $c_g$.) Because inference using $\check{\theta}$ requires calculating the group/time specific variances, we
might as well use the efficient MD estimator in (6.24).
Of course, after the efficient MD estimation, we can readily compute the test of overidentifying
restrictions, which would be rejected if the underlying model needs to include cohort/time
effects in a richer fashion.
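A sketch of the overidentification statistic follows, under the usual minimum distance theory in which the minimized, optimally weighted objective function (scaled by N) is asymptotically chi-square with degrees of freedom equal to the number of restrictions minus the number of parameters. The function name `overid_statistic` and its arguments are hypothetical, and `theta` is assumed to be the efficient MD estimate.

```python
import numpy as np

def overid_statistic(W, mu_y, theta, N_gt, tau2):
    """Minimized MD objective and its degrees of freedom.

    The statistic is N * sum_gt [(N_gt / N) / tau2_gt] * resid_gt^2,
    which simplifies to sum_gt N_gt * resid_gt^2 / tau2_gt.
    """
    resid = mu_y - W @ theta             # cell-level violations of the restrictions
    stat = float(np.sum(N_gt * resid**2 / tau2))
    df = W.shape[0] - W.shape[1]         # restrictions minus parameters
    return stat, df

# Toy usage with made-up numbers: two cells, one parameter.
W = np.array([[1.0], [1.0]])
mu_y = np.array([1.0, 3.0])
stat, df = overid_statistic(W, mu_y, np.array([2.0]), np.array([10.0, 10.0]), np.array([1.0, 4.0]))
print(stat, df)
```

The statistic would then be compared with a chi-square critical value with `df` degrees of freedom.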
A few remaining comments are in order. First, several papers, including Deaton (1985),
Verbeek and Nijman (1993), and Collado (1998), use a different asymptotic analysis. In the
current notation, $GT \to \infty$ (Deaton) or $G \to \infty$, with the cell sizes fixed. These approaches
seem unnatural for the way pseudo panels are constructed, and the thought experiment about
how one might sample more and more groups is convoluted. While $T \to \infty$ conceptually makes
sense, it is still the case that the available number of time periods is much smaller than the
cross section sample sizes for each $t$. McKenzie (2004) has shown that estimators derived
under large $G$ asymptotics can have good properties under the MD asymptotics used here. One
way to see this is that the IV estimators proposed by Collado (1998), Verbeek and Vella
(2005), and others are just different ways of using the population moment conditions in (6.8).
(Some authors appear to want it both ways. For example, Verbeek and Nijman (1993) use
large $G$ asymptotics, but treat the within-cell variances and covariances as known. This stance
assumes that one can get precise estimates of the second moments within each cell, which
means that $N_{gt}$ should be large.)
Basing estimation on (6.8) and using minimum distance, assuming large cell sizes, makes
application to models with lags relatively straightforward. The only difference now is that the
vectors of means, $\{\mu_{gt}^{w}: g = 1,\dots,G;\ t = 1,\dots,T\}$, now contain redundancies. (In other
approaches to the problem, for example Collado (1998) and McKenzie (2004), the problem with
adding $y_{t-1}$ to the population model is that it generates correlation in the estimating equation
based on the pooled cross sections. Here, there is no conceptual distinction between having
exogenous or endogenous elements in $x_t$; all that matters is how adding one modifies the MD
moment conditions.) As an example, suppose we write
$$y_t = \eta_t + \rho y_{t-1} + z_t\gamma + f + u_t, \qquad E(u_t|g) = 0,\ g = 1,\dots,G, \tag{6.26}$$
where $g$ is the group number. Then (6.8) is still valid. But now we would define the vector of
means as $(\mu_{gt}^{y}, \mu_{gt}^{z})$, and appropriately pick off $\mu_{gt}^{y}$ in defining the moment conditions. The
alternative is to define $\mu_{gt}^{x}$ to include $\mu_{g,t-1}^{y}$, but this results in a singularity in the asymptotic
distribution of $\hat{\pi}$. It is much more straightforward to keep only nonredundant elements in $\pi$ and
readjust how the moment conditions are defined in terms of $\pi$. When we take that approach, it
becomes clear that we now have fewer moments to estimate the parameters. If $z_t$ is $1 \times J$, we
now have $J+T+G$ parameters to estimate from $GT(J+1)$ population moments. Still, we
have added just one more parameter.
To the best of my knowledge, the treatment here is the first to follow the MD approach,
applied to (6.8), to its logical conclusion. Its strength is that the estimation method is widely
known and used, and it separates the underlying population model from sampling assumptions.
It also shows why we need not make any exogeneity assumptions on $x_t$. Perhaps most
importantly, it reveals the key identification condition: that separate group/time effects are not
needed in the underlying model, but enough group/time variation in the means $E(x_t|g)$ is
needed to identify the structural parameters. This sort of condition falls out of other approaches
to the problem, such as the instrumental variables approach, but it is harder to see. For
example, Verbeek and Vella (2005) propose instrumental variables methods on the equation in
time averages using interactions between group (cohort) and time dummies. With a full set of
separate time and group effects in the main equation (derivable here from the population
panel model), the key identification assumption is that a full set of group/time effects can be
excluded from the structural equation, but the means of the covariates have to vary sufficiently
across group/time. That is exactly the conclusion we reach with a minimum distance approach.
Interestingly, the MD approach easily applies to extensions of the basic model. For
example, we can allow for unit-specific time trends (as in the random growth model of
Heckman and Hotz (1989)):
$$y_t = \eta_t + x_t\beta + f_1 + f_2 t + u_t, \tag{6.27}$$
where, for a random draw $i$, the unobserved heterogeneity is of the form $f_{i1} + f_{i2}t$. Then, using
the same arguments as before,
$$\mu_{gt}^{y} = \eta_t + \mu_{gt}^{x}\beta + \alpha_g + \varphi_g t, \tag{6.28}$$
and this set of moment conditions is easily handled by extending the previous analysis. We can
even estimate models with time-varying factor loads on the heterogeneity:
$$y_t = \eta_t + x_t\beta + \lambda_t f + u_t,$$
where $\lambda_1 = 1$ (say) as a normalization. Now the population moments satisfy
$$\mu_{gt}^{y} = \eta_t + \mu_{gt}^{x}\beta + \lambda_t\alpha_g.$$
There are now $K+G+2T-1$ free parameters to estimate from $GT(K+1)$ moments. This
extension means that the estimating equations allow the group/time effects to enter more
flexibly (although, of course, we cannot replace $\eta_t + \lambda_t\alpha_g$ with unrestricted group/time
effects). The MD estimation problem is now nonlinear because of the interaction term, $\lambda_t\alpha_g$.
With more parameters and perhaps not much variation in the $\mu_{gt}^{x}$, practical implementation may
be a problem, but the theory is standard.
This literature would benefit from a careful simulation study, where data for each cross
section are generated from the underlying population model, and where $g_i$, the group
identifier, is randomly drawn, too. To be realistic, the underlying model should have full time
effects. Verbeek and Vella (2005) come close, but they omit aggregate time effects in the main
model while generating the explanatory variables to have means that differ by group/time cell.
Probably this paints too optimistic a picture for how well the estimators can work in practice.
Remember, even if we can get precise estimates of the cell means, the variation in $\mu_{gt}^{x}$ across $g$
and $t$ might not be enough to tie down $\beta$ precisely.
Finally, we can return to the comment about how the moment conditions in (6.8) only use
the assumption $E(u_t|g) = 0$ for all $t$ and $g$. It seems likely that we should be able to exploit
contemporaneous exogeneity assumptions. Let $z_t$ be a set of observed variables such that
$E(u_t|z_t, f) = 0$, $t = 1,\dots,T$. (In a true panel, these vary across $i$ and $t$. We might have $z_t = x_t$,
but perhaps $z_t$ is just a subset of $x_t$, or we have extra instruments.) Then we can add to (6.8) the
moment conditions
$$\begin{aligned}
E(z_t'y_t|g) &= \eta_t E(z_t|g)' + E(z_t'x_t|g)\beta + E(z_t'f|g) + E(z_t'u_t|g) \\
&= \eta_t E(z_t|g)' + E(z_t'x_t|g)\beta + E(z_t'f|g),
\end{aligned} \tag{6.29}$$
where $E(z_t'u_t|g) = 0$ when we view group designation as contained in $f$. The moments
$E(z_t'y_t|g)$, $E(z_t|g)$, and $E(z_t'x_t|g)$ can all be estimated by random samples from each cross
section, where we average within group/time period. (This would not work if $x_t$ or $z_t$ contains
lags.) This would appear to add many more moment restrictions that should be useful for
identifying $\beta$, but that depends on what we assume about the unobserved moments $E(z_t'f|g)$.
References
Ackerberg, D., K. Caves, and G. Frazer (2006), "Structural Identification of Production
Functions," mimeo, UCLA Department of Economics.
Ahn, S.C., Y.H. Lee, and P. Schmidt (2001), "GMM Estimation of Linear Panel Data
Models with Time-Varying Individual Effects," Journal of Econometrics 101, 219-255.
Ahn, S.C. and P. Schmidt (1995), "Efficient Estimation of Models for Dynamic Panel
Data," Journal of Econometrics 68, 5-27.
Alvarez, J. and M. Arellano (2003), "The Time Series and Cross-Section Asymptotics of
Dynamic Panel Data Estimators," Econometrica 71, 1121-1159.
Anderson, T.W. and C. Hsiao (1982), "Formulation and Estimation of Dynamic Models
Using Panel Data," Journal of Econometrics 18, 47-82.
Angrist, J.D. (1991), "Grouped-Data Estimation and Testing in Simple Labor-Supply
Models," Journal of Econometrics 47, 243-266.
Arellano, M. (1993), "On the Testing of Correlated Effects with Panel Data," Journal of
Econometrics 59, 87-97.
Arellano, M. (2003), Panel Data Econometrics. Oxford University Press: Oxford.
Arellano, M. and S.R. Bond (1991), "Some Tests of Specification for Panel Data: Monte
Carlo Evidence and an Application to Employment Equations," Review of Economic Studies
58, 277-297.
Arellano, M. and O. Bover (1995), "Another Look at the Instrumental Variable Estimation
of Error Components Models," Journal of Econometrics 68, 29-51.
Arellano, M. and B. Honoré (2001), "Panel Data Models: Some Recent Developments," in
Handbook of Econometrics, Volume 5, ed. J.J. Heckman and E. Leamer. Amsterdam: North
Holland, 3229-3296.
Blundell, R. and S.R. Bond (1998), "Initial Conditions and Moment Restrictions in
Dynamic Panel Data Models," Journal of Econometrics 87, 115-143.
Blundell, R. and S.R. Bond (2000), "GMM Estimation with Persistent Panel Data: An
Application to Production Functions," Econometric Reviews 19, 321-340.
Chamberlain, G. (1982), "Multivariate Regression Models for Panel Data," Journal of
Econometrics 18, 5-46.
Chamberlain, G. (1984), "Panel Data," in Handbook of Econometrics, Volume 2, ed. Z.
Griliches and M.D. Intriligator. Amsterdam: North Holland, 1248-1318.
Collado, M.D. (1998), "Estimating Dynamic Models from Time Series of Independent
Cross-Sections," Journal of Econometrics 82, 37-62.
Deaton, A. (1985), "Panel Data from Time Series of Cross-Sections," Journal of
Econometrics 30, 109-126.
Engle, R.F., D.F. Hendry, and J.-F. Richard (1983), "Exogeneity," Econometrica 51,
277-304.
Hausman, J.A. (1978), "Specification Tests in Econometrics," Econometrica 46,
1251-1271.
Heckman, J.J. and V.J. Hotz (1989), "Choosing among Alternative Nonexperimental
Methods for Estimating the Impact of Social Programs: The Case of Manpower Training,"
Journal of the American Statistical Association 84, 862-874.
Holtz-Eakin, D., W. Newey, and H.S. Rosen (1988), "Estimating Vector Autoregressions
with Panel Data," Econometrica 56, 1371-1395.
Inoue, A. (2008), "Efficient Estimation and Inference in Linear Pseudo-Panel Data
Models," Journal of Econometrics 142, 449-466.
Levinsohn, J. and A. Petrin (2003), "Estimating Production Functions Using Inputs to
Control for Unobservables," Review of Economic Studies 70, 317-341.
McKenzie, D.J. (2004), "Asymptotic Theory for Heterogeneous Dynamic Pseudo-panels,"
Journal of Econometrics 120, 235-262.
Moffitt, R. (1993), "Identification and Estimation of Dynamic Models with a Time Series
of Repeated Cross-Sections," Journal of Econometrics 59, 99-123.
Mundlak, Y. (1978), "On the Pooling of Time Series and Cross Section Data,"
Econometrica 46, 69-85.
Murtazashvili, I. and J.M. Wooldridge (2007), "Fixed Effects Instrumental Variables
Estimation in Correlated Random Coefficient Panel Data Models," Journal of Econometrics
142, 539-552.
Olley, S. and A. Pakes (1996), "The Dynamics of Productivity in the Telecommunications
Equipment Industry," Econometrica 64, 1263-1298.
Petrin, A., B.P. Poi, and J. Levinsohn (2004), "Production Function Estimation in Stata
Using Inputs to Control for Unobservables," Stata Journal 4, 113-123.
Robinson, P.M. (1988), "Root-N-Consistent Semiparametric Regression," Econometrica 56,
931-954.
Semykina, A. (2006), "A Semiparametric Approach to Estimating Panel Data Models with
Endogenous Explanatory Variables and Selection," mimeo, Florida State University
Department of Economics.
Verbeek, M. and T.E. Nijman (1993), "Minimum MSE Estimation of a Regression Model
with Fixed Effects from a Series of Cross-Sections," Journal of Econometrics 59, 125-136.
Verbeek, M. and F. Vella (2005), "Estimating Dynamic Models from Repeated
Cross-Sections," Journal of Econometrics 127, 83-102.
Wooldridge, J.M. (1996), "Estimating Systems of Equations with Different Instruments for
Different Equations," Journal of Econometrics 74, 387-405.
Wooldridge, J.M. (2002), Econometric Analysis of Cross Section and Panel Data. MIT
Press: Cambridge, MA.
Wooldridge, J.M. (2003), "Further Results on Instrumental Variables Estimation of
Average Treatment Effects in the Correlated Random Coefficient Model," Economics Letters
79, 185-191.
Wooldridge, J.M. (2005), "Fixed-Effects and Related Estimators for Correlated
Random-Coefficient and Treatment-Effect Panel Data Models," Review of Economics and
Statistics 87, 385-390.
Wooldridge, J.M. (2007), "On Estimating Firm-Level Production Functions Using Proxy
Variables to Control for Unobservables," mimeo, Michigan State University Department of
Economics.