Econometrics II Heij et al. Chapter 7.
Panel Data, SUR and GLS

Marius Ooms
Tinbergen Institute Amsterdam
TI Econometrics II 2006/2007,
7.7.1-7.7.3 p. 1/22
Heij et al. (2004) 7.7.1-7.7.3

Panel data
Seemingly unrelated regression model (SUR)
Generalized least squares (GLS)
Feasible GLS
Panel data with fixed effects
Panel data with random effects

Panel data

Panel data consist of cross-section observations for different time points.

We observe one dependent variable $y_{it}$ for individual $i$ at time $t$,
where $i = 1, \dots, m$ and $t = 1, \dots, n$.

In most applications $m$ is (much) larger than $n$: $m \gg n$.
We have $k$ strongly exogenous explanatory variables in a vector $x_{it}$ and $n \gg k$.

In the literature one often uses $N$ (big N) for $m$ and $T$ (big T) for $n$.
As Heij et al. is mostly time series oriented, they use $n$ as the time series
dimension and $m$ as the cross-section dimension (or number of equations in
multivariate time series).
For $n = 1$ and large $m$, we have simple cross-section data.
When $m = 1$ and $n$ is large, we have a univariate time series.

General Panel data Model

The considered models are of the following regression form:
$$y_{it} = \alpha_{it} + x_{it}'\beta_{it} + \varepsilon_{it}, \qquad \mathrm{Var}(\varepsilon) = \Omega,$$
where $\Omega$ is the $mn \times mn$ covariance matrix of the $mn \times 1$
disturbance vector $\varepsilon$ and $x_{it}$ is a $(k-1) \times 1$ vector of
explanatory variables. Parameters depend on time $t$ and on individual $i$.
This general model is not empirically identified since it contains more
parameters than observations!
Therefore, we have to impose restrictions on the regression parameters
$(\alpha_{it}, \beta_{it})$ and on the covariance matrix $\Omega$ before we can
estimate parameters.

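To make the identification problem concrete, the free parameters of the general model can be counted. This is only a sketch: it assumes an intercept $\alpha_{it}$ and $k-1$ slopes $\beta_{it}$ for every pair $(i,t)$, and a covariance matrix $\Omega$ restricted by symmetry alone.

```latex
\begin{align*}
\text{intercepts } \alpha_{it} &: \; mn \\
\text{slopes } \beta_{it} &: \; mn(k-1) \\
\text{distinct elements of } \Omega &: \; \tfrac{1}{2}\, mn(mn+1) \\
\text{total} &: \; mnk + \tfrac{1}{2}\, mn(mn+1) \;>\; mn = \text{number of observations}
\end{align*}
```

For instance, with $m = 2$, $n = 2$ and $k = 2$ this gives $8 + 10 = 18$ parameters against only 4 observations.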
Seemingly unrelated regression model (SUR)

The SUR model is given by
$$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it},$$
where $E[\varepsilon_{it}\varepsilon_{jt}] = \sigma_{ij}$ and
$E[\varepsilon_{it}\varepsilon_{js}] = 0$ for all $i, j$ and $t \neq s$.
All individuals have their own regression parameters, but these are restricted
to be constant over time.
The regression relations for the different individuals are only related via the
correlation of the error terms, but the error covariance across individuals is
unrestricted.
No error covariance across time: no serial correlation and no serial cross
correlation.

SUR model specification and estimation per individual

Denote the observations for unit $i$ by the $n \times 1$ vector $y_i$ and by the
$n \times k$ matrix $X_i$, with corresponding parameter vector
$\theta_i = (\alpha_i, \beta_i')'$ and $n \times 1$ vector of disturbances
$\varepsilon_i$. The model for SUR unit $i$ can be written as
$$y_i = X_i\theta_i + \varepsilon_i.$$
Estimating the parameters $\theta_i$ by OLS per equation is consistent, but is
inefficient if the disturbances for the different individuals display
contemporaneous correlation and the regressor sets differ from individual
(equation) to individual (equation).
This is easily shown if we combine data for all the units in one big
transformed regression model, where we can apply OLS theory of 3.1.4. See next
slides.

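Complete SUR model in matrix notation

Combining the models for the $m$ units gives
$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
=
\begin{pmatrix}
X_1 & 0 & \cdots & 0 \\
0 & X_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & X_m
\end{pmatrix}
\begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_m \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix}
$$
$$
E[\varepsilon] = 0, \qquad
\mathrm{var}(\varepsilon) = \Omega =
\begin{pmatrix}
\sigma_{11} I & \sigma_{12} I & \cdots & \sigma_{1m} I \\
\sigma_{12} I & \sigma_{22} I & \cdots & \sigma_{2m} I \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{1m} I & \sigma_{2m} I & \cdots & \sigma_{mm} I
\end{pmatrix}
$$
where $I$ is the $n \times n$ identity matrix.
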
Inefficiency of OLS for SUR model

Simultaneous OLS estimation of the $mk$ parameters in $\theta_i$,
$i = 1, \dots, m$, in the above model is equivalent to applying OLS per unit.
Exercise (1) Prove this proposition. Hint: Consider the method-of-moments
equations for OLS.
This simultaneous OLS estimator is not BLUE since the covariance matrix $\Omega$
is not of the form $\sigma^2 I$. Next we will discuss a general method to deal
with this problem.

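The equivalence of simultaneous OLS and OLS per unit can be checked numerically before proving it. The sketch below is an illustration, not a proof: the simulated data and the names `X_list`, `y_list` and `X_big` are assumptions introduced here, with every unit given a constant and one regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 3, 20, 2  # units, time points, regressors per unit (incl. constant)

# Hypothetical data: each unit has its own regressor matrix X_i and outcome y_i.
X_list = [np.column_stack([np.ones(n), rng.normal(size=n)]) for _ in range(m)]
y_list = [X @ rng.normal(size=k) + rng.normal(size=n) for X in X_list]

# OLS per unit, results concatenated into one (m*k)-vector.
b_per_unit = np.concatenate(
    [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)]
)

# OLS on the stacked system with a block-diagonal regressor matrix.
X_big = np.zeros((m * n, m * k))
for i, X in enumerate(X_list):
    X_big[i * n:(i + 1) * n, i * k:(i + 1) * k] = X
y_big = np.concatenate(y_list)
b_stacked = np.linalg.lstsq(X_big, y_big, rcond=None)[0]

# Largest absolute difference between the two sets of estimates.
max_gap = np.max(np.abs(b_stacked - b_per_unit))
```

The block-diagonal structure of `X_big` is what makes the normal equations separate by unit, which is the idea behind the hint in the exercise.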
Generalized least squares (GLS) idea

First suppose we know $\Omega$ up to one constant factor.
The idea is to transform the model so that the error covariance matrix becomes
scalar, $\sigma^2 I$. This idea is similar to WLS.
Transform the joint model by a (big) square and invertible but nondiagonal
weighting matrix $A$, that is, transform
$$y = X\theta + \varepsilon$$
into
$$Ay = AX\theta + A\varepsilon.$$
As the variance matrix of $A\varepsilon$ is $A\Omega A'$, we choose $A$ such that
$$A\Omega A' = I, \quad\text{or}\quad A^{-1}(A')^{-1} = (A'A)^{-1} = \Omega.$$
$A$ is a square root of $\Omega^{-1}$. The decomposition of $\Omega$ is standard
in matrix algebra, see section A.6; think of $A$ as $A = \Omega^{-1/2}$. Note
that $A$ is not uniquely defined.

GLS, theoretical infeasible version

Assume the following notation for the transformed variables:
$$y^* = Ay, \quad X^* = AX, \quad \varepsilon^* = A\varepsilon,$$
so that
$$y^* = X^*\theta + \varepsilon^*$$
with $E[\varepsilon^*] = 0$ and $\mathrm{Var}(\varepsilon^*) = I_{nm}$. Now the
BLUE estimator of $\theta$ is given by
$$b_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$
This is called the Generalized Least Squares estimator. In practice this
estimator is infeasible as we do not know $\Omega$.

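The two expressions for the GLS estimator can be compared numerically. The following is a minimal sketch under stated assumptions: the covariance matrix `Omega` is an arbitrary symmetric positive definite matrix made up for the illustration, and the weighting matrix is taken as the inverse of the Cholesky factor, which is only one of the many valid choices of $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 12, 3  # number of stacked observations and of regressors

X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = rng.normal(size=N)

# Hypothetical known covariance matrix: any symmetric positive definite Omega.
B = rng.normal(size=(N, N))
Omega = B @ B.T + N * np.eye(N)

# One valid choice of A: with Omega = L L' (Cholesky), A = L^{-1}
# gives A Omega A' = I.
L = np.linalg.cholesky(Omega)
A = np.linalg.inv(L)

# OLS on the transformed model.
y_star, X_star = A @ y, A @ X
b_transformed = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# Direct GLS formula (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
Omega_inv = np.linalg.inv(Omega)
b_direct = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

Any other matrix $A$ with $A'A = \Omega^{-1}$, such as the symmetric square root, would give the same estimator, which is why the non-uniqueness of $A$ is harmless.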
Two step Feasible GLS in SUR model

When $\Omega$ is unknown, we first estimate it using OLS. We then perform the
so-called Feasible GLS in two steps:
1. Estimate $\Omega$.
   Do $m$ regressions, one per unit, to estimate $\theta_j$ by OLS,
   $j = 1, \dots, m$. Let $e_i$ be the $n \times 1$ vector of OLS residuals for
   unit $i$. The unknown (co)variances $\sigma_{ij}$ are then estimated by
   $$s_{ij} = \frac{1}{n}\sum_{t=1}^{n} e_{it}e_{jt}.$$
   Then obtain $\hat\Omega$ by replacing $\sigma_{ij}$ by $s_{ij}$.
2. Estimate the parameters $\theta_j$ jointly by GLS.
   That is,
   $$b_{FGLS} = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y.$$

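The two FGLS steps can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: the data are simulated with a made-up error covariance `Sigma`, the observations are stacked unit by unit so that the estimated covariance matrix is the Kronecker product of the residual covariance matrix `S` and an identity matrix, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 2, 200, 2  # units, time points, regressors per unit (incl. constant)

# Hypothetical SUR data with contemporaneously correlated errors.
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)  # n x m
X_list = [np.column_stack([np.ones(n), rng.normal(size=n)]) for _ in range(m)]
theta_true = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]
y_list = [X_list[i] @ theta_true[i] + eps[:, i] for i in range(m)]

# Step 1: OLS per unit, residual matrix E (n x m), then S = E'E / n.
E = np.column_stack([
    y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    for X, y in zip(X_list, y_list)
])
S = E.T @ E / n

# Step 2: GLS on the stacked system with Omega_hat = S kron I_n,
# using (S kron I_n)^{-1} = S^{-1} kron I_n.
X_big = np.zeros((m * n, m * k))
for i, X in enumerate(X_list):
    X_big[i * n:(i + 1) * n, i * k:(i + 1) * k] = X
y_big = np.concatenate(y_list)
Omega_hat_inv = np.kron(np.linalg.inv(S), np.eye(n))
XtOi = X_big.T @ Omega_hat_inv
b_fgls = np.linalg.solve(XtOi @ X_big, XtOi @ y_big)

# Estimated covariance matrix of b_fgls, usable for asymptotic t-tests.
cov_fgls = np.linalg.inv(XtOi @ X_big)
```

Forming the full $mn \times mn$ matrix is convenient for a small example; for large panels one would exploit the Kronecker structure instead of building it explicitly.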
Asymptotic distribution of FGLS in SUR application

Under the assumption of correct specification and normally distributed error
terms, it can be shown that the FGLS estimator in the SUR model has the same
asymptotic properties as ML. In this case, for $n \to \infty$ and $m$ fixed one
can derive
$$b_{FGLS} \approx N\big(\theta, (X^{*\prime}X^*)^{-1}\big) \approx N\big(\theta, (X'\hat\Omega^{-1}X)^{-1}\big).$$
We can use this result to perform asymptotic t- and F-tests. In practice $n$ is
finite and one has to take extra precautions to make sure
$X'\hat\Omega^{-1}X$ is a full rank positive definite matrix, see also next
slide.
NB: if the number of unknown parameters in $\Omega$ increases linearly with $n$,
FGLS does not work. Compare the GMM standard error derivation for general
$\Omega$ in 5.5.2.

Finite Sample Rank condition SUR, efficiency SUR

The estimated SUR covariance matrix $\hat\Omega$ is an $mn \times mn$ matrix
built from the $m \times m$ matrix $S$ with elements
$s_{ij} = \frac{1}{n}e_i'e_j$, with $e_i$ the $n \times 1$ vector of OLS
residuals of unit $i$. When the observations are ordered by time period,
$\hat\Omega$ is block diagonal with $S$ on the diagonal blocks; in the
unit-by-unit ordering it equals $S \otimes I_n$. This means that $\hat\Omega$ is
invertible, and 2-step FGLS for SUR is possible, if and only if the
$m \times m$ matrix $S$ is invertible, i.e. if and only if
$\mathrm{rank}(S) = m$.
Necessary condition for Feasible GLS in SUR: Define the $n \times m$ matrix
$E = (e_1, \dots, e_m)$, then $S$ is $\frac{1}{n}E'E$ and
$$m = \mathrm{rank}(S) = \mathrm{rank}(E) \le n.$$
Therefore $S$ and $\hat\Omega$ can be invertible, and we can estimate with FGLS,
only if $m \le n$.
There are special cases in which OLS is efficient for SUR models. Exercise (2)
7.10, page 715.

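The rank condition is easy to see in a small numerical experiment. The sketch below builds the residual covariance matrix from per-unit OLS residuals for one case with fewer units than time points and one case with more units than time points. The helper `residual_covariance` and the simulated data are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def residual_covariance(m, n):
    """Return S = E'E / n for hypothetical per-unit OLS residuals.

    Each unit is regressed on a constant and one simulated regressor.
    """
    cols = []
    for _ in range(m):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = rng.normal(size=n)
        cols.append(y - X @ np.linalg.lstsq(X, y, rcond=None)[0])
    E = np.column_stack(cols)  # n x m residual matrix
    return E.T @ E / n         # m x m

# m <= n: S can have full rank m, so FGLS is possible.
rank_small_m = np.linalg.matrix_rank(residual_covariance(m=3, n=10))

# m > n: rank(S) = rank(E) <= n < m, so S is singular and FGLS breaks down.
rank_large_m = np.linalg.matrix_rank(residual_covariance(m=8, n=5))
```

In the second case the $8 \times 8$ matrix cannot be inverted, whatever the data, because its rank is bounded by the number of time points.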
Panel data with fixed effects

When $m \gg n$ the data are typical panel data or longitudinal data. We cannot
apply the SUR model, as this requires $m \le n$.
The model has to be simplified by parameter restrictions.
E.g., the coefficients of the explanatory variables are assumed to be the same
for all units: we impose (pooling) restrictions on the slope parameters
$\beta_i$: $\beta_i = \beta$, $i = 1, \dots, m$.
We then obtain the panel data model with fixed effects:
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad \varepsilon_{it} \sim IID(0, \sigma^2).$$
The constant terms $\alpha_i$ are fixed unknown parameters, but they differ from
unit to unit (not pooled). The errors are independent and homoskedastic in time
and across units.
NB: the number of parameters increases linearly in $m$, so standard asymptotic
theory still requires $n \to \infty$, although nearly all parameters are pooled
in the cross section.

Fixed effects model in matrix notation I

We can rewrite the model in standard regression form using unit dummy variables
$$D_{it}(j) = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}
\qquad i = 1, \dots, m, \quad j = 1, \dots, m,$$
to get
$$y_{it} = \sum_{j=1}^{m} \alpha_j D_{it}(j) + x_{it}'\beta + \varepsilon_{it}, \qquad \varepsilon_{it} \sim IID(0, \sigma^2).$$
Next, define the $n \times 1$ vector $y_i$ with elements $y_{it}$, define
$\varepsilon_i$ accordingly, define the $n \times (k-1)$ matrix $X_i$ with
$t$th row $x_{it}'$, $t = 1, \dots, n$, and let $\iota$ be an $n \times 1$
vector of ones.
For the $i$th unit we obtain the matrix notation
$$y_i = \alpha_i\iota + X_i\beta + \varepsilon_i.$$

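Fixed effects model in matrix notation II

Now stack the equations $y_i = \alpha_i\iota + X_i\beta + \varepsilon_i$ for the
$m$ time series.
Next, define the $mn \times 1$ vector $y$ consisting of the stacked $y_i$'s,
define $\varepsilon$ accordingly, define the $mn \times (k-1)$ matrix $X$ as the
matrix of stacked $X_i$ and define the stacked $mn \times m$ matrix $D$ as
$$
D = \begin{pmatrix}
\iota & 0 & \cdots & 0 \\
0 & \iota & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \iota
\end{pmatrix}
$$
If $\alpha = (\alpha_1, \dots, \alpha_m)'$, then the following single regression
model arises:
$$y = X\beta + D\alpha + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I).$$
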
Fast Fixed Effects estimation in regression form I

Efficient estimators of $\alpha$ and $\beta$ can be obtained by OLS. When $m$ is
large, direct OLS is computationally unattractive as it requires the inverse of
the $(m+k-1) \times (m+k-1)$ matrix $(X\ D)'(X\ D)$. An intuitive and easier
method applies partial regression, following the Frisch-Waugh theorem (3.2.5),
in matrix notation:
1. Regress $y$ and (all columns of) $X$ on $D$ and save the residuals, $M_D y$
   and $M_D X$, with $M_D = I - D(D'D)^{-1}D'$. Since
   $(D'D)^{-1} = \frac{1}{n}I$, $M_D y$ and $M_D X$ have elements
   $y_{it} - \bar y_i$ and $x_{it} - \bar x_i$: just removing individual sample
   means!
2. Regress $M_D y$ on $M_D X$ to obtain $\hat\beta_{OLS}$:
   $$\hat\beta_{OLS} = (X'M_D X)^{-1}X'M_D y.$$

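The partial regression shortcut can be verified against the direct dummy-variable regression. The sketch below assumes a simulated balanced panel stacked unit by unit. The helper `demean` and all variable names are hypothetical and serve only the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 50, 4   # many units, few time points
k_slopes = 2   # k - 1 slope regressors

# Hypothetical panel, stacked unit by unit (unit 1 first, then unit 2, ...).
alpha = rng.normal(size=m)
beta = np.array([1.5, -0.5])
X = rng.normal(size=(m * n, k_slopes))
unit = np.repeat(np.arange(m), n)
y = alpha[unit] + X @ beta + 0.1 * rng.normal(size=m * n)

# Direct LSDV: regress y on [X D], which needs an (m + k - 1)-column design.
D = np.kron(np.eye(m), np.ones((n, 1)))
coef_lsdv = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
beta_lsdv = coef_lsdv[:k_slopes]

def demean(v):
    """Subtract the unit-specific sample mean from every column of v."""
    blocks = v.reshape(m, n, -1)
    return (blocks - blocks.mean(axis=1, keepdims=True)).reshape(m * n, -1)

# Partial regression: regress unit-demeaned y on unit-demeaned X.
beta_within = np.linalg.lstsq(demean(X), demean(y).ravel(), rcond=None)[0]
```

The demeaned regression involves only `k_slopes` columns, regardless of how large $m$ is, which is the computational gain the slide refers to.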
Interpretation Fixed effects estimation in regression II

The Fixed Effect estimator or Least Squares Dummy Variable Estimator (LSDVE) of
$\beta$ is therefore obtained by regressing unit-mean adjusted $y$ on unit-mean
adjusted $X$. The fixed effect OLS estimates $\hat\alpha$ follow from the last
$m$ OLS normal equations (3.41). In matrix regression form:
$$D'X\hat\beta + D'D\hat\alpha = D'y,$$
so that
$$\hat\alpha = (D'D)^{-1}(D'y - D'X\hat\beta),$$
which has the familiar interpretation of the estimates of constant terms in
regressions per individual (but here with a given common $\hat\beta$):
$$\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta.$$

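The three ways of obtaining the fixed effect estimates (joint OLS on the dummies, the last $m$ normal equations, and unit means) can be compared on a small simulated panel. The data and variable names below are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 6, 5  # units and time points

# Hypothetical balanced panel, stacked unit by unit.
unit = np.repeat(np.arange(m), n)
X = rng.normal(size=(m * n, 2))
y = rng.normal(size=m)[unit] + X @ np.array([1.0, 2.0]) + rng.normal(size=m * n)
D = np.kron(np.eye(m), np.ones((n, 1)))

# Joint OLS on [X D]: slopes first, then the m unit intercepts.
coef = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
beta_hat, alpha_joint = coef[:2], coef[2:]

# Normal-equation form: alpha = (D'D)^{-1} (D'y - D'X beta).
alpha_normal_eq = np.linalg.solve(D.T @ D, D.T @ y - D.T @ X @ beta_hat)

# Unit-mean form: alpha_i = ybar_i - xbar_i' beta.
y_bar = y.reshape(m, n).mean(axis=1)
x_bar = X.reshape(m, n, 2).mean(axis=1)
alpha_means = y_bar - x_bar @ beta_hat
```

All three vectors coincide up to rounding error, so in practice the unit-mean form is the cheapest way to recover the intercepts once the common slope estimate is available.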
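Panel data model with random effects

The model with fixed effects cannot be consistently estimated in full if $n$ is
fixed and $m \to \infty$ (each $\alpha_i$ is based on only $n$ observations),
and it cannot be used to forecast a new unit $y_{m+1}$ given $x_{m+1}$:
$\alpha_i$ is not modelled.
The simplest model for this purpose is the random effects model, which has a
random intercept with a common mean $\alpha$ for all units. In social sciences
(SPSS) this specification is called a mixed model (mix of random and fixed
coefficients).
$$\alpha_i = \alpha + \eta_i, \qquad \eta_i \sim IID(0, \sigma_\eta^2)$$
$$y_{it} = \alpha + x_{it}'\beta + \omega_{it}, \qquad \omega_{it} = \varepsilon_{it} + \eta_i, \qquad \varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$$
with $\eta_i$ and $\varepsilon_{it}$ independent. The disturbances $\omega_{it}$
are correlated with their own past because of the $\eta_i$. The properties of
$\omega_{it}$ are:
$$E[\omega_{it}] = 0, \qquad E[\omega_{it}^2] = \sigma_\varepsilon^2 + \sigma_\eta^2, \qquad E[\omega_{it}\omega_{is}] = \sigma_\eta^2 \ \text{for } t \neq s,$$
$$E[\omega_{it}\omega_{js}] = 0 \ \text{for all } t, s \text{ and } i \neq j.$$
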
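Collecting these moments per unit gives the covariance matrix that GLS needs. As a sketch, write $\omega_{it} = \varepsilon_{it} + \eta_i$ for the composite disturbance, $\omega_i = (\omega_{i1}, \dots, \omega_{in})'$, and $\iota$ for the $n \times 1$ vector of ones; the stacking of $\omega$ unit by unit is an assumption of this illustration.

```latex
\begin{align*}
\operatorname{Var}(\omega_i) &= \sigma_\varepsilon^2 I_n + \sigma_\eta^2\, \iota\iota' , \\
\Omega = \operatorname{Var}(\omega) &= I_m \otimes \left( \sigma_\varepsilon^2 I_n + \sigma_\eta^2\, \iota\iota' \right).
\end{align*}
```

The diagonal elements equal $\sigma_\varepsilon^2 + \sigma_\eta^2$, the off-diagonal elements within a unit equal $\sigma_\eta^2$, and the blocks for different units are zero, so only two variance parameters have to be estimated.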
Random effects models FGLS, I

We can estimate the parameters $\alpha$ and $\beta$ by OLS, but this estimator
is not BLUE since the disturbances $\omega_{it}$ are correlated within each
unit. An efficient estimator can be obtained by feasible GLS.
In the first step of FGLS we need to estimate $\sigma_\varepsilon^2$ and
$\sigma_\eta^2$.
Since $\eta_i$ is fixed in the $i$th unit, it can be removed from the model by
taking the unit de-meaned variables. Consider
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i), \qquad i = 1, \dots, m, \quad t = 1, \dots, n.$$
Let $\hat\beta$ be the OLS estimate of $\beta$ for the above model. Then the
within variance, $\sigma_\varepsilon^2 = E(\varepsilon_{it}^2)$, is estimated by
$$\hat\sigma_\varepsilon^2 = \frac{1}{m(n-1)}\sum_{i=1}^{m}\sum_{t=1}^{n}\big(y_{it} - \bar y_i - (x_{it} - \bar x_i)'\hat\beta\big)^2.$$

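Random effects models FGLS, II

To estimate $\sigma_\eta^2$ we combine the within variance estimator
$\hat\sigma_\varepsilon^2$ and the between variance estimator, which estimates
the unexplained variance between unit means in:
$$\bar y_i = \alpha + \bar x_i'\beta + (\bar\varepsilon_i + \eta_i), \qquad i = 1, \dots, m.$$
The variance estimate of this regression, denoted by $\hat\sigma_B^2$, estimates
$$\mathrm{var}(\bar\varepsilon_i + \eta_i) = \frac{1}{n}\sigma_\varepsilon^2 + \sigma_\eta^2.$$
Combining $\hat\sigma_\varepsilon^2$ and $\hat\sigma_B^2$ one derives the
estimator
$$\hat\sigma_\eta^2 = \hat\sigma_B^2 - \frac{1}{n}\hat\sigma_\varepsilon^2.$$
Given $\hat\sigma_\varepsilon^2$ and $\hat\sigma_\eta^2$ one can do the second
step of FGLS to reestimate $\alpha$ and $\beta$. The resulting estimator is also
known as the EGLS (Estimated GLS) estimator of $\beta$.
Exercise (3): Check the derivation on page 695-696 for $m = 3$, $n = 2$.
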
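The full random effects FGLS procedure (within variance, between variance, then GLS) can be sketched as follows. This is an illustration under assumptions: the panel is simulated with made-up standard deviations, there is a single regressor, the between variance uses a degrees-of-freedom correction chosen here for convenience, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 200, 4
sigma_eps, sigma_eta = 1.0, 2.0  # hypothetical true standard deviations

# Simulated balanced panel, stacked unit by unit.
unit = np.repeat(np.arange(m), n)
x = rng.normal(size=(m * n, 1))
eta = sigma_eta * rng.normal(size=m)
y = 1.0 + x @ np.array([0.5]) + eta[unit] + sigma_eps * rng.normal(size=m * n)

def unit_means(v):
    """Return the m x (columns) matrix of unit-specific sample means."""
    return v.reshape(m, n, -1).mean(axis=1)

# Step 1a: within regression gives the within variance estimate.
xw = x - unit_means(x)[unit]
yw = y - unit_means(y).ravel()[unit]
b_within = np.linalg.lstsq(xw, yw, rcond=None)[0]
s2_eps = np.sum((yw - xw @ b_within) ** 2) / (m * (n - 1))

# Step 1b: between regression on unit means gives the between variance.
Zb = np.column_stack([np.ones(m), unit_means(x)])
yb = unit_means(y).ravel()
res_b = yb - Zb @ np.linalg.lstsq(Zb, yb, rcond=None)[0]
s2_B = np.sum(res_b ** 2) / (m - Zb.shape[1])  # divisor is an assumption
s2_eta = s2_B - s2_eps / n

# Step 2: GLS with Omega_hat = I_m kron (s2_eps * I_n + s2_eta * iota iota').
block = s2_eps * np.eye(n) + s2_eta * np.ones((n, n))
Omega_inv = np.kron(np.eye(m), np.linalg.inv(block))
Z = np.column_stack([np.ones(m * n), x])
b_fgls = np.linalg.solve(Z.T @ Omega_inv @ Z, Z.T @ Omega_inv @ y)
```

In small samples the difference that defines `s2_eta` can turn out negative, in which case the estimated covariance block is no longer guaranteed to be positive definite and this simple sketch would need an adjustment.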
Conclusion

The courses Econometrics I and II have introduced you to
the main ideas of parametric econometric modelling (data analysis, parsimonious
specification, consequences of modelling errors, diagnostic checking, testing)
and
the basics of econometric (asymptotic) inference (exact statistical inference,
likelihood based inference, moment based inference, stationarity, rate of
convergence)
in
Static linear and nonlinear single equation models
Binary discrete choice models
Dynamic linear single- and multiple equation models
Panel data models

Many different parametric models and methods exist, but these are (all) based on
(combinations of) the ideas mentioned above.
Not discussed: Bayesian and nonparametric estimation and inference.
