
Panel Data

This lecture
Introduction
Advantages of panel data
Issues involved in using panel data
Panel data models typology

What is a panel data set?
A data set that combines time-series of cross-sections is called a panel
data set.

Examples:
Data on the expenditure pattern of a sample of households followed
over time.
Data on macro aggregates of many countries over time.
Balance sheet data for a set of firms over time.
Agricultural profile of states over time.
Two famous panels from the USA are the National Longitudinal Survey
of Labour Market Experience (NLS) and the Michigan Panel Study of
Income Dynamics (PSID).
In the Indian context, an example is the Village Level Surveys conducted by
ICRISAT.
In-house, a panel constructed with firm-level data from PROWESS,
CMIE.

Observations on a variable y from a panel data set are typically denoted

$$y_{i,t}, \qquad i = 1, \dots, n; \quad t = 1, \dots, T$$

Panel data sets may be
balanced, i.e., the same cross-sectional units and the same
number of cross-sectional units are tracked over time
unbalanced, i.e., the number of cross-sectional units may vary
across time periods

Typically, a panel data set is wide i.e., covers a large number of cross-
sectional units, but is short i.e., covers only a few points in time.
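
The balanced / unbalanced distinction can be checked mechanically. The following is a minimal sketch, assuming the panel is stored as a list of (unit, period) pairs; the function name `is_balanced` and the firm-year data are hypothetical and not part of the lecture.

```python
from collections import defaultdict


def is_balanced(observations):
    """Return True if every unit is observed exactly once in every period.

    `observations` is an iterable of (unit, period) pairs. The function name
    and the data layout are illustrative assumptions, not from the lecture.
    """
    periods_by_unit = defaultdict(list)
    for unit, period in observations:
        periods_by_unit[unit].append(period)
    all_periods = {p for periods in periods_by_unit.values() for p in periods}
    return all(
        len(periods) == len(all_periods) and set(periods) == all_periods
        for periods in periods_by_unit.values()
    )


# Hypothetical firm-year panel: 3 firms tracked over 2 years.
balanced_panel = [(firm, year) for firm in ("A", "B", "C") for year in (2001, 2002)]
unbalanced_panel = balanced_panel[:-1]  # firm C is not observed in 2002

print(is_balanced(balanced_panel))    # True
print(is_balanced(unbalanced_panel))  # False
```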
Advantages of panel data sets
Permits analysis of a number of economic questions that cannot be
addressed using cross-section or time-series data sets. Complicated
behavioural models can be constructed and tested that would not be
possible with just cross-section / time-series data sets.
Example 1: Ben-Porath (1973) observes that at a certain point in
time, in a cohort of women, 50% may appear to be working. Does
this imply that (i) in this cohort one-half of the women on average
will be working, or (ii) the same one-half will be working in every
period? Cross-sectional data alone are inadequate to answer this
question.
Example 2: Disentangling economies of scale and technological
change in a production function analysis. Cross-sectional data
provide information on scale economies alone, while time-series
data muddle the two effects. Panel data allows estimation of both
the rate of technological change (over time) and economies of
scale (across firms of different sizes).
Example 3: Estimation of a demand system. With cross-section
data pertaining to a particular time period one can estimate only
the income elasticity but not the price elasticity (as all individuals
in the sample would face the same price). With panel data, both
income and price variation would be captured in the model.
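
The point of Example 1 can be made concrete with a toy data set. The sketch below uses two made-up panels of four women over four periods (the data and function names are hypothetical): both show 50% participation in every cross-section, yet they correspond to the two different interpretations, and only the time dimension tells them apart.

```python
def participation_rates(panel):
    """Share of women working in each period (one cross-section per period)."""
    n_women, n_periods = len(panel), len(panel[0])
    return [sum(row[t] for row in panel) / n_women for t in range(n_periods)]


def share_always_working(panel):
    """Share of women who work in every period; this needs the time dimension."""
    return sum(1 for row in panel if all(row)) / len(panel)


# Rows are women, columns are periods; 1 = working, 0 = not working.
# Hypothetical data for interpretation (ii): the same half works every period.
same_half = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
# Hypothetical data for interpretation (i): every woman works half of the time.
turnover = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]

# Every cross-section shows 50% participation in both panels.
print(participation_rates(same_half), participation_rates(turnover))
# Following the same women over time separates the two cases: 0.5 versus 0.0.
print(share_always_working(same_half), share_always_working(turnover))
```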

Large number of data points (total nT observations)
increases the degrees of freedom
reduces collinearity amongst explanatory variables, thus improving
efficiency of econometric estimates

Provides a means of resolving / reducing the problems arising due to
omitted variables (either wrongly measured / unobserved) that are
correlated with explanatory variables in the model.

Since panel data allows us to capture both intertemporal dynamics
as well as heterogeneity of individuals, one can better control for
the effects of missing or unobserved variables.

Consider the linear regression

$$y_{i,t} = \alpha + \beta' x_{i,t} + \gamma' z_{i,t} + \varepsilon_{i,t},$$

where the variables $z$ are unobserved and the covariances between $x$ and $z$ are nonzero.
In this setting, it is well known that OLS estimates of $\beta$ are biased.
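
To see where the bias comes from, a worked scalar case may help. The sketch below assumes a single regressor $x$ and a single omitted variable $z$, writes $\beta$ and $\gamma$ for their coefficients (symbol names assumed here), and takes the error term to be uncorrelated with $x$. Regressing $y$ on $x$ alone then gives, in large samples:

```latex
\operatorname{plim}\,\hat{\beta}_{OLS}
  = \frac{\operatorname{Cov}(x_{i,t},\, y_{i,t})}{\operatorname{Var}(x_{i,t})}
  = \beta + \gamma\,\frac{\operatorname{Cov}(x_{i,t},\, z_{i,t})}{\operatorname{Var}(x_{i,t})}
```

The second term vanishes only if $\gamma = 0$ or if $x$ and $z$ are uncorrelated, which is exactly what the setting above rules out.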
Setting 1: If the unobserved variables are constant over time but vary
across individuals, i.e., $z_{i,t} = z_i$, then with panel data, first-
differencing would allow us to wash away the missing variables from
the model:

$$y_{i,t} - y_{i,t-1} = \beta'(x_{i,t} - x_{i,t-1}) + (\varepsilon_{i,t} - \varepsilon_{i,t-1}).$$
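
A small simulation may make Setting 1 concrete. This is only a sketch under stated assumptions: a single regressor, made-up coefficient values (2.0 on x, 3.0 on z), normally distributed data, and helper names that are not from the lecture. It compares pooled OLS that ignores z with OLS on first differences.

```python
import random


def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


random.seed(0)
n, T = 200, 5            # wide and short panel (illustrative sizes)
beta, gamma = 2.0, 3.0   # assumed true coefficients on x and z

x, y = [], []
for i in range(n):
    z_i = random.gauss(0, 1)                            # unobserved, fixed over time
    x_i = [z_i + random.gauss(0, 1) for _ in range(T)]  # x is correlated with z
    y_i = [1.0 + beta * v + gamma * z_i + random.gauss(0, 0.1) for v in x_i]
    x.append(x_i)
    y.append(y_i)

# Pooled OLS that ignores z: picks up part of the effect of z.
pooled = ols_slope([v for row in x for v in row], [v for row in y for v in row])

# First differences within each individual remove z_i (and the intercept).
dx = [row[t] - row[t - 1] for row in x for t in range(1, T)]
dy = [row[t] - row[t - 1] for row in y for t in range(1, T)]
first_diff = slope_through_origin(dx, dy)

print(round(pooled, 2), round(first_diff, 2))
```

In this design the pooled slope should land well above 2.0, while the first-difference slope should be close to it.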
Setting 2: If the unobserved variables are constant across individuals
but vary over time, i.e., $z_{i,t} = z_t$, then with panel data, taking
deviations from the mean across individuals would wash away the
missing variables from the model:

$$y_{i,t} - \bar{y}_t = \beta'(x_{i,t} - \bar{x}_t) + (\varepsilon_{i,t} - \bar{\varepsilon}_t), \qquad \text{where } \bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_{i,t} \text{ and so on.}$$
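
Setting 2 can be illustrated in the same way. The sketch below assumes a single regressor, a made-up common time effect z_t = t, and illustrative coefficient values; none of these numbers come from the lecture. It compares pooled OLS with OLS on deviations from the period means.

```python
import random


def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


random.seed(1)
n, T = 200, 5
beta, gamma = 2.0, 3.0             # assumed true coefficients on x and z
z = [float(t) for t in range(T)]   # unobserved, common to all individuals

x = [[z[t] + random.gauss(0, 1) for t in range(T)] for _ in range(n)]
y = [[1.0 + beta * x[i][t] + gamma * z[t] + random.gauss(0, 0.1)
      for t in range(T)] for i in range(n)]

# Pooled OLS that ignores z_t.
pooled = ols_slope([v for row in x for v in row], [v for row in y for v in row])

# Period means across individuals, then deviations from those means.
x_bar = [sum(x[i][t] for i in range(n)) / n for t in range(T)]
y_bar = [sum(y[i][t] for i in range(n)) / n for t in range(T)]
dx = [x[i][t] - x_bar[t] for i in range(n) for t in range(T)]
dy = [y[i][t] - y_bar[t] for i in range(n) for t in range(T)]
demeaned = slope_through_origin(dx, dy)

print(round(pooled, 2), round(demeaned, 2))
```

Subtracting the period mean removes z_t because it is the same for every individual in period t, so the demeaned slope should be close to 2.0 while the pooled slope is not.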

In both these settings, OLS estimation of the transformed model
would produce unbiased estimates of $\beta$.

Issues involved in using panel data

Heterogeneity bias:
Consider modelling total factor productivity (TFP) for a panel of firms.
Typically, the data would reveal differences across firms and over time in
TFP. An important source of these differences in productivity levels
across firms is economies of scale, while over time it is technological change.

Economies of scale can be captured through a variable that measures
firm size (such as Net Fixed Assets). But such a size variable may not
account for all the inter-firm differences in productivity levels. There
could be other factors that affect the relative efficiency of firms (such as
managerial capacity), which are often unobserved or hard to measure but
nevertheless an important source of heterogeneity across firms.

Similarly, technological change may not account for all the inter-
temporal differences in productivity levels. There could be some time-
specific factors (such as energy price shocks, tax policy changes, etc.) that
are important but are not captured in the observed data.

Such unobserved heterogeneity across individuals or time-specific factors
implies that (usually) the intercept and perhaps the slope parameters differ
across individuals and / or over time.

Ignoring such unobserved heterogeneity across individuals or time-
specific factors would result in biased estimates (heterogeneity bias).
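
A tiny numerical example can show how severe heterogeneity bias may be. The data below are made up for illustration: three hypothetical firms share the same slope of 1.0 but have different intercepts that happen to fall as x rises. Pooling the firms while ignoring the intercept differences even flips the sign of the estimated slope.

```python
def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Hypothetical data: every firm has slope 1.0 but its own intercept.
firms = {
    "firm_1": {"intercept": 10.0, "x": [1.0, 2.0, 3.0]},
    "firm_2": {"intercept": 4.0, "x": [4.0, 5.0, 6.0]},
    "firm_3": {"intercept": -2.0, "x": [7.0, 8.0, 9.0]},
}
for firm in firms.values():
    firm["y"] = [firm["intercept"] + 1.0 * v for v in firm["x"]]

# Slope estimated separately within each firm.
within_slopes = {name: ols_slope(f["x"], f["y"]) for name, f in firms.items()}

# Slope estimated from the pooled data with a single common intercept.
pooled_x = [v for f in firms.values() for v in f["x"]]
pooled_y = [v for f in firms.values() for v in f["y"]]
pooled_slope = ols_slope(pooled_x, pooled_y)

print(within_slopes)  # each firm: 1.0
print(pooled_slope)   # -0.8
```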

Panel data models
Approaches to modelling with panel data can be categorised based on the
way heterogeneity is specified. The basic framework of panel data models
that characterise heterogeneity across individuals considers a regression
model of the form

$$y_{i,t} = x_{i,t}'\beta + z_i'\alpha + \varepsilon_{i,t}, \qquad i = 1, \dots, n; \quad t = 1, \dots, T$$

In this model, $x_{i,t}$ contains K regressors. The constant term is not part of $x_{i,t}$.

Heterogeneity or individual effect is modelled as $z_i'\alpha$. Here $z_i$ contains a
constant term and a set of individual specific variables, which are constant
over time.

We assume that all the assumptions of the CLRM are satisfied.

If $z_i$ is observed for all individuals, then the model can be estimated with
OLS.

Pooled regression
If $z_i$ contains only a constant term, the model becomes

$$y_{i,t} = x_{i,t}'\beta + \alpha + \varepsilon_{i,t}.$$

This is a CLRM with a common intercept term $\alpha$ and common slope
vector $\beta$. OLS provides consistent and efficient estimates of $\alpha$ and $\beta$.

Fixed effects models


If $z_i$ is unobserved, but correlated with $x_{i,t}$, then the OLS estimator of $\beta$ is biased and
inconsistent; this is the omitted variables problem. However, if we
specify that heterogeneity is embodied in an individual specific constant
term $\alpha_i$, i.e., $z_i'\alpha = \alpha_i$, then the resultant model is called a fixed effects
model and is written as

$$y_{i,t} = x_{i,t}'\beta + \alpha_i + \varepsilon_{i,t}.$$

The term fixed is used only to indicate that the individual specific effect
does not vary over time.
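
One common way to estimate a fixed effects model is the within transformation, which subtracts each individual's time average and thereby removes the individual specific constant. The lecture does not describe an estimator at this point, so the following is only a sketch under assumptions: a single regressor, simulated data with made-up parameter values, and hypothetical helper names.

```python
import random


def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


random.seed(2)
n, T = 100, 6
beta = 2.0                                        # assumed true slope
alpha = [random.gauss(0, 2) for _ in range(n)]    # individual specific constants

# x is correlated with the individual effect, as in the fixed effects setting.
x = [[alpha[i] + random.gauss(0, 1) for _ in range(T)] for i in range(n)]
y = [[x[i][t] * beta + alpha[i] + random.gauss(0, 0.1) for t in range(T)]
     for i in range(n)]

# Within transformation: deviations from each individual's own time average.
x_mean = [sum(row) / T for row in x]
y_mean = [sum(row) / T for row in y]
xw = [x[i][t] - x_mean[i] for i in range(n) for t in range(T)]
yw = [y[i][t] - y_mean[i] for i in range(n) for t in range(T)]

beta_hat = slope_through_origin(xw, yw)

# The individual constants can be recovered from the individual means.
alpha_hat = [y_mean[i] - beta_hat * x_mean[i] for i in range(n)]

print(round(beta_hat, 3))
```

With a short panel each recovered constant rests on only T observations, so the slope estimate is usually the more reliable output of this procedure.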

Random effects models
If the unobserved individual heterogeneity is assumed to be uncorrelated
with $x_{i,t}$, then the model may be formulated as

$$y_{i,t} = x_{i,t}'\beta + E[z_i'\alpha] + \{z_i'\alpha - E[z_i'\alpha]\} + \varepsilon_{i,t} = x_{i,t}'\beta + \alpha + u_i + \varepsilon_{i,t}.$$

This is a linear regression model, but with a compound disturbance term, $u_i + \varepsilon_{i,t}$.
OLS in this set up will be consistent but inefficient.
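
One way to see why OLS is inefficient here is to write out the covariance structure of the compound disturbance. The sketch below adds assumptions that the lecture does not state explicitly: $u_i$ and $\varepsilon_{i,t}$ have mean zero and variances $\sigma_u^2$ and $\sigma_\varepsilon^2$, are uncorrelated with each other, and are uncorrelated across individuals, with $\varepsilon_{i,t}$ also uncorrelated over time. The symbol $\eta_{i,t}$ is introduced here only for illustration.

```latex
\begin{align*}
\eta_{i,t} &= u_i + \varepsilon_{i,t} \\
\operatorname{Var}(\eta_{i,t}) &= \sigma_u^2 + \sigma_\varepsilon^2 \\
\operatorname{Cov}(\eta_{i,t}, \eta_{i,s}) &= \sigma_u^2 \qquad (t \neq s) \\
\operatorname{Cov}(\eta_{i,t}, \eta_{j,s}) &= 0 \qquad (i \neq j)
\end{align*}
```

Because the disturbances of the same individual are correlated across periods, they are not spherical, so OLS is no longer the efficient estimator even though it remains consistent.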

In the random effects approach, $u_i$ is an individual specific random
element, similar to $\varepsilon_{i,t}$, except that for each individual the same draw of
the random element appears for all the time periods. Hence, the random
effects model can be viewed as a regression model with a random
constant term.

The key difference between the fixed and random effects models is
whether the unobserved individual effect embodies elements that are
correlated with the regressors.

Random parameters models


The idea behind the random effects model, viz., a random constant term,
can be extended to the slope coefficients also. Then we have a random
parameters model,

$$y_{i,t} = x_{i,t}'(\beta + h_i) + (\alpha + u_i) + \varepsilon_{i,t},$$

where $h_i$ is a random vector that induces variation in the slope parameters
across individuals.
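
The following simulation sketches what a random parameters model implies for the data. All numbers are assumptions made for illustration (a single regressor, a common slope of 2.0, and a slope deviation with standard deviation 0.5), and fitting a separate OLS line per individual is used here only as a simple way to make the slope variation visible; it is not presented as the lecture's estimation method.

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


random.seed(3)
n, T = 200, 10
alpha, beta = 1.0, 2.0   # assumed common constant and common slope

unit_slopes = []
for i in range(n):
    h_i = random.gauss(0, 0.5)   # individual deviation in the slope
    u_i = random.gauss(0, 1.0)   # individual deviation in the constant
    x_i = [random.gauss(0, 1) for _ in range(T)]
    y_i = [v * (beta + h_i) + (alpha + u_i) + random.gauss(0, 0.1) for v in x_i]
    unit_slopes.append(ols_slope(x_i, y_i))

# The individual slopes scatter around the common slope beta.
mean_slope = sum(unit_slopes) / n
spread = (sum((b - mean_slope) ** 2 for b in unit_slopes) / (n - 1)) ** 0.5

print(round(mean_slope, 2), round(spread, 2))
```

The average of the individual slopes should sit near 2.0, and their spread should be near 0.5, reflecting the variation that $h_i$ induces across individuals.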
