School of Economics
Introductory Econometrics
ECON2206
Tutorial Program
Sample Answers
Session 2, 2015
Table of Contents
Week 2 Tutorial Exercises
  Problem Set
Readings
Review Questions (these may or may not be discussed in tutorial classes)
Problem Set (these will be discussed in tutorial classes)
Review Questions (these may or may not be discussed in tutorial classes)
Problem Set (these will be discussed in tutorial classes)
(vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which
shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the
maximum is $11,957.64.]
(vii) 8.7%. Note that, by a Taylor expansion, log(x + h) − log(x) = log(1 + h/x) ≈ h/x for
positive x and small h/x (in absolute value).
STATA Hints
Some STATA do-files for solving the Problem Set are provided in doFilesUpload.zip. It is
important to understand the commands in the do-files so that you will be able to write your
own do-files for your assignments and the course project.
The minimum requirement for OLS to be carried out for the data set {(xi, yi): i = 1, …, n} with
the sample size n > 2 is that the sample variance of x is positive. In what circumstances is the
sample variance of x zero?
When all observations on x have the same value, there is no variation in x.
The OLS estimation of the simple regression model has the following properties:
a) the sum of the residuals is zero;
b) the sample covariance of the residuals and x is zero.
Why? How would you relate them to the least squares principle?
These are derived from the first-order conditions for minimising the sum of squared
residuals (SSR).
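For reference, with b0 and b1 denoting the OLS estimates in the simple regression of y on x, the two first-order conditions are
Σi (yi − b0 − b1xi) = 0 and Σi xi(yi − b0 − b1xi) = 0.
The terms in parentheses are exactly the OLS residuals, so the first condition gives property a) and the second gives property b).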
Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample regression line. (Hint: the first OLS normal equation gives ȳ = b0 + b1x̄, where b0 and b1 are the OLS estimates.)
Problem Set
at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the
observed birth weights at cigs = 0, and so we will underpredict high birth weights.
Δmath10 = (β1/100)(%Δexpend), just as in the second row of Table 2.3. So, if %Δexpend = 10, then Δmath10 = β1/10.
(iii) The regression results are
math10-hat = −69.34 + 11.16 log(expend), n = 408, R² = .0297
(iv) If expend increases by 10 percent, math10-hat increases by about 1.12 points. This is not a
huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in
spending might be a fairly small dollar amount.
(v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100.
In fact, the largest fitted value is only about 30.2.
(iv) The slope coefficient from part (iii) means that each mailing per year is associated with,
and perhaps even causes, an estimated 2.65 additional guilders, on average. Therefore, if
each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65
guilders. This is only the average, however. Some mailings generate no contributions, or a
contribution less than the mailing cost; other mailings generate much more than the
mailing cost.
(v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gifts
is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have
received no mailings, the smallest predicted value is about two. So, with this estimated
equation, we never predict zero charitable gifts.
How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the single linear combination.
Then the OLS on the re-parameterised model will provide the estimate of the single linear
combination and the related standard error. See the example in the textbook and lecture
slides.
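A minimal STATA sketch (the variable names y, x1, x2, x3 here are hypothetical): the combination b1 + b2 can be obtained either by re-parameterising by hand or directly with lincom after regress.
    regress y x1 x2 x3
    lincom x1 + x2            // point estimate and standard error of b1 + b2
    test x1 + x2 = 1          // or test a specific null such as H0: b1 + b2 = 1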
What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero
coefficients in the regression model.
What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the
null is called the restricted model, while the regression under the alternative (which simply
states that the null is false) is called the unrestricted model.
How do you compute the F-statistic, given that you have SSRs?
Clearly, the general F-stat is based on the relative difference between the SSRs under the
restricted and unrestricted models: F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where you
should be able to explain the meanings of the various symbols.
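A STATA sketch of this computation, with hypothetical variables y, x1, ..., x4 and q = 2 exclusion restrictions on x3 and x4 (the same sample must be used for both regressions):
    regress y x1 x2 x3 x4          // unrestricted model
    scalar ssr_ur = e(rss)
    scalar df_ur = e(df_r)         // n - k - 1
    regress y x1 x2                // restricted model
    scalar ssr_r = e(rss)
    display "F = " ((ssr_r - ssr_ur)/2) / (ssr_ur/df_ur)
The built-in command test x3 x4, issued after the unrestricted regression, reports the same F statistic together with its p-value.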
What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.
What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients of x-variables are zero.
How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.
Why would you care about the asymptotic properties of the OLS estimators?
When MLR.6 does not hold, the finite-sample distribution of the OLS estimators is not
available. For reasonably large samples (large n), we use the asymptotic distribution of the
OLS estimators, which is known from studying the asymptotic properties, to approximate
the finite-sample distribution of the OLS estimators, which is unknown.
Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the
similarities and differences?
It is beneficial to make a list by yourself.
Under MLR.1-MLR.5, the OLS estimators are consistent, asymptotically normal, and
asymptotically efficient. Try to explain these properties in your own words.
This is really a test on your understanding of these notions in econometrics.
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is
4.85.
(iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there
are 88 − 5 = 83 df in the unrestricted model. The F statistic is
F = [(.829 − .820)/3] / [(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90
denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact,
the p-value is about .23.
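As a quick check, the same arithmetic can be reproduced with STATA's display calculator using the numbers quoted above:
    display ((.829 - .820)/3) / ((1 - .829)/83)    // returns about 1.46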
(iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F
statistic would not have an F distribution under the null hypothesis. Therefore, comparing
the F statistic against the usual critical values, or obtaining the p-value from the F
distribution, would not be especially meaningful.
log(psoda)-hat = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
                 (0.29)   (.031)         (.027)             (.133)
n = 401, R² = .087
The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we
reject H0 at the 5% level but not at the 1% level.
(ii) The correlation between log(income) and prppov is about −.84, indicating a strong degree
of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for
log(income) is about 5.1 and that for prppov is about 2.86 (two-sided p-value = .004).
(iii) The regression with log(hseval) added is
log(psoda)-hat = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
                 (.29)   (.029)         (.038)             (.134)        (.018)
n = 401, R² = .184
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value,
holding the other variables fixed, increases the predicted price by about .12 percent. The
two-sided p-value is zero to three decimal places.
(iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the
15% significance level against a two-sided alternative for log(income), and prppov does
not have a t statistic even close to one in absolute value). Nevertheless, they are jointly
significant at the 5% level because the outcome of the F(2,396) statistic is about 3.52 with
p-value = .030. All of the control variables, log(income), prppov, and log(hseval), are highly
correlated, so it is not surprising that some are individually insignificant.
(v) Because the regression in (iii) contains the most controls, log(hseval) is individually
significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable.
It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is
that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.
log(wage)-hat = .284 + .092 educ + .0041 exper + .022 tenure
                (.104)  (.007)      (.0017)       (.003)
n = 526, R² = .316, σ-hat = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution
overlaid, is given below:
(iii) The residuals from the log(wage) regression appear to be more normally distributed.
Certainly the histogram in part (ii) fits under its comparable normal density better than in
part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage
regression there are some very large residuals (roughly equal to 15) that lie almost five
estimated standard deviations (σ-hat = 3.085) from the mean of the residuals, which is
identically zero, of course. Residuals far from zero do not appear to be nearly as much of a
problem in the log(wage) regression.
The skewness and kurtosis of the residuals from the OLS regression may be used to test the
null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null
distribution of the Jarque-Bera test statistic (which takes into account both skewness and
kurtosis) is approximately the chi-squared with 2 degrees of freedom. Hence, at the 5% level,
we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test
is carried out by the command sktest. With the data in WAGE1.RAW, JB = (extremely
large) for the wage model and JB = 10.59 for the log(wage) model. While the normality is
rejected for both models, we do see that the JB statistic is much smaller for the log(wage)
model [also compare its p-values of skewness and excess kurtosis tests against those from
the wage model]. Hence the normality appears to be a reasonable approximation to the
distribution of the error term in the log(wage) model.
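A sketch of this check in STATA, assuming the wage data are loaded and the variables are named lwage, educ, exper and tenure as in the Wooldridge data sets:
    regress lwage educ exper tenure
    predict uhat, residuals
    sktest uhat                 // joint skewness/kurtosis test of normality of the residuals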
What are the advantages of using the log of a variable in regression? Find the rules of
thumb for taking logs.
See page 191 of Wooldridge (p198-199 for the 3rd edition)
Be careful when you interpret the coefficients of explanatory variables in a model where
some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.
How do you compute the change in y caused by x when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.
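In brief, when the fitted model is for log(y) and the coefficient on xj is bj, the exact percentage change in the predicted y from a change Δxj is
%Δy-hat = 100[exp(bj Δxj) − 1],
which the approximation 100 bj Δxj only tracks well for small changes.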
Why do we need the interaction terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly
related to another explanatory variable. See (6.17) for example.
What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding
additional explanatory variables to a model. The R-squared can never fall when a new
explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the
new variable is less than one in absolute value.
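For reference, the two measures are linked by
adjusted R² = 1 − [SSR/(n − k − 1)]/[SST/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k − 1),
so the adjustment replaces SSR and SST by their degrees-of-freedom-corrected counterparts.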
How do you construct interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.
How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p220 for the 3rd edition).
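In outline, simply exponentiating the fitted value of log(y) systematically underpredicts y; the prediction is scaled up as
y-hat = exp(σ-hat²/2) exp(logy-hat) if the error is normal, or more generally y-hat = a-hat exp(logy-hat) with a-hat = (1/n) Σi exp(ui-hat).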
What is involved in residual analysis?
See pages 209-210 (p217-218 for the 3rd edition).
performance on the science test fixed while studying the effects of staff on the math pass
rate? This would be an example of controlling for too many factors in a regression
equation. The variable sci11 could be a dependent variable in an identical regression
equation.
log(wage)-hat = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
                (0.24)  (.0174)      (.0200)       (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against
H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.
(iv) We rewrite the equation as
log(wage) = β0 + θ1 educ + β2 exper + β3 educ·(exper − 10) + u,
where θ1 = β1 + 10β3, and run the regression of log(wage) on educ, exper, and educ·(exper − 10).
We want the coefficient on educ. We obtain θ1-hat ≈ .0761 and se(θ1-hat) ≈ .0066. The 95% CI for θ1 is about .063
to .089.
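A STATA sketch of this re-parameterisation, assuming the WAGE2 variable names lwage, educ and exper:
    gen educ_exm10 = educ*(exper - 10)
    regress lwage educ exper educ_exm10
    * the coefficient on educ now estimates theta1 = beta1 + 10*beta3 directly,
    * together with its standard error and confidence interval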
strictly holds with a single explanatory variable included in the regression, but we often
ignore the presence of other independent variables and use this table as a rough guide. (Or,
we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are
more likely to receive training, then train and u are negatively correlated. If we ignore the
presence of educ and exper, or at least assume that train and u are negatively correlated
after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with
ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to
conclude that the training program was effective. Intuitively, this makes sense: if those
chosen for training had not received training, they would have had lower wages, on average,
than the control group.
ecobuy-hat = … + .025 educ − .00050 age
(standard errors, in order across all terms: .165, .109, .132, .00053, .013, .008, .00125)
n = 660, R² = .110
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples
falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled
apples increases by about .072. (Of course, we are assuming that the probabilities are not
close to the boundaries of zero and one, respectively.)
(iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F
test, the four non-price variables are jointly very significant. Of the four variables, educ
appears to have the most important effect. For example, a difference of four years of
education implies an increase of .025(4) = .10 in the estimated probability of buying
eco-labeled apples. This suggests that more highly educated people are more open to buying
produce that is environmentally friendly, which is perhaps expected. Household size (hhsize)
also has an effect. Comparing a couple with two children to one that has no children, other
factors equal, the couple with two children has a .048 higher probability of buying
eco-labeled apples.
(iv) The model with log(faminc) fits the data slightly better: the R-squared increases to
about .112. (We would not expect a large increase in R-squared from a simple change in the
functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc)
increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is
estimated to increase by about .0045, a pretty small effect.
(v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are
two fitted probabilities above 1, which is not a source of concern with 660 observations.
(vi) Using the standard prediction rule (predict one when ecobuy-hat ≥ .5 and zero
otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about
41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the
usual prediction rule, the model does a much better job predicting the decision to buy
eco-labeled apples. (The overall percent correctly predicted is about 67%.)
Notice that β1, which is the slope on inc in the original model, is now a constant in the
transformed equation. This is simply a consequence of the form of the heteroskedasticity
and the functional forms of the explanatory variables in the original equation.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this
person is not a smoker, so the equation predicts well for this particular observation.)
(iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates
of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
            (.076)   (.0005)     (.000004)      (.0037)     (.00004)      (.0117)
n = 9,275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in
the coefficient on male, but this variable is very insignificant using either estimation method.
(iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in
lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.
(v) In column (1) we are explaining very little of the variation in pass rates on the MEAP
math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves
much variation unexplained). Clearly most of the variation in math10 is explained by
variation in lnchprg. This is a common finding in studies of school performance: family
income (or related factors, such as living in poverty) is much more important in explaining
student performance than is spending per student or other school characteristics.
What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering,
which matters in regression analysis because of the second feature. Second, many economic
time series are serially correlated (or autocorrelated), meaning that future observations are
dependent on present and past observations. Third, many economic time series contain
trends and seasonality. In comparison to cross-sectional data, the major difference is that
time series data are not a random sample. Recall that, for a random sample, observations
are required to be independent of one another.
What is a stochastic process and its realisation?
A stochastic process (SP) is a collection of random variables indexed by time. At any fixed
point in time, the SP is a random variable (hence has a distribution). The SP can be viewed
as a random curve with the horizontal axis being the time index, where the outcome of the
underlying experiment is a curve (or a time series plot). The time series we observe is
viewed as a realisation (or outcome) of the random curve or the SP.
What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the
variable at one point in time and the same variable at another point in time. The serial (or
auto) correlation of yt is usually denoted as Corr(yt, yt-h) = Cov(yt, yt-h)/[Var(yt)Var(yt-h)]^(1/2).
What is a finite distributed lag model? What is the long-run propensity (LRP)? How would
you estimate the LRP and the associated standard error (say in STATA)?
Read ie_Slides10 pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
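As a sketch, for a finite distributed lag model of order 2, yt = a + d0 zt + d1 zt-1 + d2 zt-2 + ut, the LRP is d0 + d1 + d2; with hypothetical annual variables y and z it can be estimated in STATA as:
    tsset year
    regress y z L.z L2.z        // FDL model of order 2
    lincom z + L.z + L2.z       // LRP estimate and its standard error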
What are TS1-6 (assumptions about time series regression)? How do they differ from the
assumptions in MLR1-6?
TS1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional
mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6
mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the
unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the OLS standard
errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong
assumptions and can be relaxed when sample sizes are large.
What are strictly exogenous regressors and contemporaneously exogenous regressors?
Strictly exogenous regressors satisfy the assumption TS3, whereas the contemporaneously
exogenous regressors only satisfy the condition (10.10) (or (z10) of the Slides) that is
weaker than TS3.
What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time
trend is a function of time index (eg, linear, quadratic, etc). A time trend can be used to
mimic (or model) the trending component of a time series.
Why may a regression with trending time series produce spurious results?
First, two time trends are always correlated because they grow together in the same (or
opposite) direction. Now consider two unrelated time series, each of which contains a time
trend. Since the time trends in the two series are correlated, regressing one on the other
will produce a significant slope coefficient. Such statistical significance is spurious because
it is purely induced by the time trends and has little to say about the true relationship
between the two time series.
Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series
variables that is not induced by the time trends in the time series.
What is seasonality in a time series? Give an example of time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or
every day). An example would be the daily revenue of a pub, which is likely to have
day-of-the-week seasonality.
For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we
define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters and include
them in the regression model.
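A sketch in STATA, assuming a quarter indicator qtr coded 1-4 and hypothetical variables y, x1, x2:
    gen byte Q2 = (qtr == 2)
    gen byte Q3 = (qtr == 3)
    gen byte Q4 = (qtr == 4)
    regress y x1 x2 Q2 Q3 Q4
    * equivalently, with factor-variable notation: regress y x1 x2 ib1.qtr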
Problem Set
Q1. Wooldridge 10.1
(i) Disagree. Most time series processes are correlated over time, and many of them are
strongly correlated. This means they cannot be independent across observations, which simply
represent different time periods. Even series that do appear to be roughly uncorrelated,
such as stock returns, do not appear to be independently distributed, as they have dynamic
forms of heteroskedasticity.
(ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the
homoskedasticity and no serial correlation assumptions.
(iii) Disagree. Trending variables are used all the time as dependent variables in a
regression model. We do need to be careful in interpreting the results because we may
simply find a spurious association between yt and trending explanatory variables. Including
a trend in the regression is a good idea with trending dependent or independent variables.
As discussed in Section 10.5, the usual R-squared can be misleading when the dependent
variable is trending.
(iv) Agree. With annual data, each time period represents a year and is not associated with
any season.
Q2. Wooldridge 10.2
We follow the hint and write
gGDPt-1 = α0 + δ0 intt-1 + δ1 intt-2 + ut-1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt-1 + δ1 intt-2 + ut-1 − 3) + vt
     = (γ0 + γ1α0 − 3γ1) + γ1δ0 intt-1 + γ1δ1 intt-2 + γ1ut-1 + vt.
Now by assumption, ut-1 has zero mean and is uncorrelated with all right-hand-side
variables in the previous equation, except itself of course. So
Cov(intt, ut-1) = E(intt ut-1) = γ1E(ut-1²) > 0
because γ1 > 0. If σ² = E(ut²) for all t then Cov(intt, ut-1) = γ1σ². This violates the strict
exogeneity assumption, TS3. While ut is uncorrelated with intt, intt-1, and so on, ut is
correlated with intt+1.
What is a strictly stationary stochastic process (SP)? What is a covariance stationary SP?
A SP is strictly stationary if its joint distribution (at arbitrary points in time) is invariant
under time shifts, meaning that, for example, the joint distribution at the time points (t1, t2,
t3) is the same as the joint distribution at (t1+h, t2+h, t3+h) for any shift h. On the other hand,
a SP is covariance stationary if it has a finite variance and its mean and autocovariances are
independent of the time index. In other words, a covariance stationary SP has an invariant
structure for the mean and autocovariances.
In this course, what do we mean by weakly dependent (WD) time series?
By WD time series, we mean the time series to which the LLN and CLT apply. The WD time
series is not independent but is not far away from being independent either, so that the
LLN and CLT still apply.
What is a dynamically complete (DC) model?
A regression model is DC if the lags of the dependent variable (yt) and the lags of the
explanatory variables (xt) do not help to predict the dependent variable. In math notation, a
DC model satisfies E(yt | xt, yt-1, xt-1, ...) = E(yt | xt). That is, once xt is controlled for, the lags of
(yt, xt) have no explanatory or predictive power.
How would you test for the serial correlation in the disturbance of a regression model?
Use the Breusch-Godfrey LM test. The test involves the following steps: (i) run the original
regression and save the residuals; (ii) regress the residual on the original explanatory
variables and q lags of the residual and save the R-squared; (iii) the LM statistic, equal to the
product of the sample size and the R-squared from (ii), follows the Chi-squared distribution
with q df under the null hypothesis of no serial correlation; (iv) reject the null if the LM
statistic is too large in comparison to the chi-squared (q df) critical value. A rule of thumb is
that q is in the neighbourhood of n^(1/4), where n is the sample size.
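A sketch of the test in STATA, assuming the data are tsset and the model regresses a hypothetical y on x1 and x2, with q = 4:
    regress y x1 x2
    estat bgodfrey, lags(4)                // built-in Breusch-Godfrey LM test
    * or by hand, following the steps above:
    predict uhat, residuals
    regress uhat x1 x2 L.uhat L2.uhat L3.uhat L4.uhat
    display "LM = " e(N)*e(r2)             // compare with the chi-squared(4) critical value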
Will the usual OLS standard errors be valid when the disturbance has autocorrelation?
No, the OLS standard errors, the usual t-stat and F-stat are all invalid in the presence of
autocorrelated disturbance. A remedy is to use the Newey-West standard errors that are
robust to the autocorrelated disturbance.
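In STATA (again with hypothetical tsset data), these are obtained with the newey command, which reproduces the OLS point estimates but reports HAC standard errors:
    newey y x1 x2, lag(4)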
What is a random walk? What are the main properties of the random walk? What are the
properties of the difference of the random walk?
A random walk yt is defined as the cumulation of an iid random sequence {et}:
yt = yt-1 + et = et + et-1 + ... + e1 + y0,
where E(et) = 0 and Var(et) = σ², t = 1, 2, ....
The main properties include: (i) its variance grows linearly with the time index; (ii) the
conditional mean of future yt+h given the current yt is always the current yt, no matter how
distant the future is (or how large h is); (iii) the autocorrelations are close to one. The
difference of the random walk is simply the iid random variable et, which is of course strictly
stationary, with zero mean and finite variance.
What are I(1) and I(0) time series? How do we decide whether a time series is I(1) or I(0)?
A nonstationary time series yt is called I(1) if its difference Δyt = yt − yt-1 is stationary. A stationary
time series is called I(0). The augmented Dickey-Fuller (ADF) test may be used to detect
whether a time series is I(1) or I(0). The ADF test involves regressing Δyt on {1, t, yt-1, Δyt-1, ...,
Δyt-q}. The test statistic is the t-ratio on yt-1, which follows the Dickey-Fuller distribution under
the null of a unit root (ie, the time series is I(1)). The null is rejected in favour of the
alternative (that the time series is I(0)) if the t-ratio is too negative in comparison to the
Dickey-Fuller critical value.
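A sketch in STATA for a hypothetical tsset series y, with a linear trend and 2 lagged differences in the ADF regression:
    dfuller y, trend lags(2)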
What is a stochastic trend? What is a spurious regression?
The stochastic trend refers to the random-walk component in a time series (if there is one).
A regression is spurious if the seemingly significant estimation result merely reflects the fact
that the dependent variable and the explanatory variables have unrelated stochastic trends,
and has little to do with the true relationship among these variables.
Can you explain the notion of cointegration?
A set of nonstationary time series is cointegrated if there is a linear combination of these
time series that is stationary. For example, the logs of the aggregate investment and income
of an economy are nonstationary. We say that log(investment) and log(income) are
cointegrated if there is a constant β such that log(investment) − βlog(income) is stationary.
What are pooled cross sections? How do they compare to a single cross sectional sample?
Pooled cross sections consist of a number of cross-sectional samples that are obtained at
different points in time, usually in different years. For pooled cross sections, the individuals
in one cross-sectional sample may be different from the individuals in another cross-sectional
sample. Similar to a single cross-sectional sample (with observations on the
characteristics of randomly selected individuals at the same point in time), the observations
in pooled cross sections are independent. However, because the observations are collected
at different points in time, they should generally be treated as coming from different
populations (or having different distributions). This is a reason why yearly dummies are
usually included in regressions with pooled cross sections.
What is policy analysis? Give an example.
The goal of policy (or event) analysis is to quantify the causal effect of a policy on a
dependent variable. For example, it is interesting to find out the effect of the government
stimulus package on individual expenditures. A policy (eg. stimulus package) can be viewed
as a natural experiment in the sense that it alters the environment of a group of individuals
(eg. stimulus package receivers) but not others (eg. non-receivers). With data sets collected
before and after the policy, the difference between the before-after changes of the treatment
group and the control group can be estimated.
How are pooled cross sections used for policy analysis? What is the difference-in-difference
estimator? Estimating what?
With pooled cross sections before and after the policy implementation, the policy analysis
can be carried out in a regression model, where the after-policy dummy and the treatment
group dummy are key variables. In this framework, the parameter of interest (the effect of
the policy on the treatment group) is the coefficient of the interaction of the after-policy
dummy and the treatment-group dummy. The estimator of this coefficient is known as the
difference-in-differences estimator (of course, estimating the ceteris paribus effect of the
policy on the treatment group), as it is the difference between the treatment-control
difference after the policy and the treatment-control difference before the policy.
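A sketch of the regression in STATA, with hypothetical variables: after = 1 for post-policy observations, treat = 1 for the treatment group, and y the outcome:
    gen aftertreat = after*treat
    regress y after treat aftertreat
    * the coefficient on aftertreat is the difference-in-differences estimate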
What is a set of panel data? How does it differ from a set of pooled cross sections?
A set of panel data consists of the observations on the characteristics of individuals at a
number of different points in time, where the individuals are randomly-selected initially and
are followed subsequently through the whole sampling duration. Therefore, unlike pooled
cross sections, panel data are collected from the same individuals at different points in time.
What is an unobserved (or fixed) effect model? What does the unobserved (or fixed) effect
represent? Is there a fixed effect in the usual cross sectional regression?
A fixed effect model postulates that the effect (on a dependent variable) of the unobserved
factors that are specific to each individual can be summarised into a time-invariant term (ai),
known as unobserved or fixed effect, in the panel regression model. The fixed effect ai
represents the impact of the unobserved individual-specific factors on the dependent
variable. In the usual cross sectional regression, the fixed effect is absorbed into the
disturbance term and, if it is correlated with the regressors in the model, it will make the
usual OLS estimation biased.
What is the first-differenced estimation for a two-period panel? Does the validity of the
first-differenced estimation rely on the assumption that the unobserved or fixed effect is
uncorrelated with the regressors?
With a two-period panel data set, because the fixed effect does not change over time, it can
be eliminated by differencing, ie, subtracting the equation of the first period from that of the
second period. The resultant equation is a linear regression model in differenced variables.
Hence the original parameters of the panel model can be estimated by regressing the
differenced dependent variables on the differenced explanatory variables. This approach is
known as the first-differenced estimation. Its validity does not rely on the assumption that
the fixed effect is uncorrelated with the regressors since the fixed effect is removed from the
first-differenced equation.
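A sketch in STATA for a two-period panel identified by id and year, with hypothetical variables y, x1 and x2:
    xtset id year
    regress D.y D.x1 D.x2      // the time-invariant fixed effect drops out of the differenced equation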
Can we estimate the unobserved or fixed effect with a two-period panel?
While the fixed effects (ai, i = 1, …, N) can be viewed as parameters, they cannot be estimated
with reasonable accuracy for a small T (number of time periods) panel data set because
there are too many of them. For example, for a panel with N = 1000 individuals and T = 2
time periods, there are 1000 fixed effects, which are difficult to estimate with only NT =
2000 observations.
Comment on the advantage of using panel data for policy analysis.
The main advantage of using panel data is the fact that the unobserved individual-specific
factors, which in a single cross section regression will bias the OLS estimation, can be
removed by differencing. Hence, for policy analysis, panel data make it possible to properly
estimate the ceteris paribus effect of the key variables in the presence of unobserved
individual-specific factors.
only .0077 (and its t statistic is very small). The increase from .191 to .198 is easily
explained by sampling error.
(ii) If highearn is dropped from the equation [so that β1 = 0 in (13.10)], then we are assuming
that, prior to the change in policy, there is no difference in average duration between high
earners and low earners. But the very large (.256), highly statistically significant estimate on
highearn in (13.12) shows this presumption to be false. Prior to the policy change, the
high-earning group spent about 29.2% [exp(.256) − 1 ≈ .292] longer on unemployment
compensation than the low-earning group. By dropping highearn from the regression, we
attribute to the policy change the difference between the two groups that would be
observed without any intervention.