
The linear model

General framework
We have a random sample on the response variable y_i and the control variables (including the constant) x_i = (x_{1i}, ..., x_{Ki}), with i = 1, ..., N. For each observation the population equation of interest is:

    y_i = E[y_i | x_i] + u_i    (1)
    y_i = x_i β + u_i           (2)

where y_i is a scalar, x_i is a 1 × K vector, β = (β_1, ..., β_K)' is a K × 1 vector, and u_i is a scalar that represents the error. Note that the vectors (y_i, x_i), i = 1, ..., N, are i.i.d. (because we have a random sample from the population).
- The model is linear in parameters and is correctly specified.
- The error is additive.

Consistency of the OLS estimator
The least-squares estimator is consistent under the following two key conditions:

    E(u | x_{1i}, ..., x_{Ki}) = E(u | x) = 0,    (OLS.1)
    rank E(x'x) = K.                              (OLS.2)

Condition OLS.1 is stronger than needed, as it requires u to be uncorrelated both with the control variables x and with any function of x. A weaker condition:

    E(x'u) = 0,    (OLS.1a)

equivalent to Cov(u, x) = 0, is sufficient for the consistency of the OLS estimator. However, to have an unbiased estimator we need E(u | x) = 0. If OLS.1 holds, the regressors x are said to be weakly exogenous.
Condition (OLS.2) rules out an exact linear relationship among the control variables. This is equivalent to requiring that E(x'x) be positive definite. In short, we should be able to invert E(x'x).
- Both OLS.1 and OLS.2 are assumptions about the population and cannot be tested!
- Condition OLS.2 is easy to satisfy, though it may fail if the model is not correctly specified.
- Condition OLS.1 is more substantial and must be carefully justified (based on theory and intuition).
Under OLS.1 and OLS.2 the vector of parameters is identified. That is, β can be written as a function of population moments of observed variables.

    y = xβ + u                    (3)
    x'y = x'xβ + x'u              (4)
    E(x'y) = E(x'x)β + E(x'u)     (5)

Hence, given that E(x'u) = 0 by OLS.1, we have:

    β = [E(x'x)]^{-1} E(x'y).    (6)

Note that the linear model can be written in full matrix notation as:

    y = E[y | X] + u    (7)
    y = Xβ + u          (8)

where y is an N × 1 vector, X is the N × K data matrix, and u is an N × 1 vector of errors.

Then, the OLS estimator can be written as

    β̂ = (X'X)^{-1} X'y    (9)

The analogy principle (see Goldberger (1968) and Manski (1988)) suggests substituting sample moment conditions for population moment conditions to obtain estimators. This leads us to the method of moments, which substitutes the corresponding sample averages for E(x'x) and E(x'y), respectively.
    β̂ = (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1} Σ_{i=1}^N x_i'y_i)
    β̂ = (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1} Σ_{i=1}^N x_i'(x_iβ + u_i))
    β̂ = β + (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1} Σ_{i=1}^N x_i'u_i)

Define A = plim N^{-1} Σ_i x_i'x_i = plim N^{-1} X'X.
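
As a quick check of the algebra, the moment formula can be computed directly in Mata and compared with the output of regress. A minimal sketch, using Stata's bundled auto data purely as an illustration (the variable choice is arbitrary):

/* OLS via the method-of-moments formula, computed in Mata */
. sysuse auto, clear
. regress price mpg weight
. mata:
    y = st_data(., "price")
    X = st_data(., ("mpg", "weight")), J(rows(y), 1, 1)   // append the constant
    b = invsym(X'X) * (X'y)                               // (X'X)^-1 X'y
    b'                                                    // matches the Coef. column
end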
Then,

    plim β̂ = β + A^{-1} · 0
    plim β̂ = β.    (10)

because plim N^{-1} Σ_i x_i'u_i = plim N^{-1} X'u = 0. Therefore, the OLS estimator is consistent.


- We did not impose any restriction on the nature of y.
- In fact, y can be binary or ordinal and OLS gives, even in those cases, a consistent estimator (see the simulation sketch below).
- OLS gives us a consistent estimator of the linear projection of y on x.
- Note that we do not need u to be independent of x, which is a much more restrictive condition than E(u | x) = 0.
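
A minimal simulation sketch of the binary-response point, with data generated from a hypothetical linear probability model P(y = 1 | x) = 0.3 + 0.2x; with a large N the OLS slope should settle near the true value 0.2:

/* OLS consistency with a binary response */
. clear
. set seed 12345
. set obs 100000
. generate x = runiform()
. generate y = runiform() < 0.3 + 0.2*x    // binary y from a linear probability model
. regress y x                              // slope estimate should be close to 0.2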

Asymptotic inference with OLS
From our previous discussion we have

    √N (β̂ − β) = (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1/2} Σ_{i=1}^N x_i'u_i)

Suppose that (x_i'u_i), i = 1, ..., N, is an i.i.d. sequence with zero mean and, element by element, finite variance. Then the conditions for the central limit theorem (CLT) hold. The CLT says that

    N^{-1/2} Σ_{i=1}^N x_i'u_i →d N(0, B)

where B = E(u²x'x). For future reference we define Ω = E[uu' | X]. Then, B = N^{-1} E[X'uu'X] = N^{-1} E[X'ΩX].

We also know that

    (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} − A^{-1} = o_p(1)

That is, the difference converges to zero as N goes to infinity. This means that when investigating the asymptotic distribution of √N (β̂ − β) we can replace the term (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} with the fixed matrix A^{-1}. Then we have that, asymptotically, √N (β̂ − β) is simply a linear transformation of something that is distributed as a Normal. Here we can use the well-known result that a linear transformation of a Normal variable is itself distributed as a Normal.

In particular we have that:

    W' (N^{-1/2} Σ_{i=1}^N x_i'u_i) →d N(0, W'BW)

for any positive definite W. This implies

    √N (β̂ − β) ∼a N(0, A^{-1}BA^{-1})

because A^{-1} is symmetric. Under the assumption of homoskedasticity Ω = E[uu' | X] = E[uu'] = σ²I, or more generally,

    B = E(u²x'x) = σ² E(x'x);  σ² ≡ E(u²),    (OLS.3)

such that Var(u | x) = Var(u) = σ². Therefore, we have

    √N (β̂ − β) ∼a N(0, σ²A^{-1}).

This result says that we can treat β̂ as asymptotically Normal with mean β and variance σ²[E(x'x)]^{-1}. A consistent estimator for σ² is

    σ̂² = Σ_{i=1}^N û_i² / (N − K)

If we substitute the sample average N^{-1} Σ_{i=1}^N x_i'x_i = N^{-1}(X'X) for E(x'x) we can write:

    β̂ ∼a N(β, V̂)

where

    V̂ = Âvar(β̂) = σ̂²(X'X)^{-1}

is the estimator of the asymptotic variance.

For the j-th coefficient the asymptotic standard error is defined as:

    se(β̂_j) = (v̂_jj)^{1/2}

where v̂_jj is the j-th diagonal element of V̂.
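
These formulas are easy to verify by hand. A minimal sketch, again on the bundled auto data, that rebuilds V̂ = σ̂²(X'X)^{-1} in Mata and compares the square roots of its diagonal with the standard errors reported by regress:

/* hand-computed OLS standard errors */
. sysuse auto, clear
. quietly regress price mpg weight
. mata:
    y = st_data(., "price")
    X = st_data(., ("mpg", "weight")), J(rows(y), 1, 1)
    b = invsym(X'X) * (X'y)
    u = y - X*b                               // residuals
    s2 = (u'u) / (rows(X) - cols(X))          // sigma2hat with the N-K correction
    V = s2 * invsym(X'X)                      // Vhat
    sqrt(diagonal(V))'                        // matches the Std. Err. column
end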

In short...
- standard errors and confidence intervals,
- t tests,
- F tests,
- χ² tests
are all asymptotically valid under OLS.1-OLS.3. Hence, we can do hypothesis tests as usual on the basis that

    (β̂_j − β_j) / se(β̂_j) ∼ t_{N−K}

Remember, you can use the p-value = P(|T| > |t|) for inference. A p-value < 0.05 (p-value < 0.01) is evidence for rejecting H0 at a significance level of 5% (1%).
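
For reference, the two-sided p-value can be computed directly from the t statistic; a one-line sketch, using the female coefficient from the example below (t = -1.85 with N − K = 2947 degrees of freedom):

/* two-sided p-value for a t statistic */
. display 2*ttail(2947, abs(-1.85))    // P(|T| > 1.85), about 0.064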

Heteroskedasticity robust inference
Violations of OLS.1 lead to an inconsistent OLS estimator. This is the worst problem the econometrician can ever face, as if OLS.1 fails we need to solve the endogeneity problem [i.e. E(u | x) ≠ 0] using some advanced method. If OLS.3 fails, standard errors will be inflated or deflated and inference based on the assumption of homoskedasticity will be invalid. This is a less troublesome issue, as we can use a heteroskedasticity robust estimator of the covariance matrix. In general we have that:

    Âvar(β̂) = N^{-1} Â^{-1} B̂ Â^{-1}

where a consistent estimator for A is Â = N^{-1} Σ_i x_i'x_i = N^{-1}(X'X), and a consistent estimator for B is B̂ = N^{-1} Σ_i û_i² x_i'x_i = N^{-1} X'Ω̂X. Therefore, we can write:

    Âvar(β̂) = (X'X)^{-1} X'Ω̂X (X'X)^{-1}

where a consistent estimator for Ω = E(uu' | X) is Ω̂ = Diag[û_i²] with û_i = y_i − x_iβ̂.

This estimator of the asymptotic variance of β̂ is robust to heteroskedasticity. It is known in the literature as the sandwich or the White-Huber-Eicker estimator of the covariance matrix. Using this estimator we can obtain robust standard errors and, with them, robust t statistics and confidence intervals, and do inference on the basis of those statistics.
- When OLS.3 fails the F tests are not valid ⇒ use a Wald test.
- The OLS estimator is not an efficient estimator when OLS.3 fails ⇒ use GLS to achieve efficiency gains.
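
In Stata the sandwich estimator is available through the vce(robust) option; a minimal sketch on the bundled auto data, comparing the two sets of standard errors:

/* homoskedastic vs. heteroskedasticity-robust standard errors */
. sysuse auto, clear
. regress price mpg weight                  // conventional standard errors
. regress price mpg weight, vce(robust)     // White-Huber-Eicker standard errors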

The Delta method
Suppose that

    √N (β̂ − β) →d N(0, V)

where V is positive definite. Let c : Θ → R^Q be a continuously differentiable function on the parameter space Θ ⊂ R^K, where Q ≤ K. Assume that β is in the interior of the parameter space. Define C(β) ≡ ∇_β c(β) as the Q × K Jacobian of c. Then,

    √N [c(β̂) − c(β)] ∼a N[0, C(β)VC(β)']    (DM)

Define Ĉ_N ≡ C(β̂_N). Therefore, plim Ĉ_N = C(β). If plim V̂_N = V, we have that

    {√N [c(β̂) − c(β)]}' [Ĉ_N V̂_N Ĉ_N']^{-1} {√N [c(β̂) − c(β)]} ∼a χ²_Q    (WT)
DM can be used to obtain asymptotic standard errors of nonlinear functions of β̂. The asymptotic variance is given by

    Âvar[c(β̂)] = Ĉ_N Âvar(β̂_N) Ĉ_N'

The asymptotic standard errors are the square roots of the diagonal elements of Âvar[c(β̂)]. The result (WT) is used for doing hypothesis tests of the form H0: c(β) = 0 vs H1: c(β) ≠ 0. The Wald statistic is

    W_N = √N c(β̂)' [Ĉ_N V̂_N Ĉ_N']^{-1} √N c(β̂) ∼a χ²_Q

under the null hypothesis.
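
Stata implements the delta method in nlcom and the Wald statistic (WT) for nonlinear restrictions in testnl; a minimal sketch with an illustrative nonlinear function of the auto-data coefficients (the particular function is arbitrary):

/* delta-method standard errors and a nonlinear Wald test */
. sysuse auto, clear
. quietly regress price mpg weight
. nlcom ratio: _b[mpg]/_b[weight]        // delta-method se for a nonlinear function
. testnl _b[mpg]*_b[weight] = 0          // Wald test of a nonlinear restriction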

Example: Delta method
Let β̂ = (β̂_1, β̂_2)' with V̂ given by:

    V̂ = ( 1    0.5 )
        ( 0.5  1   )

where I do not write explicitly the dependence on the sample size to simplify notation. We want to obtain the asymptotic variance of exp(β̂) = [exp(β̂_1), exp(β̂_2)]'. In this case

    Ĉ = ( ∂exp(β̂_1)/∂β̂_1  ∂exp(β̂_1)/∂β̂_2 )  =  ( exp(β̂_1)  0         )
        ( ∂exp(β̂_2)/∂β̂_1  ∂exp(β̂_2)/∂β̂_2 )     ( 0          exp(β̂_2) )
    Âvar[exp(β̂)] = ( exp(β̂_1)  0         ) ( 1    0.5 ) ( exp(β̂_1)  0         )
                   ( 0          exp(β̂_2) ) ( 0.5  1   ) ( 0          exp(β̂_2) )

                 = ( exp(β̂_1)²               0.5 exp(β̂_1)exp(β̂_2) )
                   ( 0.5 exp(β̂_1)exp(β̂_2)   exp(β̂_2)²             )
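
A quick Mata check of this matrix product, plugging in the illustrative values β̂_1 = 0 and β̂_2 = 1:

/* delta-method variance for exp(bhat): C V C' */
. mata:
    b = (0 \ 1)                  // illustrative values of bhat
    V = (1, 0.5 \ 0.5, 1)
    C = diag(exp(b))             // Jacobian of the elementwise exp()
    C * V * C'                   // Avar[exp(bhat)]
end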

Three equivalent ways of doing asymptotic inference
The objective is to perform hypothesis tests of the form

    H0: c(β) = 0 vs H1: c(β) ≠ 0

where c(β) is Q × 1 and θ = β in the accompanying graph (not reproduced here).
- Likelihood ratio test. If the constraint c(β) = 0 is valid, then imposing it does not lead to a large reduction in the log-likelihood.

      −2 ln(λ) = −2 ln(L̂_R / L̂_U) ∼a χ²_Q

  where Q is the number of restrictions.

- Wald test. If the restriction is valid, then c(β̂_MLE) must be close to zero. Hence, we can base the hypothesis test on c(β̂_MLE) and reject H0 if the value that we get is significantly different from zero. If c(β) = Rβ − r,

      W = (Rβ̂ − r)' [RV̂R']^{-1} (Rβ̂ − r) ∼a χ²_Q.

  where R is Q × K and has rank Q ≤ K, and r is Q × 1. In the linear model with normal errors an exact finite-sample result can be obtained:

      W/Q ∼ F(Q, N − K)

  exactly, under H0. Exact results are not available in nonlinear models.
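
After regress, Stata's test command reports the F form W/Q; a minimal sketch recovering the χ² form from the returned results, assuming Q = 2 restrictions on the illustrative auto regression:

/* Wald statistic: F form and chi-squared form */
. sysuse auto, clear
. quietly regress price mpg weight
. test mpg weight                             // reports F(2, N-K) = W/Q
. display "W = " r(df)*r(F)                   // chi2 form: W = Q*F
. display "p = " chi2tail(r(df), r(df)*r(F))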

- Lagrange multiplier test ('score' test). This test is based on the restricted model. The idea is that the likelihood function is maximised under the constraint c(β) = 0:

      ln L*(β) = ln L(β) + λ'c(β)

  The solution is found by solving the following equations:

      ∂ln L*/∂β = ∂ln L/∂β + C(β)'λ = 0    (11)
      ∂ln L*/∂λ = c(β) = 0                 (12)

  where C(β) ≡ ∇_β c(β). If the restriction is valid, imposing it will not change the log-likelihood much and the second term in (11) will be small.

  In particular, λ will be small. This can be tested with H0: λ = 0, which leads us to the Lagrange multiplier test. An equivalent form makes use of the fact that

      s(β̂_R) = ∂ln L/∂β |_{β̂_R} = −C(β̂_R)'λ̂

  where s(β) is known as the 'score'. For this reason the LM test is also known as the score test. If the restriction is valid then s(β̂_R) will be close to zero.

      LM = N^{-1} s(β̂_R)' [I(β̂_R)]^{-1} s(β̂_R) ∼a χ²_Q

  where I(β̂_R) is a consistent estimator of the information matrix evaluated at β̂_R:

      I(β̂_R) = −plim N^{-1} ∂²ln L/∂β∂β' |_{β̂_R}
  Note that

      s(β̂_R) = Σ_{i=1}^N s_i(β̂_R)

  Define S as the N × K matrix with s_i'(β̂_R) in the i-th row. A consistent estimator for I(β̂_R) is the BHHH or OPG estimator:

      I(β̂_R) = N^{-1} Σ_{i=1}^N s_i(β̂_R) s_i'(β̂_R)

  Let 1 be an N × 1 vector of ones. Then,

      LM = 1'S(S'S)^{-1}S'1 = N R²_unc

  where R²_unc is the uncentred R² of a regression of 1 on s_i(β̂_R). Note that N R²_unc can be calculated as N − RSS, where RSS is the sum of squared residuals in that regression.
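
A minimal Mata sketch of this formula, with a random placeholder standing in for the score matrix S just to show the algebra:

/* OPG form of the LM statistic: 1'S(S'S)^-1 S'1 */
. mata:
    S = rnormal(200, 3, 0, 1)            // placeholder scores (N = 200, K = 3)
    one = J(rows(S), 1, 1)
    LM = one'S * invsym(S'S) * S'one     // equals N times the uncentred R2
    LM
end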
  With linear restrictions there is an easy way of calculating the LM statistic (no need to obtain the scores). Partition β as in

      y = x_1β_1 + x_2β_2 + u

  where x_1 is 1 × K_1 and x_2 is 1 × K_2. In this case it is possible to test H0: β_2 = 0 by running the constrained regression of y on x_1 and calculating ū_i = y_i − x_{1i}β̄_1, i = 1, ..., N. In a second regression, run ū_i on x_1 and x_2. The LM statistic is N R² ∼a χ²_{K_2} from this second regression. Note that here we use the usual R².
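
A minimal sketch of this two-step recipe for the MEPS regression used in the example below, treating actlim and totchr as the x_2 block (it assumes mus03data is in the working directory):

/* LM test of H0: actlim = totchr = 0 via the auxiliary regression */
. use mus03data, clear
. quietly regress ltotexp suppins phylim age female income    // restricted model
. predict double ubar, residuals
. quietly regress ubar suppins phylim age female income actlim totchr
. display "LM = " e(N)*e(r2)                                  // N * R2
. display "p  = " chi2tail(2, e(N)*e(r2))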

An example
We have data on medical expenditures of individuals 65 and over who qualify for Medicare (USA) (see Cameron and Trivedi (2009), Stata book, page 73). The original data come from the Medical Expenditure Panel Survey (MEPS). Medicare does not cover all medical expenses, and about half of eligible individuals purchase supplementary insurance.
. use mus03data
. describe totexp ltotexp posexp suppins phylim actlim totchr age female income

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------
totexp double %12.0g Total medical expenditure
ltotexp float %9.0g ln(totexp) if totexp > 0
posexp float %9.0g =1 if total expenditure > 0
suppins float %9.0g =1 if has supp priv insurance
phylim double %12.0g =1 if has functional limitation
actlim double %12.0g =1 if has activity limitation
totchr double %12.0g # of chronic problems
age double %12.0g Age
female double %12.0g =1 if female
income double %12.0g annual household income/1000

We want to test

    H0: actlim = totchr = 0 vs H1: actlim ≠ 0 or totchr ≠ 0

How many restrictions? Two, so Q = 2 in the tests below.


. regress ltotexp suppins phylim actlim totchr age female income

Source | SS df MS Number of obs = 2955
-------------+------------------------------ F( 7, 2947) = 124.98
Model | 1264.72124 7 180.674463 Prob > F = 0.0000
Residual | 4260.16814 2947 1.44559489 R-squared = 0.2289
-------------+------------------------------ Adj R-squared = 0.2271
Total | 5524.88938 2954 1.87030785 Root MSE = 1.2023

------------------------------------------------------------------------------
ltotexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
suppins | .2556428 .0462264 5.53 0.000 .1650034 .3462821
phylim | .3020598 .0569709 5.30 0.000 .190353 .4137666
actlim | .3560054 .0621118 5.73 0.000 .2342185 .4777923
totchr | .3758201 .0184227 20.40 0.000 .3396974 .4119429
age | .0038016 .0036561 1.04 0.299 -.0033672 .0109705
female | -.0843275 .0455442 -1.85 0.064 -.1736292 .0049741
income | .0025498 .0010194 2.50 0.012 .000551 .0045486
_cons | 6.703737 .27676 24.22 0.000 6.161075 7.2464
------------------------------------------------------------------------------

. estimates store UM

/* LR test */
. regress ltotexp suppins phylim age female income

Source | SS df MS Number of obs = 2955
-------------+------------------------------ F( 5, 2949) = 63.59
Model | 537.725487 5 107.545097 Prob > F = 0.0000
Residual | 4987.1639 2949 1.6911373 R-squared = 0.0973
-------------+------------------------------ Adj R-squared = 0.0958
Total | 5524.88938 2954 1.87030785 Root MSE = 1.3004

------------------------------------------------------------------------------
ltotexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
suppins | .2762374 .0499206 5.53 0.000 .1783547 .3741202
phylim | .8048815 .0501609 16.05 0.000 .7065276 .9032353
age | .0061206 .0039343 1.56 0.120 -.0015936 .0138349
female | -.0704256 .0492262 -1.43 0.153 -.1669468 .0260956
income | .0010507 .0010982 0.96 0.339 -.0011027 .0032041
_cons | 7.108237 .2969732 23.94 0.000 6.525942 7.690533
------------------------------------------------------------------------------

. estimates store RM
. lrtest UM RM

Likelihood-ratio test LR chi2(2) = 465.59
(Assumption: RM nested in UM) Prob > chi2 = 0.0000

/* Wald test */
. estimates restore UM
(results UM are active now)

. regress

Source | SS df MS Number of obs = 2955
-------------+------------------------------ F( 7, 2947) = 124.98
Model | 1264.72124 7 180.674463 Prob > F = 0.0000
Residual | 4260.16814 2947 1.44559489 R-squared = 0.2289
-------------+------------------------------ Adj R-squared = 0.2271
Total | 5524.88938 2954 1.87030785 Root MSE = 1.2023

------------------------------------------------------------------------------
ltotexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
suppins | .2556428 .0462264 5.53 0.000 .1650034 .3462821
phylim | .3020598 .0569709 5.30 0.000 .190353 .4137666
actlim | .3560054 .0621118 5.73 0.000 .2342185 .4777923
totchr | .3758201 .0184227 20.40 0.000 .3396974 .4119429
age | .0038016 .0036561 1.04 0.299 -.0033672 .0109705
female | -.0843275 .0455442 -1.85 0.064 -.1736292 .0049741
income | .0025498 .0010194 2.50 0.012 .000551 .0045486
_cons | 6.703737 .27676 24.22 0.000 6.161075 7.2464
------------------------------------------------------------------------------

. test actlim=totchr=0

( 1) actlim - totchr = 0
( 2) actlim = 0

F( 2, 2947) = 251.45
Prob > F = 0.0000

/* Score test */
. estimates restore RM
. predict scores, score
. testomit (mean: actlim totchr), score(scores)

regress: score tests for omitted variables

Term | score df p
---------------------+----------------------
mean |
actlim (as factor) | 71.72 1 0.0000
totchr | 459.94 1 0.0000
---------------------+----------------------
simultaneous test | 487.55 2 0.0000
---------------------+----------------------

/* Nonlinear tests */
. estimates restore UM
(results UM are active now)

. nlcom (product:_b[actlim]*_b[totchr]), post


. test _b[product]=0

( 1) product = 0

F( 1, 2947) = 33.39
Prob > F = 0.0000

References
- Cameron, A. C. and Trivedi, P. K. (2009). Microeconometrics Using Stata. College Station, TX: Stata Press.
- Eicker, Friedhelm (1967). "Limit Theorems for Regression with Unequal and Dependent Errors". Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 59-82.
- Goldberger, A. S. (1968). Topics in Regression Analysis. New York: Macmillan.
- Huber, Peter J. (1967). "The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions". Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 221-233.
- Huber, P. J. (1981). Robust Statistics. New York: John Wiley and Sons.
- MacKinnon, James G. and White, Halbert (1985). "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties". Journal of Econometrics 29: 305-325.
- Manski, C. (1988). Analog Estimation Methods in Econometrics. New York: Chapman and Hall.
- White, Halbert (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity". Econometrica 48 (4): 817-838.

