
The linear model

General framework
We have a random sample on the response variable y_i and the control variables (including the constant) x_i = (x_{1i}, ..., x_{Ki}), with i = 1, ..., N. For each observation the population equation of interest is:

    y_i = E[y_i | x_i] + u_i    (1)
    y_i = x_i β + u_i           (2)

where y_i is a scalar, x_i is a 1 × K vector, β = (β_1, ..., β_K)' is a K × 1 vector, and u_i is a scalar that represents the error. Note that the vectors (y_i, x_i), i = 1, ..., N, are i.i.d. (because we have a random sample from the population).
- The model is linear in parameters and is correctly specified.
- The error is additive.

Consistency of the OLS estimator
The least-squares estimator is consistent under the following two key conditions:

    E(u | x_{1i}, ..., x_{Ki}) = E(u | x) = 0,    (OLS.1)
    rank E(x'x) = K.                              (OLS.2)

Condition OLS.1 is stronger than needed, as it requires u to be uncorrelated both with the control variables x and with any function of x. A weaker condition:

    E(x'u) = 0,    (OLS.1a)

equivalent to Cov(u, x) = 0, is sufficient for the consistency of the OLS estimator. However, to have an unbiased estimator we need E(u | x) = 0. If OLS.1 holds, the regressors x are said to be weakly exogenous.
Condition (OLS.2) rules out an exact linear relationship among the control variables. This is equivalent to requiring that E(x'x) be positive definite. In short, we should be able to invert E(x'x).
- Both OLS.1 and OLS.2 are assumptions about the population and cannot be tested!
- Condition OLS.2 is easy to satisfy, though it may fail if the model is not correctly specified.
- Condition OLS.1 is more substantial and must be carefully justified (based on theory and intuition).
Under OLS.1 and OLS.2 the vector of parameters is identified. That is, β can be written as a function of population moments of observed variables.

    y = xβ + u                    (3)
    x'y = x'xβ + x'u              (4)
    E(x'y) = E(x'x)β + E(x'u)     (5)

Hence, given that E(x'u) = 0 by OLS.1, we have:

    β = [E(x'x)]^{-1} E(x'y).    (6)

Note that the linear model can be written in full matrix notation as:

    y = E[y | X] + u    (7)
    y = Xβ + u          (8)

where y is an N × 1 vector, X is the N × K data matrix, and u is an N × 1 vector of errors.

Then, the OLS estimator can be written as

    β̂ = (X'X)^{-1} X'y    (9)

The analogy principle (see Goldberger (1968) and Manski (1988)) suggests substituting sample moment conditions for population moment conditions to obtain estimators. This leads us to the method of moments, which substitutes the corresponding sample averages for E(x'x) and E(x'y), respectively.
    β̂ = (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1} Σ_{i=1}^N x_i'y_i)
    β̂ = (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1} Σ_{i=1}^N x_i'(x_iβ + u_i))
    β̂ = β + (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1} Σ_{i=1}^N x_i'u_i)

Define A = plim N^{-1} Σ_i x_i'x_i = plim N^{-1} X'X.
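
As a quick check of the algebra, the moment formula can be computed directly in Mata and compared with the output of regress. A minimal sketch, using Stata's bundled auto data purely as an illustration (the variable choice is arbitrary):

/* OLS via the method-of-moments formula, computed in Mata */
. sysuse auto, clear
. regress price mpg weight
. mata:
    y = st_data(., "price")
    X = st_data(., ("mpg", "weight")), J(rows(y), 1, 1)   // append the constant
    b = invsym(X'X) * (X'y)                               // (X'X)^-1 X'y
    b'                                                    // matches the Coef. column
end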
Then,

    plim β̂ = β + A^{-1} · 0
    plim β̂ = β.    (10)

because plim N^{-1} Σ_i x_i'u_i = plim N^{-1} X'u = 0. Therefore, the OLS estimator is consistent.


- We did not impose any restriction on the nature of y.
- In fact, y can be binary or ordinal and OLS gives, even in those cases, a consistent estimator (see the simulation sketch below).
- OLS gives us a consistent estimator of the linear projection of y on x.
- Note that we do not need u to be independent of x, which is a much more restrictive condition than E(u | x) = 0.
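
A minimal simulation sketch of the binary-response point, with data generated from a hypothetical linear probability model P(y = 1 | x) = 0.3 + 0.2x; with a large N the OLS slope should settle near the true value 0.2:

/* OLS consistency with a binary response */
. clear
. set seed 12345
. set obs 100000
. generate x = runiform()
. generate y = runiform() < 0.3 + 0.2*x    // binary y from a linear probability model
. regress y x                              // slope estimate should be close to 0.2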

Asymptotic inference with OLS
From our previous discussion we have

    √N (β̂ − β) = (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} (N^{-1/2} Σ_{i=1}^N x_i'u_i)

Suppose that (x_i'u_i), i = 1, ..., N, is an i.i.d. sequence with zero mean and, element by element, finite variance. Then the conditions for the central limit theorem (CLT) hold. The CLT says that

    N^{-1/2} Σ_{i=1}^N x_i'u_i →d N(0, B)

where B = E(u²x'x). For future reference we define Ω = E[uu' | X]. Then, B = N^{-1} E[X'uu'X] = N^{-1} E[X'ΩX].

We also know that

    (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} − A^{-1} = o_p(1)

That is, the difference converges to zero as N goes to infinity. This means that when investigating the asymptotic distribution of √N (β̂ − β) we can replace the term (N^{-1} Σ_{i=1}^N x_i'x_i)^{-1} with the fixed matrix A^{-1}. Then we have that, asymptotically, √N (β̂ − β) is simply a linear transformation of something that is distributed as a Normal. Here we can use the well-known result that a linear transformation of a Normal variable is itself distributed as a Normal.

In particular we have that:

    W' (N^{-1/2} Σ_{i=1}^N x_i'u_i) →d N(0, W'BW)

for any positive definite W. This implies

    √N (β̂ − β) ∼a N(0, A^{-1}BA^{-1})

because A^{-1} is symmetric. Under the assumption of homoskedasticity Ω = E[uu' | X] = E[uu'] = σ²I, or more generally,

    B = E(u²x'x) = σ² E(x'x);  σ² ≡ E(u²),    (OLS.3)

such that Var(u | x) = Var(u) = σ². Therefore, we have

    √N (β̂ − β) ∼a N(0, σ²A^{-1}).

This result says that we can treat β̂ as asymptotically Normal with mean β and variance σ²[E(x'x)]^{-1}. A consistent estimator for σ² is

    σ̂² = Σ_{i=1}^N û_i² / (N − K)

If we substitute the sample average N^{-1} Σ_{i=1}^N x_i'x_i = N^{-1}(X'X) for E(x'x) we can write:

    β̂ ∼a N(β, V̂)

where

    V̂ = Âvar(β̂) = σ̂²(X'X)^{-1}

is the estimator of the asymptotic variance.

For the j-th coefficient the asymptotic standard error is defined as:

    se(β̂_j) = (v̂_jj)^{1/2}

where v̂_jj is the j-th diagonal element of V̂.
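
These formulas are easy to verify by hand. A minimal sketch, again on the bundled auto data, that rebuilds V̂ = σ̂²(X'X)^{-1} in Mata and compares the square roots of its diagonal with the standard errors reported by regress:

/* hand-computed OLS standard errors */
. sysuse auto, clear
. quietly regress price mpg weight
. mata:
    y = st_data(., "price")
    X = st_data(., ("mpg", "weight")), J(rows(y), 1, 1)
    b = invsym(X'X) * (X'y)
    u = y - X*b                               // residuals
    s2 = (u'u) / (rows(X) - cols(X))          // sigma2hat with the N-K correction
    V = s2 * invsym(X'X)                      // Vhat
    sqrt(diagonal(V))'                        // matches the Std. Err. column
end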

In short...
- standard errors and confidence intervals,
- t tests,
- F tests,
- χ² tests
are all asymptotically valid under OLS.1-OLS.3. Hence, we can do hypothesis tests as usual on the basis that

    (β̂_j − β_j) / se(β̂_j) ∼ t_{N−K}

Remember, you can use the p-value = P(|T| > |t|) for inference. A p-value < 0.05 (p-value < 0.01) is evidence for rejecting H0 at a significance level of 5% (1%).
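
For reference, the two-sided p-value can be computed directly from the t statistic; a one-line sketch, using the female coefficient from the example below (t = -1.85 with N − K = 2947 degrees of freedom):

/* two-sided p-value for a t statistic */
. display 2*ttail(2947, abs(-1.85))    // P(|T| > 1.85), about 0.064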

Heteroskedasticity robust inference
Violations of OLS.1 lead to an inconsistent OLS estimator. This is the worst problem the econometrician can ever face, as if OLS.1 fails we need to solve the endogeneity problem [i.e. E(u | x) ≠ 0] using some advanced method. If OLS.3 fails, standard errors will be inflated or deflated and inference based on the assumption of homoskedasticity will be invalid. This is a less troublesome issue, as we can use a heteroskedasticity robust estimator of the covariance matrix. In general we have that:

    Âvar(β̂) = N^{-1} Â^{-1} B̂ Â^{-1}

where a consistent estimator for A is Â = N^{-1} Σ_i x_i'x_i = N^{-1}(X'X), and a consistent estimator for B is B̂ = N^{-1} Σ_i û_i² x_i'x_i = N^{-1} X'Ω̂X. Therefore, we can write:

    Âvar(β̂) = (X'X)^{-1} X'Ω̂X (X'X)^{-1}

where a consistent estimator for Ω = E(uu' | X) is Ω̂ = Diag[û_i²] with û_i = y_i − x_iβ̂.

This estimator of the asymptotic variance of β̂ is robust to heteroskedasticity. It is known in the literature as the sandwich or the White-Huber-Eicker estimator of the covariance matrix. Using this estimator we can obtain robust standard errors and, with them, robust t statistics and confidence intervals, and do inference on the basis of those statistics.
- When OLS.3 fails the F tests are not valid ⇒ use a Wald test.
- The OLS estimator is not an efficient estimator when OLS.3 fails ⇒ use GLS to achieve efficiency gains.
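
In Stata the sandwich estimator is available through the vce(robust) option; a minimal sketch on the bundled auto data, comparing the two sets of standard errors:

/* homoskedastic vs. heteroskedasticity-robust standard errors */
. sysuse auto, clear
. regress price mpg weight                  // conventional standard errors
. regress price mpg weight, vce(robust)     // White-Huber-Eicker standard errors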

The Delta method
Suppose that

    √N (β̂ − β) →d N(0, V)

where V is positive definite. Let c : Θ → R^Q be a continuously differentiable function on the parameter space Θ ⊂ R^K, where Q ≤ K. Assume that β is in the interior of the parameter space. Define C(β) ≡ ∇_β c(β) as the Q × K Jacobian of c. Then,

    √N [c(β̂) − c(β)] ∼a N[0, C(β)VC(β)']    (DM)

Define Ĉ_N ≡ C(β̂_N). Therefore, plim Ĉ_N = C(β). If plim V̂_N = V, we have that

    {√N [c(β̂) − c(β)]}' [Ĉ_N V̂_N Ĉ_N']^{-1} {√N [c(β̂) − c(β)]} ∼a χ²_Q    (WT)
DM can be used to obtain asymptotic standard errors of nonlinear functions of β̂. The asymptotic variance is given by

    Âvar[c(β̂)] = Ĉ_N Âvar(β̂_N) Ĉ_N'

The asymptotic standard errors are the square roots of the diagonal elements of Âvar[c(β̂)]. The result (WT) is used for doing hypothesis tests of the form H0: c(β) = 0 vs H1: c(β) ≠ 0. The Wald statistic is

    W_N = √N c(β̂)' [Ĉ_N V̂_N Ĉ_N']^{-1} √N c(β̂) ∼a χ²_Q

under the null hypothesis.
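
Stata implements the delta method in nlcom and the Wald statistic (WT) for nonlinear restrictions in testnl; a minimal sketch with an illustrative nonlinear function of the auto-data coefficients (the particular function is arbitrary):

/* delta-method standard errors and a nonlinear Wald test */
. sysuse auto, clear
. quietly regress price mpg weight
. nlcom ratio: _b[mpg]/_b[weight]        // delta-method se for a nonlinear function
. testnl _b[mpg]*_b[weight] = 0          // Wald test of a nonlinear restriction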

Example: Delta method
Let β̂ = (β̂_1, β̂_2)' with V̂ given by:

    V̂ = ( 1    0.5 )
        ( 0.5  1   )

where I do not write explicitly the dependence on the sample size to simplify notation. We want to obtain the asymptotic variance of exp(β̂) = [exp(β̂_1), exp(β̂_2)]'. In this case

    Ĉ = ( ∂exp(β̂_1)/∂β̂_1  ∂exp(β̂_1)/∂β̂_2 )  =  ( exp(β̂_1)  0         )
        ( ∂exp(β̂_2)/∂β̂_1  ∂exp(β̂_2)/∂β̂_2 )     ( 0          exp(β̂_2) )
    Âvar[exp(β̂)] = ( exp(β̂_1)  0         ) ( 1    0.5 ) ( exp(β̂_1)  0         )
                   ( 0          exp(β̂_2) ) ( 0.5  1   ) ( 0          exp(β̂_2) )

                 = ( exp(β̂_1)²               0.5 exp(β̂_1)exp(β̂_2) )
                   ( 0.5 exp(β̂_1)exp(β̂_2)   exp(β̂_2)²             )
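
A quick Mata check of this matrix product, plugging in the illustrative values β̂_1 = 0 and β̂_2 = 1:

/* delta-method variance for exp(bhat): C V C' */
. mata:
    b = (0 \ 1)                  // illustrative values of bhat
    V = (1, 0.5 \ 0.5, 1)
    C = diag(exp(b))             // Jacobian of the elementwise exp()
    C * V * C'                   // Avar[exp(bhat)]
end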

Three equivalent ways of doing asymptotic inference
The objective is to perform hypothesis tests of the form

    H0: c(β) = 0 vs H1: c(β) ≠ 0

where c(β) is Q × 1 and θ = β in the accompanying graph (not reproduced here).
- Likelihood ratio test. If the constraint c(β) = 0 is valid, then imposing it does not lead to a large reduction in the log-likelihood.

      −2 ln(λ) = −2 ln(L̂_R / L̂_U) ∼a χ²_Q

  where Q is the number of restrictions.

- Wald test. If the restriction is valid, then c(β̂_MLE) must be close to zero. Hence, we can base the hypothesis test on c(β̂_MLE) and reject H0 if the value that we get is significantly different from zero. If c(β) = Rβ − r,

      W = (Rβ̂ − r)' [RV̂R']^{-1} (Rβ̂ − r) ∼a χ²_Q.

  where R is Q × K and has rank Q ≤ K, and r is Q × 1. In the linear model with normal errors an exact finite-sample result can be obtained:

      W/Q ∼ F(Q, N − K)

  exactly, under H0. Exact results are not available in nonlinear models.
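
After regress, Stata's test command reports the F form W/Q; a minimal sketch recovering the χ² form from the returned results, assuming Q = 2 restrictions on the illustrative auto regression:

/* Wald statistic: F form and chi-squared form */
. sysuse auto, clear
. quietly regress price mpg weight
. test mpg weight                             // reports F(2, N-K) = W/Q
. display "W = " r(df)*r(F)                   // chi2 form: W = Q*F
. display "p = " chi2tail(r(df), r(df)*r(F))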

- Lagrange multiplier test ('score' test). This test is based on the restricted model. The idea is that the likelihood function is maximised under the constraint c(β) = 0:

      ln L*(β) = ln L(β) + λ'c(β)

  The solution is found by solving the following equations:

      ∂ln L*/∂β = ∂ln L/∂β + C(β)'λ = 0    (11)
      ∂ln L*/∂λ = c(β) = 0                 (12)

  where C(β) ≡ ∇_β c(β). If the restriction is valid, imposing it will not change the log-likelihood much and the second term in (11) will be small.

  In particular, λ will be small. This can be tested with H0: λ = 0, which leads us to the Lagrange multiplier test. An equivalent form makes use of the fact that

      s(β̂_R) = ∂ln L/∂β |_{β̂_R} = −C(β̂_R)'λ̂

  where s(β) is known as the 'score'. For this reason the LM test is also known as the score test. If the restriction is valid then s(β̂_R) will be close to zero.

      LM = N^{-1} s(β̂_R)' [I(β̂_R)]^{-1} s(β̂_R) ∼a χ²_Q

  where I(β̂_R) is a consistent estimator of the information matrix evaluated at β̂_R:

      I(β̂_R) = −plim N^{-1} ∂²ln L/∂β∂β' |_{β̂_R}
  Note that

      s(β̂_R) = Σ_{i=1}^N s_i(β̂_R)

  Define S as the N × K matrix with s_i'(β̂_R) in the i-th row. A consistent estimator for I(β̂_R) is the BHHH or OPG estimator:

      I(β̂_R) = N^{-1} Σ_{i=1}^N s_i(β̂_R) s_i'(β̂_R)

  Let 1 be an N × 1 vector of ones. Then,

      LM = 1'S(S'S)^{-1}S'1 = N R²_unc

  where R²_unc is the uncentred R² of a regression of 1 on s_i(β̂_R). Note that N R²_unc can be calculated as N − RSS, where RSS is the sum of squared residuals in that regression.
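
A minimal Mata sketch of this formula, with a random placeholder standing in for the score matrix S just to show the algebra:

/* OPG form of the LM statistic: 1'S(S'S)^-1 S'1 */
. mata:
    S = rnormal(200, 3, 0, 1)            // placeholder scores (N = 200, K = 3)
    one = J(rows(S), 1, 1)
    LM = one'S * invsym(S'S) * S'one     // equals N times the uncentred R2
    LM
end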
  With linear restrictions there is an easy way of calculating the LM statistic (no need to obtain the scores). Partition β as in

      y = x_1β_1 + x_2β_2 + u

  where x_1 is 1 × K_1 and x_2 is 1 × K_2. In this case it is possible to test H0: β_2 = 0 by running the constrained regression of y on x_1 and calculating ū_i = y_i − x_{1i}β̄_1, i = 1, ..., N. In a second regression, run ū_i on x_1 and x_2. The LM statistic is N R² ∼a χ²_{K_2} from this second regression. Note that here we use the usual R².
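
A minimal sketch of this two-step recipe for the MEPS regression used in the example below, treating actlim and totchr as the x_2 block (it assumes mus03data is in the working directory):

/* LM test of H0: actlim = totchr = 0 via the auxiliary regression */
. use mus03data, clear
. quietly regress ltotexp suppins phylim age female income    // restricted model
. predict double ubar, residuals
. quietly regress ubar suppins phylim age female income actlim totchr
. display "LM = " e(N)*e(r2)                                  // N * R2
. display "p  = " chi2tail(2, e(N)*e(r2))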

An example
We have data on medical expenditures of individuals 65 and over who qualify for Medicare (USA) (see Cameron and Trivedi (2009), Stata book, page 73). The original data come from the Medical Expenditure Panel Survey (MEPS). Medicare does not cover all medical expenses, and about half of eligible individuals purchase supplementary insurance.
. use mus03data
. describe totexp ltotexp posexp suppins phylim actlim totchr age female income

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------
totexp double %12.0g Total medical expenditure
ltotexp float %9.0g ln(totexp) if totexp > 0
posexp float %9.0g =1 if total expenditure > 0
suppins float %9.0g =1 if has supp priv insurance
phylim double %12.0g =1 if has functional limitation
actlim double %12.0g =1 if has activity limitation
totchr double %12.0g # of chronic problems
age double %12.0g Age
female double %12.0g =1 if female
income double %12.0g annual household income/1000

We want to test

    H0: actlim = totchr = 0 vs H1: actlim ≠ 0 or totchr ≠ 0

How many restrictions? Two, so Q = 2 in the tests below.


. regress ltotexp suppins phylim actlim totchr age female income

Source | SS df MS Number of obs = 2955
-------------+------------------------------ F( 7, 2947) = 124.98
Model | 1264.72124 7 180.674463 Prob > F = 0.0000
Residual | 4260.16814 2947 1.44559489 R-squared = 0.2289
-------------+------------------------------ Adj R-squared = 0.2271
Total | 5524.88938 2954 1.87030785 Root MSE = 1.2023

------------------------------------------------------------------------------
ltotexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
suppins | .2556428 .0462264 5.53 0.000 .1650034 .3462821
phylim | .3020598 .0569709 5.30 0.000 .190353 .4137666
actlim | .3560054 .0621118 5.73 0.000 .2342185 .4777923
totchr | .3758201 .0184227 20.40 0.000 .3396974 .4119429
age | .0038016 .0036561 1.04 0.299 -.0033672 .0109705
female | -.0843275 .0455442 -1.85 0.064 -.1736292 .0049741
income | .0025498 .0010194 2.50 0.012 .000551 .0045486
_cons | 6.703737 .27676 24.22 0.000 6.161075 7.2464
------------------------------------------------------------------------------

. estimates store UM

/* LR test */
. regress ltotexp suppins phylim age female income

Source | SS df MS Number of obs = 2955
-------------+------------------------------ F( 5, 2949) = 63.59
Model | 537.725487 5 107.545097 Prob > F = 0.0000
Residual | 4987.1639 2949 1.6911373 R-squared = 0.0973
-------------+------------------------------ Adj R-squared = 0.0958
Total | 5524.88938 2954 1.87030785 Root MSE = 1.3004

------------------------------------------------------------------------------
ltotexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
suppins | .2762374 .0499206 5.53 0.000 .1783547 .3741202
phylim | .8048815 .0501609 16.05 0.000 .7065276 .9032353
age | .0061206 .0039343 1.56 0.120 -.0015936 .0138349
female | -.0704256 .0492262 -1.43 0.153 -.1669468 .0260956
income | .0010507 .0010982 0.96 0.339 -.0011027 .0032041
_cons | 7.108237 .2969732 23.94 0.000 6.525942 7.690533
------------------------------------------------------------------------------

. estimates store RM
. lrtest UM RM

Likelihood-ratio test LR chi2(2) = 465.59
(Assumption: RM nested in UM) Prob > chi2 = 0.0000

/* Wald test */
. estimates restore UM
(results UM are active now)

. regress

Source | SS df MS Number of obs = 2955
-------------+------------------------------ F( 7, 2947) = 124.98
Model | 1264.72124 7 180.674463 Prob > F = 0.0000
Residual | 4260.16814 2947 1.44559489 R-squared = 0.2289
-------------+------------------------------ Adj R-squared = 0.2271
Total | 5524.88938 2954 1.87030785 Root MSE = 1.2023

------------------------------------------------------------------------------
ltotexp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
suppins | .2556428 .0462264 5.53 0.000 .1650034 .3462821
phylim | .3020598 .0569709 5.30 0.000 .190353 .4137666
actlim | .3560054 .0621118 5.73 0.000 .2342185 .4777923
totchr | .3758201 .0184227 20.40 0.000 .3396974 .4119429
age | .0038016 .0036561 1.04 0.299 -.0033672 .0109705
female | -.0843275 .0455442 -1.85 0.064 -.1736292 .0049741
income | .0025498 .0010194 2.50 0.012 .000551 .0045486
_cons | 6.703737 .27676 24.22 0.000 6.161075 7.2464
------------------------------------------------------------------------------

. test actlim=totchr=0

( 1) actlim - totchr = 0
( 2) actlim = 0

F( 2, 2947) = 251.45
Prob > F = 0.0000

/* Score test */
. estimates restore RM
. predict scores, score
. testomit (mean: actlim totchr), score(scores)

regress: score tests for omitted variables

Term | score df p
---------------------+----------------------
mean |
actlim (as factor) | 71.72 1 0.0000
totchr | 459.94 1 0.0000
---------------------+----------------------
simultaneous test | 487.55 2 0.0000
---------------------+----------------------

/* Nonlinear tests */
. estimates restore UM
(results UM are active now)

. nlcom (product:_b[actlim]*_b[totchr]), post


. test _b[product]=0

( 1) product = 0

F( 1, 2947) = 33.39
Prob > F = 0.0000

References
- Cameron, A. C. and Trivedi, P. K. (2009). Microeconometrics Using Stata. College Station, TX: Stata Press.
- Eicker, Friedhelm (1967). "Limit Theorems for Regression with Unequal and Dependent Errors". Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 59-82.
- Goldberger, A. S. (1968). Topics in Regression Analysis. New York: Macmillan.
- Huber, Peter J. (1967). "The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions". Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 221-233.
- Huber, P. J. (1981). Robust Statistics. New York: John Wiley and Sons.
- MacKinnon, James G. and White, Halbert (1985). "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties". Journal of Econometrics 29: 305-325.
- Manski, C. (1988). Analog Estimation Methods in Econometrics. New York: Chapman and Hall.
- White, Halbert (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity". Econometrica 48 (4): 817-838.

