
Economics 513

Lecture 2: The Classical Linear Regression Model

Introduction

In lecture 1 we

- introduced the concept of an economic relation,
- noted that empirical economic relations do not resemble textbook relations,
- introduced a method to find the best fitting linear relation.

No reference to mathematical statistics in all of this. Initially econometrics did not use the tools of mathematical statistics.

Mathematical statistics develops methods for the analysis of data generated by a random experiment in order to learn about that random experiment.


Is this relevant in economics?

Consider

- Wage equation: relation between wage and education, work experience, gender, ...
- Macro consumption function: relation between (national) consumption and (national) income

What is the random experiment?

To make progress we start with the assumption that all economic relations are essentially deterministic, if we include all variables:

$$y = f(x_1, \ldots, x_W)$$


Hence, if we have data $y_i, x_{i1}, \ldots, x_{iW}$, $i = 1, \ldots, n$, then

$$y_i = f(x_{i1}, \ldots, x_{iW}), \quad i = 1, \ldots, n$$

Let $\bar{x}_1, \ldots, \bar{x}_W$ be the sample averages of the variables and assume that $f$ is sufficiently many times differentiable to have a Taylor series expansion around $\bar{x}_1, \ldots, \bar{x}_W$, i.e. a polynomial approximation

$$y_i = \beta_0^* + \beta_1 (x_{i1} - \bar{x}_1) + \cdots + \beta_W (x_{iW} - \bar{x}_W) + \gamma_1 (x_{i1} - \bar{x}_1)^2 + \cdots + \gamma_W (x_{iW} - \bar{x}_W)^2 + \delta_{12} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \cdots$$

where $\beta_0^* = f(\bar{x}_1, \ldots, \bar{x}_W)$ is the value of $f$ at the sample averages.


We divide $x_1, \ldots, x_W$ into three groups:

1. Variables that do not vary in the sample (take these to be the last $W - V$ variables), i.e. for $i = 1, \ldots, n$, $x_{i,V+1} = \bar{x}_{V+1}, \ldots, x_{iW} = \bar{x}_W$. Example: gender if we consider a sample of women.
2. Variables in the relation that are omitted or cannot be included because they are unobservable. Let these be the next $V - K + 1$ variables, $x_K, \ldots, x_V$.
3. Variables included in the relation, i.e. $x_1, \ldots, x_{K-1}$.


To keep the relation simple we concentrate on the linear part. Hence the observations satisfy

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{K-1} x_{i,K-1} + \varepsilon_i$$

with

$$\beta_0 = \beta_0^* - \sum_{j=1}^{K-1} \beta_j \bar{x}_j$$

The remainder term contains all the omitted terms

$$\varepsilon_i = \beta_K (x_{iK} - \bar{x}_K) + \cdots + \beta_V (x_{iV} - \bar{x}_V) + \gamma_1 (x_{i1} - \bar{x}_1)^2 + \cdots + \gamma_V (x_{iV} - \bar{x}_V)^2 + \delta_{12} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \cdots$$

We call $\varepsilon_i$ the disturbance (of the exact linear relation).


Note that

$$\beta_j = \frac{\partial f}{\partial x_j}(\bar{x}_1, \ldots, \bar{x}_W), \quad j = 1, \ldots, K-1$$

$$\beta_0 = f(\bar{x}_1, \ldots, \bar{x}_W) - \sum_{j=1}^{K-1} \beta_j \bar{x}_j$$

Conclusions:

1. The slope coefficient $\beta_j$ is the partial effect of $x_j$ on $y$.
2. The slope coefficients and the intercept depend on the variables that are constant in the sample, e.g. in a sample of women the coefficients in a wage relation may be different from those in a sample of only men.
3. Only in very special cases will the relation have a 0 intercept.


Consider the following experiment: prediction of $y_i$ on the basis of $x_{i1}, \ldots, x_{i,K-1}$.

Assume that we have observed $x_{i1}, \ldots, x_{i,K-1}$. This does not tell us anything about the disturbance $\varepsilon_i$, which depends on (many) variables besides $x_1, \ldots, x_{K-1}$. Hence, even if we know the coefficients $\beta_0, \ldots, \beta_{K-1}$, we cannot predict with certainty what $y_i$ is.

A variable with a value that is unknown before the experiment is performed is a random variable in probability theory.

As in classical random experiments (flipping a coin, rolling a die), the randomness is due to lack of knowledge.


In the prediction experiment $y_i$ is a random variable and hence so is

$$\varepsilon_i = y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{K-1} x_{i,K-1}$$

If we have a sample of size $n$, we have $n$ replications of the random experiment

$$y_1 = \beta_0 + \beta_1 x_{11} + \cdots + \beta_{K-1} x_{1,K-1} + \varepsilon_1$$
$$\vdots$$
$$y_n = \beta_0 + \beta_1 x_{n1} + \cdots + \beta_{K-1} x_{n,K-1} + \varepsilon_n$$

The distribution of the disturbance reflects the fact that we do not know the value of the disturbance if we know $x_1, \ldots, x_{K-1}$.

In fact we make a more extreme assumption: the value of $\varepsilon$ is completely unpredictable from $x_1, \ldots, x_{K-1}$.


Assumption 1: $\mathrm{E}(\varepsilon_i \mid x_{i1}, \ldots, x_{i,K-1}) = 0$

In words: the disturbance is mean-independent of $x_1, \ldots, x_{K-1}$.

Why do we need this assumption?

Consider

$$(1) \quad \varepsilon_i = \gamma (x_{i1} - \bar{x}_1) + \eta_i, \quad i = 1, \ldots, n$$

Hence, $\varepsilon$ is (partially) predictable using $x_1$ (but not completely so).

Substitution gives

$$y_i = \beta_0 - \gamma \bar{x}_1 + (\beta_1 + \gamma) x_{i1} + \cdots + \beta_{K-1} x_{i,K-1} + \eta_i$$


Remember that

$$\beta_1 = \frac{\partial f}{\partial x_1}(\bar{x}_1, \ldots, \bar{x}_W)$$

Conclusion: If (1) holds then the coefficient of $x_1$ is not equal to the partial effect of $x_1$ on $y$.

Failure of Assumption 1 is a failure of the ceteris paribus condition in the sample: a change in $x_1$ has two effects on $y$, a direct effect $\beta_1$ and an indirect effect $\gamma$. The latter effect is due to the fact that in a sample we cannot hold the other relevant variables fixed/constant. Hence we only measure the partial effect if the omitted variables are not related to $x_1$.

Measuring partial effects is the goal of most empirical research in economics (and other social sciences). The biggest challenge in empirical research is to ensure that Assumption 1 holds.


Whether we care depends on our objectives. Compare a homeowner who is interested in the relation between house price and the square footage of the house.

- If he/she wants to predict the sales price of the house as is, there is no reason to be worried about the interpretation of the regression coefficient as a partial effect.
- If he/she wants to evaluate the investment in an addition to the house, the estimation of the partial effect is essential.

Two strategies to estimate the partial effect:

- Include all variables that are correlated with $x_1$ in the relation.
- Assign $x_1$ randomly, i.e. using a random experiment that is independent of anything, e.g. by flipping a coin if $x_1$ is dichotomous (see the sketch below).
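A minimal simulation sketch of why randomization helps; the variable names, coefficient values, and sample size here are illustrative and not from the lecture. When $x_1$ is correlated with an omitted variable the regression coefficient mixes the direct and indirect effects; when $x_1$ is assigned by an independent coin flip the coefficient recovers the partial effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta1 = 2.0                              # true partial effect of x1 (illustrative)
omitted = rng.normal(size=n)             # relevant variable left out of the regression

# Case A: x1 correlated with the omitted variable -> Assumption 1 fails
x1_corr = (omitted + rng.normal(size=n)) > 0
y_corr = 1.0 + beta1 * x1_corr + 3.0 * omitted + rng.normal(size=n)

# Case B: x1 assigned by an independent coin flip -> Assumption 1 holds
x1_rand = rng.integers(0, 2, size=n)
y_rand = 1.0 + beta1 * x1_rand + 3.0 * omitted + rng.normal(size=n)

def slope(y, x):
    """OLS slope in a regression of y on a constant and x."""
    return np.polyfit(np.asarray(x, dtype=float), y, 1)[0]

print("correlated x1:", slope(y_corr, x1_corr))   # picks up an indirect effect, far from 2
print("randomized x1:", slope(y_rand, x1_rand))   # close to the partial effect 2
```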


The CLR model

Re-label the variables $x_1, \ldots, x_{K-1}$ as $x_2, \ldots, x_K$ and introduce the variable

$$x_1 \equiv 1$$

i.e. $x_{i1} = 1$, $i = 1, \ldots, n$. Also re-label $\beta_0, \ldots, \beta_{K-1}$ to $\beta_1, \ldots, \beta_K$.

In the new notation, for $i = 1, \ldots, n$,

$$(2) \quad y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i$$

This is the multiple linear regression model. The relation (2) is linear in $x_1, \ldots, x_K$ and also linear in $\beta_1, \ldots, \beta_K$. The latter is essential. By re-interpreting $x_1, \ldots, x_K$ we can deal with relations that are non-linear in these variables.


Examples:

Polynomial in $x$

$$y = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3 + \varepsilon$$

Define: $x_1 \equiv 1$, $x_2 = x$, $x_3 = x^2$, $x_4 = x^3$.
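A short sketch of this re-interpretation (the data and coefficient values are made up): the relation is non-linear in $x$, but after defining the regressors $x_2 = x$, $x_3 = x^2$, $x_4 = x^3$ it is linear in the $\beta$'s and can be estimated by ordinary linear regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-2, 2, size=n)
# illustrative true coefficients (beta1, ..., beta4) = (1, 0.5, -0.3, 0.2)
y = 1.0 + 0.5 * x - 0.3 * x**2 + 0.2 * x**3 + rng.normal(scale=0.1, size=n)

# re-interpret the regressors: x1 = 1, x2 = x, x3 = x^2, x4 = x^3
X = np.column_stack([np.ones(n), x, x**2, x**3])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # estimates close to (1, 0.5, -0.3, 0.2)
```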


Log-linear relation

$$\ln y = \beta_1 + \beta_2 \ln x + \varepsilon$$

Define: $x_1 \equiv 1$, $x_2 = \ln x$.

Note

$$\beta_2 = \frac{\partial \ln y}{\partial \ln x}$$

i.e. $\beta_2$ is the elasticity of $y$ w.r.t. $x$.


Semi-log relation

$$\ln y = \beta_1 + \beta_2 x + \varepsilon$$

Note: no re-definition is needed.

Also

$$\beta_2 = \frac{\partial \ln y}{\partial x} = \frac{\partial y / y}{\partial x}$$

This is a semi-elasticity.


Assumption 1 in the new notation:

$$(3) \quad \mathrm{E}(\varepsilon_i \mid x_{i1}, \ldots, x_{iK}) = 0, \quad i = 1, \ldots, n$$

Note that this implies that $\mathrm{E}(\varepsilon_i) = 0$. This is without loss of generality because a non-zero mean can be absorbed into the intercept.


In matrix notation (2) and (3) are

$$y = X\beta + \varepsilon$$

and

$$\mathrm{E}(\varepsilon_i \mid x_i) = 0, \quad i = 1, \ldots, n$$

with

$$(4) \quad X = \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}$$

Note that $x_i$ is a $K$-vector.


The Classical (Multiple) Linear Regression (CLR) model is a set of assumptions, mainly on the conditional distribution of $\varepsilon$ given $x_1, \ldots, x_K$.

Some assumptions are essential, some are convenient initial assumptions that can/will be relaxed.

The CLR model is appropriate in many practical situations and is the starting point for the use of mathematical statistical inference to measure economic relations.


CLR model

$$y = X\beta + \varepsilon$$

Assumption 1: Fundamental assumption

$$\mathrm{E}(\varepsilon \mid X) = 0$$

Assumption 2: Spherical disturbances

$$\mathrm{E}(\varepsilon\varepsilon' \mid X) = \sigma^2 I$$

Assumption 3: Full rank

$$\mathrm{rank}(X) = K$$


Discussion of the assumptions

Assumption 1 is shorthand for

$$\mathrm{E}(\varepsilon_i \mid X) = 0, \quad i = 1, \ldots, n$$

Hence this is equivalent to

$$\mathrm{E}(\varepsilon_i \mid x_1, \ldots, x_n) = 0, \quad i = 1, \ldots, n$$

Compare this with

$$\mathrm{E}(\varepsilon_i \mid x_i) = 0, \quad i = 1, \ldots, n$$

By the law of iterated expectations, the current assumption implies the latter.


The current assumption states that not only $x_i$ but also $x_j$, $j \neq i$, is not related to $\varepsilon_i$. This is not stronger than the previous assumption if $x_1, \ldots, x_n$ are independent, as in a random sample from a population. If these are not independent, as e.g. in time-series data, then this additional assumption may be too strong.

Assumption 1 is satisfied if $X$ is chosen independently of $\varepsilon$. In that case we can treat $X$ as a matrix of known constants. Therefore instead of Assumption 1 one sometimes sees

Assumption 1: $X$ is a matrix of known constants determined independently of $\varepsilon$.

Note: Choosing/controlling $X$ is not enough.


Next, we consider Assumption 2.

Note

$$\varepsilon\varepsilon' = \begin{pmatrix} \varepsilon_1^2 & \varepsilon_1\varepsilon_2 & \cdots & \varepsilon_1\varepsilon_n \\ \varepsilon_2\varepsilon_1 & \varepsilon_2^2 & & \vdots \\ \vdots & & \ddots & \\ \varepsilon_n\varepsilon_1 & \varepsilon_n\varepsilon_2 & \cdots & \varepsilon_n^2 \end{pmatrix}$$

Hence Assumption 2 implies that

$$\mathrm{E}(\varepsilon_i^2 \mid X) = \sigma^2, \quad i = 1, \ldots, n$$

This is called homoskedasticity.

Assumption 2 also implies that

$$\mathrm{E}(\varepsilon_i \varepsilon_j \mid X) = 0, \quad i \neq j$$

Hence, given $X$ the disturbances are uncorrelated.


Example of failure of homoskedasticity: random coefficients

$$y = \beta_0 + (\beta_1 + u) x + \varepsilon = \beta_0 + \beta_1 x + \varepsilon + u x$$

where $u$ is the population variation in the coefficient.

For the composite disturbance, if $\mathrm{E}(u \mid x) = 0$,

$$\mathrm{E}(\varepsilon + ux \mid x) = 0$$

but

$$\mathrm{Var}(\varepsilon + ux \mid x) = \sigma^2 + 2\sigma_{\varepsilon u} x + \sigma_u^2 x^2$$

with $\sigma_{\varepsilon u} = \mathrm{E}(\varepsilon u)$, $\sigma_u^2 = \mathrm{E}(u^2)$. Hence the composite error is heteroskedastic.
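A simulation sketch of this random-coefficient example (the distributions and parameter values are illustrative): because the composite disturbance is $\varepsilon + ux$, its conditional variance grows with $x^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(0.5, 3.0, size=n)
u = rng.normal(scale=0.8, size=n)       # population variation in the slope coefficient
eps = rng.normal(scale=1.0, size=n)     # disturbance, independent of u and x
composite = eps + u * x                 # composite disturbance of y = b0 + b1*x + eps + u*x

# sample variance of the composite disturbance in bins of x
for lo, hi in [(0.5, 1.0), (1.5, 2.0), (2.5, 3.0)]:
    sel = (x >= lo) & (x < hi)
    print(f"x in [{lo}, {hi}): Var = {composite[sel].var():.2f}")
# the variance rises with x, roughly 1 + 0.64 * x^2: heteroskedasticity
```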


Failure of uncorrelated disturbances: serial correlation

Assume serial correlation of order 1 in the disturbances

$$\varepsilon_i = \rho \varepsilon_{i-1} + u_i$$

This applies in time-series data.
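A small sketch of first-order serial correlation (the value of $\rho$ and the sample size are made up): adjacent disturbances are correlated, so the off-diagonal terms of $\mathrm{E}(\varepsilon\varepsilon' \mid X)$ are not zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 10_000, 0.7
u = rng.normal(size=n)

eps = np.zeros(n)
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + u[i]    # eps_i = rho * eps_{i-1} + u_i

# first-order sample autocorrelation is close to rho, not 0
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])
```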
Finally, consider Assumption 3.

If $\mathrm{rank}(X) = K$, then $Xa = 0$ if and only if $a = 0$, with $a$ a $K$-vector, i.e. there is no linear relation between the $K$ variables.


Failure of Assumption 3: wage equation

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \varepsilon_i, \quad i = 1, \ldots, n$$

with

$x_2$ = schooling (years)
$x_3$ = age
$x_4$ = potential experience (age − years in school − 6)


Hence

$$x_4 = x_3 - x_2 - 6$$


Define for any constant $c$:

$$\tilde{\beta}_1 = \beta_1 - 6c, \quad \tilde{\beta}_2 = \beta_2 - c, \quad \tilde{\beta}_3 = \beta_3 + c, \quad \tilde{\beta}_4 = \beta_4 - c$$

Then also

$$y_i = \tilde{\beta}_1 x_{i1} + \tilde{\beta}_2 x_{i2} + \tilde{\beta}_3 x_{i3} + \tilde{\beta}_4 x_{i4} + \varepsilon_i, \quad i = 1, \ldots, n$$

Conclusion: We cannot distinguish between $\beta_1, \ldots, \beta_4$ and $\tilde{\beta}_1, \ldots, \tilde{\beta}_4$. For all $c$ these parameters are observationally equivalent.

The problem is also clear if we substitute for $x_4$:

$$y_i = (\beta_1 - 6\beta_4) x_{i1} + (\beta_2 - \beta_4) x_{i2} + (\beta_3 + \beta_4) x_{i3} + \varepsilon_i$$
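A numerical sketch of this failure (the data are made up, not from a real wage survey): with $x_4 = x_3 - x_2 - 6$ the columns of $X$ are linearly dependent, so $\mathrm{rank}(X) < K$ and $X'X$ cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
school = rng.integers(8, 18, size=n).astype(float)     # x2: years of schooling
age = rng.integers(25, 60, size=n).astype(float)       # x3: age
experience = age - school - 6                          # x4: potential experience

X = np.column_stack([np.ones(n), school, age, experience])
print(np.linalg.matrix_rank(X))          # 3, not K = 4
print(np.linalg.cond(X.T @ X))           # enormous (or inf): X'X is singular
# inverting X'X here would fail or give meaningless results
```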


By Assumptions 1 and 2, $\mathrm{E}(\varepsilon \mid X) = 0$, $\mathrm{Var}(\varepsilon \mid X) = \sigma^2 I$. Sometimes it is assumed that

$$\varepsilon \mid X \sim N(0, \sigma^2 I)$$

Why is the normal distribution a natural choice?

In the sequel we sometimes assume:

Assumption 4: $\varepsilon \mid X \sim N(0, \sigma^2 I)$


Linear regression as projection

An alternative interpretation of linear regression is as a conditional mean function.

In the previous derivation the regression coefficient $\beta_j$ in the multiple linear regression is the partial effect on $y$:

$$\beta_j = \frac{\partial f}{\partial x_j}(\bar{x}_1, \ldots, \bar{x}_W)$$

For this result Assumption 1 was necessary.

Alternative: Consider $\beta_j$ as the coefficient of $x_j$ in a linear conditional mean function.

To keep things simple we consider the case of one explanatory variable (and an intercept).


The dependent variable $y$ and the independent variable $x$ have a joint population distribution with joint frequency distribution $f(x, y)$.

Example: $y$ = savings rate, $x$ = income, and $f(x, y)$ is the frequency distribution over all US households.

By a sample survey we obtain $y_i, x_i$, $i = 1, \ldots, n$. This survey can be used to obtain an estimate of the population $f(x, y)$, denoted by $\hat{f}(x, y)$ (see Table 1.1).

Note: the savings rate and income have been discretized.


From the joint frequency in the sample we obtain the sample conditional frequency distribution (see Table 1.2)

$$\hat{f}(y \mid x) = \frac{\hat{f}(y, x)}{\hat{f}(x)}$$

If there is an exact relation between $y$ and $x$, then every column should have one 1 and the rest 0s.

If not, we can consider the average of $y$ for every value of $x$. This is the conditional mean function.

In the population

$$m(x) = \mathrm{E}(y \mid x) = \sum_y y \, f(y \mid x)$$

with estimate

$$\hat{m}(x) = \sum_y y \, \hat{f}(y \mid x)$$
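A sketch of estimating $m(x)$ by the sample conditional mean; the data below are simulated, not the savings-rate table from the lecture. For each distinct value of $x$, $\hat{m}(x)$ is simply the average of the $y$'s observed at that value.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
x = rng.integers(1, 6, size=n)                    # discretized regressor (e.g. income class)
y = 0.05 * x + rng.normal(scale=0.1, size=n)      # e.g. savings rate

# sample conditional mean: average of y at each value of x
for val in np.unique(x):
    m_hat = y[x == val].mean()
    print(f"m_hat({val}) = {m_hat:.3f}")          # estimates E(y | x = val)
```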


Note that $\hat{m}(x)$ may be rough: sampling variation around the smooth population $m(x)$.


Why is $m(x) = \mathrm{E}(y \mid x)$ interesting?

One reason is optimal prediction. Assume the joint population distribution $f(x, y)$ is known and that you have a random draw from this distribution. Only $x$ is revealed and you must predict $y$. What is the best predictor $h(x)$?

Criterion: minimize the expected squared prediction error

$$\mathrm{E}\big[(y - h(x))^2\big] = \mathrm{E}\big[\big((y - m(x)) + (m(x) - h(x))\big)^2\big]$$
$$= \mathrm{E}\big[(y - m(x))^2\big] + 2\,\mathrm{E}\big[(y - m(x))(m(x) - h(x))\big] + \mathrm{E}\big[(m(x) - h(x))^2\big] \geq \mathrm{E}\big[(y - m(x))^2\big]$$

(the cross term is 0 by the law of iterated expectations), and this lower bound is achieved if $h(x) = m(x)$.

Conclusion: The optimal predictor is $h(x) = \mathrm{E}(y \mid x)$.


Now let us restrict ourselves to linear prediction

$$h(x) = a + bx$$

Best linear predictor:

$$\min_{a,b} \ \mathrm{E}\big[(y - a - bx)^2\big]$$

First-order conditions, with $u = y - a - bx$:

$$-2\,\mathrm{E}(u) = 0 \;\Rightarrow\; \mathrm{E}(y) = a + b\,\mathrm{E}(x)$$
$$-2\,\mathrm{E}(ux) = 0 \;\Rightarrow\; \mathrm{E}(xy) = a\,\mathrm{E}(x) + b\,\mathrm{E}(x^2)$$


Solution (compare with the OLS solution):

$$b = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$$
$$a = \mathrm{E}(y) - b\,\mathrm{E}(x)$$

If we replace population moments with sample moments we obtain

$$b = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$
$$a = \bar{y} - b\bar{x}$$

This is the OLS solution!
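A quick numerical check of this equivalence (simulated data, illustrative coefficient values): the slope and intercept computed from sample moments coincide with the least-squares fit of $y$ on a constant and $x$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# coefficients from sample moments
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# least-squares fit of y on a constant and x
X = np.column_stack([np.ones(n), x])
a_ols, b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(a, b)
print(a_ols, b_ols)   # identical up to rounding: the OLS solution
```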


There is an important difference between the case where the conditional mean is linear and the case where it is not.

If $\mathrm{E}(y \mid x) = a + bx$, we have for $u = y - a - bx$

$$\mathrm{E}(u \mid x) = \mathrm{E}(y - a - bx \mid x) = \mathrm{E}\big(y - \mathrm{E}(y \mid x) \mid x\big) = 0$$

This is Assumption 1 in the CLR model.

However, $b$ does not have a structural interpretation as a partial effect!


If the conditional mean is not linear, we have from the first-order condition $\mathrm{E}(ux) = 0$.

This is weaker:

$$\mathrm{E}(u \mid x) = 0 \;\Rightarrow\; \mathrm{E}(ux) = 0$$

but

$$\mathrm{E}(ux) = 0 \;\not\Rightarrow\; \mathrm{E}(u \mid x) = 0$$

Hence in that case Assumption 1 of the CLR model is not satisfied, but only a weaker uncorrelatedness assumption.


The Ordinary Least Squares (OLS) estimator

We have $n$ observations $y_i, x_{i2}, \ldots, x_{iK}$, $i = 1, \ldots, n$. We organize the data in the $n \times 1$ vector $y$ and the $n \times K$ matrix $X$ with

$$X = \begin{pmatrix} 1 & x_{12} & \cdots & x_{1K} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n2} & \cdots & x_{nK} \end{pmatrix}$$

The observations are a sample from a population and we assume that the joint distribution of $y, X$ is such that the CLR model is the appropriate statistical model, i.e. $y, X$ satisfy

$$y = X\beta + \varepsilon$$


for some $K \times 1$ vector of regression coefficients $\beta$ and some $n \times 1$ vector of random errors $\varepsilon$ with a distribution that satisfies

$$\mathrm{E}(\varepsilon \mid X) = 0$$
$$\mathrm{E}(\varepsilon\varepsilon' \mid X) = \sigma^2 I$$

This specifies the random experiment of which $y, X$ is the outcome (the CLR model).

We can now discuss statistical inference: estimation of the population parameters and tests of hypotheses concerning the population parameters and other aspects of the population distribution.


The setup applies to both cross-section and time-series data. In the first case $y_i, x_{i2}, \ldots, x_{iK}$, $i = 1, \ldots, n$ is a random sample and the CLR assumptions on the population distribution can be made on

$$y_1 = x_1'\beta + \varepsilon_1$$

Because the observations are independent we can obtain the joint distribution of $y, X$ from the marginal distributions. For time-series data the observations are not independent and the CLR model applies directly to the joint distribution of $y, X$.


Estimation of $\beta$ and $\sigma^2$

The solution to the minimization of the sum of squared deviations/residuals is

$$b = (X'X)^{-1}X'y$$

Note that this requires $\mathrm{rank}(X) = K$. This is an estimator of $\beta$ (it only depends on the data): the Ordinary Least Squares (OLS) estimator of $\beta$.
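A sketch of computing $b = (X'X)^{-1}X'y$ directly on simulated data (the coefficient values are illustrative; in practice a least-squares routine or QR decomposition is numerically preferable to forming the inverse explicitly).

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 200, 3
beta = np.array([1.0, 2.0, -0.5])                     # illustrative true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + rng.normal(size=n)

b = np.linalg.inv(X.T @ X) @ X.T @ y                  # b = (X'X)^{-1} X'y
print(b)                                              # OLS estimate of beta
```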

Is the OLS estimator a good estimator?

In mathematical statistics estimators are evaluated by considering their sampling distribution, i.e. their distribution in repeated samples $(y_s, X_s)$, $s = 1, \ldots, S$.


All samples are realizations of the CLR random experiment

$$y_s = X_s\beta + \varepsilon_s, \quad s = 1, \ldots, S$$

The sampling distribution of the OLS estimator $b$ is the frequency distribution of $b_s$, $s = 1, \ldots, S$, for $S$ large. We can obtain this distribution by computer simulation (as in assignment 2).
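A simulation sketch in the spirit of such an exercise (the sample size, number of replications, and parameter values are made up): draw $S$ samples from the CLR experiment with $X$ held fixed and look at the distribution of the resulting OLS estimates.

```python
import numpy as np

rng = np.random.default_rng(8)
n, S, sigma = 100, 5_000, 1.0
beta = np.array([1.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # keep X fixed across replications

b_draws = np.empty((S, 2))
for s in range(S):
    eps = rng.normal(scale=sigma, size=n)               # new disturbances each replication
    y = X @ beta + eps
    b_draws[s] = np.linalg.lstsq(X, y, rcond=None)[0]

print(b_draws.mean(axis=0))    # close to beta: the OLS estimator is unbiased
```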


An alternative is to use the CLR assumptions and the rules of probability theory to derive (features of) the sampling distribution of $b$.

Consider

$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$$

From this we can, using the CLR assumptions, find the conditional average of $b$ given $X$ (in the sampling distribution)

$$\mathrm{E}(b \mid X) = \beta + (X'X)^{-1}X'\,\mathrm{E}(\varepsilon \mid X) = \beta$$

Hence the unconditional average of $b$ is (law of iterated expectations)

$$\mathrm{E}(b) = \mathrm{E}_X\big(\mathrm{E}(b \mid X)\big) = \beta$$

In words: under the CLR assumptions the OLS estimator is unbiased for $\beta$.


Besides the mean, consider the variance of $b$:

$$\mathrm{Var}(b) = \mathrm{E}\big[(b - \mathrm{E}(b))(b - \mathrm{E}(b))'\big] = \mathrm{E}\big[(b - \beta)(b - \beta)'\big]$$

We have

$$b - \beta = (X'X)^{-1}X'\varepsilon$$

Upon substitution

$$\mathrm{Var}(b) = \mathrm{E}\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\big]$$


We have

$$\mathrm{E}\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X\big] = (X'X)^{-1}X'\,\mathrm{E}(\varepsilon\varepsilon' \mid X)\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}$$

and hence

$$\mathrm{Var}(b) = \mathrm{E}_X\big[\sigma^2(X'X)^{-1}\big] = \sigma^2\,\mathrm{E}\big[(X'X)^{-1}\big]$$

Note that $\sigma^2(X'X)^{-1}$ is an unbiased estimator of this variance.
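A sketch checking this result by simulation (same kind of made-up CLR setup as above, with $\sigma^2$ treated as known): the empirical covariance matrix of the OLS draws across replications should be close to $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(9)
n, S, sigma = 100, 20_000, 1.0
beta = np.array([1.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # X fixed across replications

b_draws = np.empty((S, 2))
for s in range(S):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b_draws[s] = np.linalg.lstsq(X, y, rcond=None)[0]

emp_var = np.cov(b_draws, rowvar=False)            # empirical Var(b) across replications
theory = sigma**2 * np.linalg.inv(X.T @ X)         # sigma^2 (X'X)^{-1}
print(emp_var)
print(theory)                                      # the two matrices should be close
```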


In the special case of a constant and one regressor we have the unbiased variance estimator

$$\widehat{\mathrm{Var}}(b_2) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

Note that this decreases with the variation in $x$ and increases with $\sigma^2$.


Optimality of the OLS estimator in the CLR model

Consider the class of estimators for $\beta$ that are linear in $y$, i.e.

$$b_L = Cy$$

For the OLS estimator $C = (X'X)^{-1}X'$, i.e. $C$ in general depends on $X$.

Gauss-Markov Theorem: In the CLR model the OLS estimator is the Best Linear Unbiased (BLU) estimator of $\beta$, i.e. it has the smallest variance among linear unbiased estimators.
