
The Classical Linear Model

Least Squares Estimation


Algebraic Properties

The Classical Linear Model and OLS Estimation


Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009

January 19, 2009



Social sciences: non-exact relationships.


Starting point: a model for the non-exact relationship
between y (explained variable) and a set of variables x (the
explanatory variables).


Assumption 1 (linearity):

yi = β1 x1i + β2 x2i + · · · + βK xKi + ui ,    i = 1, . . . , n

yi : explained variable for observation i. Its realizations are observed.
xki , k = 1, . . . , K: the K explanatory variables. Observed realizations.
ui is a random variable with unobserved realizations. Represents the non-exact nature of the relationship.
βk , k = 1, . . . , K are the regression coefficients.

Assumption 1: the underlying relationship is linear for all observations.

The model in matrix notation

Define the following vectors and matrices:

Y = (y1, y2, . . . , yn)'    an n × 1 vector
β = (β1, β2, . . . , βK)'    a K × 1 vector
u = (u1, u2, . . . , un)'    an n × 1 vector

and let X be the n × K matrix whose i-th row collects the K explanatory variables of observation i:

    ⎡ x11  x21  · · ·  xK1 ⎤
X = ⎢ x12  x22  · · ·  xK2 ⎥
    ⎢  ..   ..          ..  ⎥
    ⎣ x1n  x2n  · · ·  xKn ⎦


Then the linear model can be written as:

⎡ y1 ⎤   ⎡ x11  x21  · · ·  xK1 ⎤ ⎡ β1 ⎤   ⎡ u1 ⎤
⎢ ..  ⎥ = ⎢  ..   ..          ..  ⎥ ⎢ ..  ⎥ + ⎢ ..  ⎥
⎣ yn ⎦   ⎣ x1n  x2n  · · ·  xKn ⎦ ⎣ βK ⎦   ⎣ un ⎦

Y = Xβ + u

This is the linear model in matrix form.


Basic Results on Matrices and Random Vectors

Before we proceed, we need to establish some results involving matrices and vectors.

Let A be an m × n matrix, viewed as n column vectors or m row vectors. The column rank of A is defined as the maximum number of linearly independent columns. Similarly, the row rank is the maximum number of linearly independent rows. The row rank is equal to the column rank, so we will talk, in general, about the rank of a matrix A, and will denote it as ρ(A).

Let A be a square (m × m) matrix. A is non-singular if |A| ≠ 0. In such case, there exists a unique non-singular matrix A⁻¹, called the inverse of A, such that

AA⁻¹ = A⁻¹A = Im .

Let A be a square m × m matrix.

If ρ(A) = m, then |A| ≠ 0.
If ρ(A) < m, then |A| = 0.

Let X be an n × K matrix with ρ(X) = K (full column rank). Then

ρ(X) = ρ(X'X) = K

This result guarantees the existence of (X'X)⁻¹ based on the rank of X.
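A quick numerical illustration of this result, with a made-up full-column-rank design matrix (a sketch, not part of the original notes):

```python
import numpy as np

# Hypothetical design: n = 5 observations, K = 3 regressors
# (intercept plus two columns that are not linear combinations
# of each other), so X has full column rank.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 5.0, 9.0],
              [1.0, 7.0, 16.0],
              [1.0, 8.0, 25.0]])

rank_X = np.linalg.matrix_rank(X)          # rho(X)
rank_XtX = np.linalg.matrix_rank(X.T @ X)  # rho(X'X)

# rho(X) = rho(X'X) = K, so (X'X)^{-1} exists
assert rank_X == rank_XtX == X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)           # well defined under full rank
```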


Let b and a be two K × 1 vectors. Then we define

∂(b'a)/∂b = a

Let b be a K × 1 vector and A a symmetric K × K matrix. Then

∂(b'Ab)/∂b = 2Ab
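The second rule can be checked numerically with central finite differences; a sketch using an arbitrary symmetric A (the particular numbers are made up for illustration):

```python
import numpy as np

# Numerical check of the matrix-derivative rule d(b'Ab)/db = 2Ab
# for a symmetric K x K matrix A.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])   # symmetric
b = np.array([1.0, -2.0, 0.5])

def quad(v):
    return v @ A @ v              # the quadratic form b'Ab

# Central finite differences approximate the gradient component-wise
h = 1e-6
grad_fd = np.array([(quad(b + h * e) - quad(b - h * e)) / (2 * h)
                    for e in np.eye(3)])

grad_exact = 2 * A @ b            # the analytic rule
assert np.allclose(grad_fd, grad_exact, atol=1e-5)
```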


Let Y be a vector of K random variables:

Y = (Y1, Y2, . . . , YK)'

Its expected value is the vector of expected values:

E(Y) = μ = (E(Y1), E(Y2), . . . , E(YK))'


V(Y) ≡ E[(Y − μ)(Y − μ)']

a K × K matrix whose (i, j) element is E(Yi − μi)(Yj − μj):

       ⎡ V(Y1)        Cov(Y1, Y2)  · · ·  Cov(Y1, YK) ⎤
V(Y) = ⎢ Cov(Y2, Y1)  V(Y2)                           ⎥
       ⎢     ..                     ..                ⎥
       ⎣ Cov(YK, Y1)      · · ·            V(YK)      ⎦

The variance of a vector is called its variance-covariance matrix, a K × K matrix.

If V(Y) = Σ and c is a K × 1 vector, then

V(c'Y) = c'V(Y)c = c'Σc.
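The property V(c'Y) = c'Σc holds exactly for sample analogues as well: the sample variance of c'Y equals c'Sc, where S is the sample variance-covariance matrix. A sketch with simulated (made-up) data:

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 simulated draws of a K = 3 random vector (illustration only)
Y = rng.normal(size=(500, 3))
Y[:, 1] += 0.5 * Y[:, 0]          # induce some correlation

c = np.array([1.0, -2.0, 0.5])    # a K x 1 weight vector

Sigma = np.cov(Y, rowvar=False)   # sample variance-covariance matrix (K x K)
lhs = np.var(Y @ c, ddof=1)       # sample variance of the scalar c'Y
rhs = c @ Sigma @ c               # c' Sigma c

# The identity V(c'Y) = c' Sigma c holds exactly for the sample analogues
assert np.isclose(lhs, rhs)
```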

Conditional Expectations

E(Y|X = x) = ∫ y fY|X (y|x) dy

Idea: how the expected value of Y changes when X changes. It is a function that depends on X. If X is a random variable, then E(Y|X) is also a random variable.

Properties

E(g(X)|X) = g(X).
If Y = a + bX + U, then E(Y|X) = a + bX + E(U|X).
E(Y) = E[E(Y|X)] (Law of Iterated Expectations).


Assumption 2: Strict Exogeneity

E(ui |X) = 0,    i = 1, 2, . . . , n

In basic courses it is assumed that E(ui) = 0. Which one is stronger?


Implications of strict exogeneity:

E(ui) = 0,  i = 1, . . . , n.
Proof: by the law of iterated expectations and strict exogeneity,
E(u) = E[E(u|X)] = E(0) = 0.
In words: on average, the model is exactly linear.

E(xjk ui) = 0,  j, i = 1, . . . , n;  k = 1, . . . , K.
In words: the explanatory variables are uncorrelated with the error terms of all observations.
Proof: left as an exercise.


Assumption 3: No Multicollinearity

ρ(X) = K, w.p.1

Rank?
All columns of the realizations of X must be linearly independent.
Careful: this prohibits exact linear relations between columns of X.
The model admits non-exact relations and/or non-linear relations.
Examples.


Assumption 4: spherical error variance

Homoskedasticity: E(ui²|X) = σ² > 0,  i = 1, . . . , n

No serial correlation: E(ui uj |X) = 0,  i, j = 1, . . . , n,  i ≠ j.


Homoskedasticity: by strict exogeneity,

V(ui |X) = E(ui²|X) − [E(ui |X)]² = E(ui²|X)

so the assumption implies a constant conditional variance for the error term.

No serial correlation: also by strict exogeneity,

Cov(ui , uj |X) = E(ui uj |X)

so no serial correlation implies that, given X, all error terms of all observations are uncorrelated.


Assumption 4 in matrix terms: V(u|X) = E(uu'|X) = σ²In

Recall that for any random vector Z of n elements,

V(Z) ≡ E[(Z − E(Z))(Z − E(Z))'],

an n × n matrix with typical element vij = Cov(Zi , Zj).

Homoskedasticity (E(ui²|X) = σ²) implies that all the diagonal elements of V(u|X) are equal to σ².
No serial correlation implies that all the off-diagonal elements of V(u|X) are zero.


Summary

The Classical Linear Model

1. Linearity: Y = Xβ + u.
2. Strict exogeneity: E(u|X) = 0.
3. No multicollinearity: ρ(X) = K, w.p.1.
4. No heteroskedasticity / serial correlation: V(u|X) = σ²In.


Details and Interpretations

Fixed Regressors
In basic treatments X is taken as a fixed, non-random matrix. This is more compatible with experimental sciences. It simplifies some computations.

The Intercept
Consider the case x1i = 1, i = 1, . . . , n:

yi = β1 + β2 x2i + · · · + βK xKi + ui ,    i = 1, . . . , n

Then β1 is the intercept of the model. Careful with interpretations.


Interpretations

E(yi |X) = β1 + β2 x2i + · · · + βK xKi

If E(yi |X) is differentiable with respect to xki , which is functionally unrelated to all other variables, then

∂E(yi |X)/∂xki = βk

Careful: this is a partial derivative. A constant marginal effect.


Dummy explanatory variables: Suppose xki is a binary variable taking two values, indicating that the i-th observation belongs (1) or does not belong (0) to a certain class (male-female, for example). We cannot use the previous result for an interpretation (why?).

Compute the following magnitudes:

E(yi |X, xki = 1) = β1 + β2 x2i + · · · + βk · 1 + · · · + βK xKi
E(yi |X, xki = 0) = β1 + β2 x2i + · · · + βk · 0 + · · · + βK xKi

Then βk = E(yi |X, xki = 1) − E(yi |X, xki = 0).

Example: gender differences.
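A minimal numerical sketch of this interpretation (made-up data): with only an intercept and a single dummy, the OLS coefficient on the dummy equals the difference in group means.

```python
import numpy as np

# Made-up data: y regressed on an intercept and one dummy d in {0, 1}.
y = np.array([10.0, 12.0, 11.0, 15.0, 16.0, 17.0])
d = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # class membership indicator

X = np.column_stack([np.ones_like(d), d])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS via normal equations

# With no other regressors, the dummy coefficient is the gap in group means
diff_in_means = y[d == 1].mean() - y[d == 0].mean()
assert np.isclose(beta_hat[1], diff_in_means)
```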


The linear model is not that linear

yi = β1 + β2 x2i + · · · + βK xKi + ui

Linear?
Linear in variables.
Linear in parameters.
For estimation purposes, what matters is linearity in parameters.


A small catalog of non-linear models that can be handled with the classical linear model:

Quadratic: Yi = β1 + β2 X2i + β3 X2i² + ui
Inverse: Yi = β1 + β2 (1/X2i) + ui
Interactive: Yi = β1 + β2 X2i + β3 X3i + β4 X2i X3i + ui
Logarithmic: ln Yi = β1 + β2 ln X2i + ui
Semilogarithmic: ln Yi = β1 + β2 X2i + ui

We will explore interpretations and examples in the homework.
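The quadratic case can be sketched in code: the model is nonlinear in X2 but linear in the parameters, so OLS applies once X2i² is added as a regressor (the data below are simulated for illustration):

```python
import numpy as np

# Simulated data from Y = 2 + 1.5 X - 0.5 X^2 + u (illustration only)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 50)
y = 2.0 + 1.5 * x - 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Regressors: 1, X, X^2 -- the model is linear in (beta1, beta2, beta3)
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_hat approximately recovers (2.0, 1.5, -0.5)
```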


Least Squares Estimation

Goal: recover β based on a sample yi , xi ,  i = 1, . . . , n.

Let β̂ be any estimator of β.

Define Ŷ ≡ Xβ̂ (our prediction of Y).
Define e ≡ Y − Ŷ (estimation errors).

Note that if n > K we cannot produce an estimator by forcing e = 0. Why?
We need a criterion to derive a sensible and feasible estimator.


Consider the following penalty function:

SSR(β̂) ≡ Σ ei² = e'e = (Y − Xβ̂)'(Y − Xβ̂)    (sum over i = 1, . . . , n)

SSR(β̂) is the aggregation of squared errors if we choose β̂ as an estimator.

The least squares estimator will be:

β̂ = argmin SSR(β̂)


Result: β̂ = (X'X)⁻¹X'Y

SSR(β̂) = e'e = (Y − Xβ̂)'(Y − Xβ̂)
        = Y'Y − β̂'X'Y − Y'Xβ̂ + β̂'X'Xβ̂
        = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂

In the second line, note that β̂'X'Y is a scalar, and hence it is trivially equal to its transpose, Y'Xβ̂; that is how we obtain the result in the third line.

SSR can easily be shown to be a strictly convex, differentiable function of β̂, so first order conditions for a stationary point are sufficient for a global minimum.


First order conditions are:

∂(e'e)/∂β̂ = 0

Using the derivation rules introduced before:

∂(e'e)/∂β̂ = −2X'Y + 2X'Xβ̂ = 0

which is a system of K linear equations with K unknowns (β̂).

Solving for β̂ gives the desired solution:

β̂ = (X'X)⁻¹X'Y
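A minimal sketch of this formula in code, on simulated (made-up) data; solving the normal equations with np.linalg.solve avoids forming the inverse explicitly, and the result matches the library least-squares routine:

```python
import numpy as np

# Simulated data: n = 100 observations, K = 3 regressors (with intercept)
rng = np.random.default_rng(42)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])          # true coefficients (made up)
Y = X @ beta + rng.normal(size=n)

# OLS: solve the normal equations X'X beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer as the library least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```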


Some comments and details

Existence and uniqueness: guaranteed by the rank assumption ρ(X) = K.
Second order conditions: X'X is positive definite, also by the rank assumption.
The role of the assumptions: which of the assumptions have been used to derive the OLS estimator and guarantee its existence and uniqueness?
Notation: Ŷ ≡ Xβ̂,  e ≡ Y − Ŷ (the OLS residuals).


Algebraic Properties

Recall the FOCs from the least squares problem:

−X'Y + X'Xβ̂ = 0
X'(Y − Xβ̂) = 0
X'e = 0

These are the normal equations.
The algebraic properties are those that can be derived from the normal equations.


Sum of errors: if the model has an intercept, one of the columns of X is a vector of ones, so X'e = 0 implies:

Σ ei = 0    (sum over i = 1, . . . , n)

Orthogonality: X'e = 0, implying that the OLS residuals are uncorrelated with all explanatory variables.

Linearity: β̂ is a linear function of Y; that is, there exists a K × n matrix A that depends solely on X, with ρ(A) = K, such that β̂ = AY.
Proof: trivial. Set A = (X'X)⁻¹X'.
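These algebraic properties can be verified numerically on simulated data (a sketch, not part of the original notes): with an intercept column, the residuals sum to zero and are orthogonal to every column of X.

```python
import numpy as np

# Simulated data with an intercept column (illustration only)
rng = np.random.default_rng(7)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat                         # OLS residuals

assert np.allclose(X.T @ e, 0.0, atol=1e-8)  # orthogonality: X'e = 0
assert abs(e.sum()) < 1e-8                   # sum of residuals is zero
```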


Goodness of Fit

First check some easy results, when there is an intercept in the model:

The average of the fitted values equals the average of Y: (1/n) Σ Ŷi = Ȳ.
Start with Yi = Ŷi + ei. Take averages on both sides; Σ ei = 0 by the previous property.

Σ (Yi − Ȳ)² = Σ (Ŷi − Ȳ)² + Σ ei²    (sums over i = 1, . . . , n)

Start with (Yi − Ȳ) = (Ŷi − Ȳ) + ei. Take squares and sum. Then show that the cross term satisfies Σ ei (Ŷi − Ȳ) = e'Xβ̂ − Ȳ Σ ei = 0 by the previous properties.


Σ (Yi − Ȳ)² = Σ (Ŷi − Ȳ)² + Σ ei²

The total variation in Y around its mean can be decomposed into two additive terms: one corresponding to the model and the other to the estimation errors.

If all errors are zero, then all the variation is due to the model: the fitted linear model explains all the variation.

This suggests the following measure of goodness of fit:

R² ≡ Σ (Ŷi − Ȳ)² / Σ (Yi − Ȳ)² = 1 − Σ ei² / Σ (Yi − Ȳ)²

This is the (centered) coefficient of determination: the proportion of the total variability explained by the fitted linear model.
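The two expressions for the centered R² can be checked on simulated data (a sketch; the equality relies on the decomposition above, which requires an intercept):

```python
import numpy as np

# Simulated data with an intercept (illustration only)
rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e = Y - Y_hat

tss = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
ess = np.sum((Y_hat - Y.mean()) ** 2)    # model (explained) sum of squares
rss = np.sum(e ** 2)                     # residual sum of squares

r2_a = ess / tss
r2_b = 1.0 - rss / tss
assert np.isclose(r2_a, r2_b)            # both definitions agree
assert 0.0 <= r2_b <= 1.0
```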

Comments and properties (as homework)

0 ≤ R² ≤ 1.
β̂ maximizes R².
R² is non-decreasing in the number of explanatory variables, K.
Use and abuse of R².


In some cases we will use the uncentered R²:

Ru² ≡ Σ Ŷi² / Σ Yi² = 1 − Σ ei² / Σ Yi²

The last equality holds since:

Σ Yi² = Y'Y = (Ŷ + e)'(Ŷ + e)
            = Ŷ'Ŷ + e'e + 2Ŷ'e
            = Ŷ'Ŷ + e'e + 2β̂'X'e
            = Ŷ'Ŷ + e'e

by the orthogonality property.


Estimation of σ²

We will need an estimator for σ². We will propose:

S² = Σ ei² / (n − K) = e'e / (n − K)

Later on we will establish its properties in more detail.
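A sketch of S² on simulated data whose true error variance is σ² = 1 (illustration only; note the degrees-of-freedom correction n − K):

```python
import numpy as np

# Simulated data with true error variance sigma^2 = 1 (made up)
rng = np.random.default_rng(11)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

S2 = (e @ e) / (n - K)     # S^2 = e'e / (n - K)
# S2 should be in the neighborhood of the true variance 1
```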
