Probability density function for Y
[Figure: probability density functions of Y at X = 10 and X = 25, each centered on the regression line $E(Y) = \beta_0 + \beta_1 X$.]
This regression function shows the average number of umbrellas sold at different levels of rainfall, in centimeters.
The conditional variance of Y is $\mathrm{var}(Y) = \sigma^2$ for all values of X.
Probability density function for Y
On the previous slide, the constant variance assumption
implies that at each level of rainfall X we are equally uncertain
about how far values of Y will be from their average value,
$E(Y) = \beta_0 + \beta_1 X$. Data satisfying this condition are
considered to be homoskedastic. If this assumption is not
satisfied, the data are considered to be heteroskedastic.
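As a quick illustration, the sketch below simulates sales at the two rainfall levels from the figure (X = 10 and X = 25) with a constant error variance. The parameter values $\beta_0 = 5$, $\beta_1 = 2$, $\sigma = 1.5$ are hypothetical, chosen only for the demonstration:

```python
# Hypothetical homoskedastic data: at every rainfall level X, sales Y
# scatter around beta0 + beta1*X with the SAME error variance sigma^2.
import random

random.seed(42)
beta0, beta1, sigma = 5.0, 2.0, 1.5  # illustrative population parameters

def draw_y(x, n=2000):
    """Draw n observations of Y at a fixed rainfall level x."""
    return [beta0 + beta1 * x + random.gauss(0.0, sigma) for _ in range(n)]

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((u - m) ** 2 for u in v) / len(v)

y10, y25 = draw_y(10), draw_y(25)
# The means differ (the pdfs sit at different locations), but the spread
# around each mean is roughly the same: that is homoskedasticity.
```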
The error term
An observation on Y can be decomposed into two parts:
Systematic component: $E(Y) = \beta_0 + \beta_1 X$
Random component: $e = Y - E(Y) = Y - \beta_0 - \beta_1 X$
Rearranging we obtain the simple linear regression model:
$Y = \beta_0 + \beta_1 X + e$
The error term
Why do we introduce an error term?
Unavailability of data
Randomness in human behavior
Net influence of a large number of small and independent causes
As e is the random component, we know that $E(e) = 0$ because:
$E(e) = E(Y) - \beta_0 - \beta_1 X = 0$
We also know that the variances of Y and e are identical and equal to $\sigma^2$ because they only differ by a constant. Thus the pdfs for Y and e are identical in all respects except for their location.
The error term: initial assumptions
Several assumptions are required in order to run the simple linear regression model. Thus far we have assumed that:
1. $Y = \beta_0 + \beta_1 X + e$
2. $E(e) = 0$, which is equivalent to stating that $E(Y) = \beta_0 + \beta_1 X$.
3. $\mathrm{var}(e) = \sigma^2 = \mathrm{var}(Y)$ (homoskedasticity)
Later we will see why these assumptions (and others) are important for our purposes.
The population vs. the sample
In practice, the econometrician will possess a
sample of Y values corresponding to some
fixed X values rather than data from the entire
population of values. Therefore the
econometrician will never truly know the
values of $\beta_0$ and $\beta_1$.
However, we may estimate these parameters. We will denote these estimators as $b_0$ and $b_1$.
Ordinary least squares
So how shall we find $b_0$ and $b_1$? We need a method or rule for how to estimate the population parameters using sample data.
The most widely used rule is the method of least
squares, or ordinary least squares (OLS).
According to this principle, a line is fitted to the
data that renders the sum of the squares of the
vertical distances from each data point to the line
as small as possible.
Ordinary least squares
Therefore the fitted line may be written as:
$\hat{Y} = b_0 + b_1 X$
The vertical distances from the fitted line to each point are the least squares residuals, $\hat{e}$. They are given by:
$\hat{e} = Y - \hat{Y} = Y - b_0 - b_1 X$
Ordinary least squares
Mathematically, we want to find $b_0$ and $b_1$ such that the sum of the squared vertical distances from the data points to the line is minimized:
$\min_{b_0, b_1} \sum \hat{e}^2 = \sum (Y - \hat{Y})^2 = \sum (Y - b_0 - b_1 X)^2$
If you do not recall how to find the solution for $b_0$ and $b_1$ using partial derivatives, the steps may be found in the course text.
The least squares estimators
Upon solving this minimization problem we find that:
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$
where $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$. Then:
$Y - \bar{Y} = (\hat{Y} - \bar{Y}) + \hat{e}$
In other words, the amount by which the data deviate from the mean can be broken into an explained portion $(\hat{Y} - \bar{Y})$ and an unexplained portion, $\hat{e}$.
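A minimal sketch of these formulas in Python, using made-up rainfall and sales figures (the numbers are illustrative, not from the lecture):

```python
# OLS estimates for the simple regression Y = b0 + b1*X + e,
# computed directly from the formulas above. Data are hypothetical.
X = [10, 15, 20, 25, 30]   # rainfall (cm)
Y = [25, 33, 46, 55, 67]   # umbrellas sold

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
# b0 = Ybar - b1 * Xbar
b0 = y_bar - b1 * x_bar

# Fitted values and residuals: each deviation Y - Ybar splits into an
# explained part (Yhat - Ybar) and an unexplained part e_hat.
Y_hat = [b0 + b1 * x for x in X]
e_hat = [y - yh for y, yh in zip(Y, Y_hat)]
```

Note that the residuals sum to zero, a mechanical consequence of including the intercept $b_0$.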
Coefficient of determination
Using $Y - \bar{Y} = (\hat{Y} - \bar{Y}) + \hat{e}$, it can be shown that:
$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum \hat{e}^2$
We may then define the coefficient of determination $R^2$ as the ratio of explained variation to total variation:
$R^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{\sum \hat{e}^2}{\sum (Y - \bar{Y})^2}$
Coefficient of determination
Therefore we have:
$\sum (Y - \bar{Y})^2$: Total sum of squares (TSS). A measure of total variation in Y about the mean.
$\sum (\hat{Y} - \bar{Y})^2$: Explained sum of squares (ESS). The part of total variation in Y about the mean that is explained by the sample regression.
$\sum \hat{e}^2$: Residual sum of squares (RSS). The part of total variation in Y about the mean that is not explained by the sample regression.
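The decomposition TSS = ESS + RSS and the two equivalent expressions for $R^2$ can be checked numerically on the same hypothetical rainfall data:

```python
# TSS = ESS + RSS, and R^2 = ESS/TSS = 1 - RSS/TSS.
# Data and estimates are the same illustrative numbers as before.
X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

TSS = sum((y - y_bar) ** 2 for y in Y)               # total variation
ESS = sum((yh - y_bar) ** 2 for yh in Y_hat)         # explained variation
RSS = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # residual variation
R2 = ESS / TSS
```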
Coefficient of determination
It can also be shown that:
$R^2 = \frac{b_1^2 \sum (X - \bar{X})^2}{\sum (Y - \bar{Y})^2}$
$R^2 = \frac{\left[\sum (X - \bar{X})(Y - \bar{Y})\right]^2}{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}$
Its limits are $0 \le R^2 \le 1$. If $R^2 = 0$, variations in X do not systematically affect the average value of Y; if $R^2 = 1$, all of the sample variation in Y is explained by the regression.
Variance
How confident should we be in our estimates? That depends on the dispersion of the errors, whose variance (since $E(e) = 0$) is:
$\mathrm{var}(e) = E[e - E(e)]^2 = E(e^2) = \sigma^2$
Of course, the random errors are unobservable. So how shall we proceed?
Variance
Recall that:
$\hat{e} = Y - \hat{Y} = Y - b_0 - b_1 X$
We may therefore replace the unobservable errors with the residuals $\hat{e}$:
$\hat{\sigma}^2 = \frac{\sum \hat{e}^2}{n}$
However, we must modify this formula slightly based on the number of regression parameters (K) (what is the intuition?). When dealing with only $b_0$ and $b_1$, K = 2. Therefore the formula that we use to ensure an unbiased estimator is:
$\hat{\sigma}^2 = \frac{\sum \hat{e}^2}{n - 2}$
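A sketch of this estimator on the same made-up data (with K = 2, the divisor is n − 2):

```python
# Unbiased estimate of sigma^2: sum of squared residuals over n - K, K = 2.
# Data are the same hypothetical rainfall/sales numbers as earlier.
X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar
e_hat = [y - (b0 + b1 * x) for x, y in zip(X, Y)]

K = 2                                     # parameters estimated: b0 and b1
sigma2_hat = sum(e ** 2 for e in e_hat) / (n - K)
```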
Variance
Now that we have found $\hat{\sigma}^2$, an unbiased estimator of $\sigma^2$, we may write:
$\widehat{\mathrm{var}}(b_0) = \frac{\hat{\sigma}^2 \sum X^2}{n \sum (X - \bar{X})^2}$
$\widehat{\mathrm{var}}(b_1) = \frac{\hat{\sigma}^2}{\sum (X - \bar{X})^2}$
The square roots of the estimated variances are the standard errors of $b_0$ and $b_1$.
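These variance formulas and the resulting standard errors can be sketched with the same hypothetical data:

```python
# Estimated variances and standard errors of the OLS coefficients,
# using the illustrative rainfall/sales data from before.
from math import sqrt

X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / Sxx
b0 = y_bar - b1 * x_bar
e_hat = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
sigma2_hat = sum(e ** 2 for e in e_hat) / (n - 2)

# var(b0) = sigma2_hat * sum(X^2) / (n * Sxx);  var(b1) = sigma2_hat / Sxx
var_b0 = sigma2_hat * sum(x ** 2 for x in X) / (n * Sxx)
var_b1 = sigma2_hat / Sxx
se_b0, se_b1 = sqrt(var_b0), sqrt(var_b1)  # standard errors
```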
Covariance
Earlier in the lecture we defined
$\mathrm{var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$
By extension we may define the covariance between two random variables X and Y as:
$\mathrm{cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$
Positive covariance: When X is above (below) its mean, Y is likely to be above (below) its mean, and vice versa.
Negative covariance: When X is above (below) its mean, Y is likely to be below (above) its mean, and vice versa.
Coefficient of correlation
However, interpreting $\mathrm{cov}(X, Y)$ is difficult because it may arbitrarily increase or decrease depending on units of measurement. We may therefore scale the covariance by the standard deviations of the variables and define the coefficient of correlation as:
$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$
Its limits are $-1 \le \rho_{XY} \le 1$, where $\rho_{XY} = \pm 1$ indicates a perfect linear relationship between X and Y.
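A sample analogue of this coefficient can be computed for the hypothetical data; the common 1/n factors cancel, leaving ratios of deviation sums. In simple regression its square equals $R^2$:

```python
# Sample correlation: covariance scaled by the standard deviations.
# Data are the same illustrative rainfall/sales numbers.
from math import sqrt

X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
Sxx = sum((x - x_bar) ** 2 for x in X)
Syy = sum((y - y_bar) ** 2 for y in Y)
r = Sxy / (sqrt(Sxx) * sqrt(Syy))  # unit-free, bounded by -1 and 1
```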
Covariance
Covariance between $b_0$ and $b_1$ is also a measure of the association between the two variables:
$\mathrm{cov}(b_0, b_1) = E[(b_0 - E(b_0))(b_1 - E(b_1))]$
It can then be shown that:
$\mathrm{cov}(b_0, b_1) = \frac{-\bar{X} \sigma^2}{\sum (X - \bar{X})^2}$
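Replacing $\sigma^2$ with its estimate $\hat{\sigma}^2$, the formula can be evaluated on the same made-up data; the result is negative here because $\bar{X} > 0$:

```python
# Estimated covariance between the OLS intercept and slope, using the
# illustrative data from earlier and sigma2_hat in place of sigma^2.
X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / Sxx
b0 = y_bar - b1 * x_bar
sigma2_hat = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y)) / (n - 2)

# cov(b0, b1) = -Xbar * sigma^2 / sum((X - Xbar)^2)
cov_b0_b1 = -x_bar * sigma2_hat / Sxx
```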
Now that we have explored the theoretical background required to appreciate OLS, let's start working with an actual data set.