Probability density function for Y
[Figure: probability density functions of Y at X = 10 and X = 25, each centered on the regression line $E(Y) = \beta_0 + \beta_1 X$.]
This regression function shows the average number of umbrellas sold at different levels of rainfall, in centimeters.
The conditional variance of Y is $\mathrm{var}(Y) = \sigma^2$ for all values of X.
Probability density function for Y
On the previous slide, the constant variance assumption
implies that at each level of rainfall X we are equally uncertain
about how far values of Y will be from their average value,
$E(Y) = \beta_0 + \beta_1 X$. Data satisfying this condition are
considered to be homoskedastic. If this assumption is not
satisfied, the data are considered to be heteroskedastic.
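As a quick illustration, the sketch below simulates sales at the two rainfall levels from the figure (X = 10 and X = 25) with a constant error variance. The parameter values $\beta_0 = 5$, $\beta_1 = 2$, $\sigma = 1.5$ are hypothetical, chosen only for the demonstration:

```python
# Hypothetical homoskedastic data: at every rainfall level X, sales Y
# scatter around beta0 + beta1*X with the SAME error variance sigma^2.
import random

random.seed(42)
beta0, beta1, sigma = 5.0, 2.0, 1.5  # illustrative population parameters

def draw_y(x, n=2000):
    """Draw n observations of Y at a fixed rainfall level x."""
    return [beta0 + beta1 * x + random.gauss(0.0, sigma) for _ in range(n)]

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((u - m) ** 2 for u in v) / len(v)

y10, y25 = draw_y(10), draw_y(25)
# The means differ (the pdfs sit at different locations), but the spread
# around each mean is roughly the same: that is homoskedasticity.
```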
The error term
An observation on Y can be decomposed into two parts:
Systematic component: $E(Y) = \beta_0 + \beta_1 X$
Random component: $e = Y - E(Y) = Y - \beta_0 - \beta_1 X$
Rearranging we obtain the simple linear regression model:
$Y = \beta_0 + \beta_1 X + e$
The error term
Why do we introduce an error term?
Unavailability of data
Randomness in human behavior
Net influence of a large number of small and independent causes
As e is the random component, we know that $E(e) = 0$ because:
$E(e) = E(Y) - \beta_0 - \beta_1 X = 0$
We also know that the variances of Y and e are identical and equal to $\sigma^2$ because they only differ by a constant. Thus the pdfs for Y and e are identical in all respects except for their location.
The error term: initial assumptions
Several assumptions are required in order to run the simple linear regression model. Thus far we have assumed that:
1. $Y = \beta_0 + \beta_1 X + e$
2. $E(e) = 0$, which is equivalent to stating that $E(Y) = \beta_0 + \beta_1 X$.
3. $\mathrm{var}(e) = \sigma^2 = \mathrm{var}(Y)$ (homoskedasticity)
Later we will see why these assumptions (and others) are important for our purposes.
The population vs. the sample
In practice, the econometrician will possess a
sample of Y values corresponding to some
fixed X values rather than data from the entire
population of values. Therefore the
econometrician will never truly know the
values of $\beta_0$ and $\beta_1$.
However, we may estimate these parameters. We will denote these estimators as $b_0$ and $b_1$.
Ordinary least squares
So how shall we find $b_0$ and $b_1$? We need a method or rule for how to estimate the population parameters using sample data.
The most widely used rule is the method of least
squares, or ordinary least squares (OLS).
According to this principle, a line is fitted to the
data that renders the sum of the squares of the
vertical distances from each data point to the line
as small as possible.
Ordinary least squares
Therefore the fitted line may be written as:
$\hat{Y} = b_0 + b_1 X$
The vertical distances from the fitted line to each point are the least squares residuals, $\hat{e}$. They are given by:
$\hat{e} = Y - \hat{Y} = Y - b_0 - b_1 X$
Ordinary least squares
Mathematically, we want to find $b_0$ and $b_1$ such that the sum of the squared vertical distances from the data points to the line is minimized:
$\min_{b_0, b_1} \sum \hat{e}^2 = \sum (Y - \hat{Y})^2 = \sum (Y - b_0 - b_1 X)^2$
If you do not recall how to find the solution for $b_0$ and $b_1$ using partial derivatives, the steps may be found in the course text.
The least squares estimators
Upon solving this minimization problem we find that:
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$
where $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$. Then:
$Y - \bar{Y} = (\hat{Y} - \bar{Y}) + \hat{e}$
In other words, the amount by which the data deviate from the mean can be broken into an explained portion $(\hat{Y} - \bar{Y})$ and an unexplained portion, $\hat{e}$.
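A minimal sketch of these formulas in Python, using made-up rainfall and sales figures (the numbers are illustrative, not from the lecture):

```python
# OLS estimates for the simple regression Y = b0 + b1*X + e,
# computed directly from the formulas above. Data are hypothetical.
X = [10, 15, 20, 25, 30]   # rainfall (cm)
Y = [25, 33, 46, 55, 67]   # umbrellas sold

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
# b0 = Ybar - b1 * Xbar
b0 = y_bar - b1 * x_bar

# Fitted values and residuals: each deviation Y - Ybar splits into an
# explained part (Yhat - Ybar) and an unexplained part e_hat.
Y_hat = [b0 + b1 * x for x in X]
e_hat = [y - yh for y, yh in zip(Y, Y_hat)]
```

Note that the residuals sum to zero, a mechanical consequence of including the intercept $b_0$.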
Coefficient of determination
Using $Y - \bar{Y} = (\hat{Y} - \bar{Y}) + \hat{e}$, it can be shown that:
$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum \hat{e}^2$
We may then define the coefficient of determination $R^2$ as the ratio of explained variation to total variation:
$R^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{\sum \hat{e}^2}{\sum (Y - \bar{Y})^2}$
Coefficient of determination
Therefore we have:
$\sum (Y - \bar{Y})^2$: Total sum of squares (TSS). A measure of total variation in Y about the mean.
$\sum (\hat{Y} - \bar{Y})^2$: Explained sum of squares (ESS). The part of total variation in Y about the mean that is explained by the sample regression.
$\sum \hat{e}^2$: Residual sum of squares (RSS). The part of total variation in Y about the mean that is not explained by the sample regression.
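The decomposition TSS = ESS + RSS and the two equivalent expressions for $R^2$ can be checked numerically on the same hypothetical rainfall data:

```python
# TSS = ESS + RSS, and R^2 = ESS/TSS = 1 - RSS/TSS.
# Data and estimates are the same illustrative numbers as before.
X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

TSS = sum((y - y_bar) ** 2 for y in Y)               # total variation
ESS = sum((yh - y_bar) ** 2 for yh in Y_hat)         # explained variation
RSS = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # residual variation
R2 = ESS / TSS
```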
Coefficient of determination
It can also be shown that:
$R^2 = \frac{b_1^2 \sum (X - \bar{X})^2}{\sum (Y - \bar{Y})^2}$
$R^2 = \frac{\left[\sum (X - \bar{X})(Y - \bar{Y})\right]^2}{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}$
Its limits are $0 \le R^2 \le 1$. If $R^2 = 0$, variations in X do not systematically affect the average value of Y; if $R^2 = 1$, all of the sample variation in Y is explained by the regression.
Variance
How confident should we be in our estimates? That depends on the dispersion of the errors, whose variance (since $E(e) = 0$) is:
$\mathrm{var}(e) = E[e - E(e)]^2 = E(e^2) = \sigma^2$
Of course, the random errors are unobservable. So how shall we proceed?
Variance
Recall that:
$\hat{e} = Y - \hat{Y} = Y - b_0 - b_1 X$
We may therefore replace the unobservable errors with the residuals $\hat{e}$:
$\hat{\sigma}^2 = \frac{\sum \hat{e}^2}{n}$
However, we must modify this formula slightly based on the number of regression parameters (K) (what is the intuition?). When dealing with only $b_0$ and $b_1$, K = 2. Therefore the formula that we use to ensure an unbiased estimator is:
$\hat{\sigma}^2 = \frac{\sum \hat{e}^2}{n - 2}$
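A sketch of this estimator on the same made-up data (with K = 2, the divisor is n − 2):

```python
# Unbiased estimate of sigma^2: sum of squared residuals over n - K, K = 2.
# Data are the same hypothetical rainfall/sales numbers as earlier.
X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar
e_hat = [y - (b0 + b1 * x) for x, y in zip(X, Y)]

K = 2                                     # parameters estimated: b0 and b1
sigma2_hat = sum(e ** 2 for e in e_hat) / (n - K)
```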
Variance
Now that we have found $\hat{\sigma}^2$, an unbiased estimator of $\sigma^2$, we may write:
$\widehat{\mathrm{var}}(b_0) = \frac{\hat{\sigma}^2 \sum X^2}{n \sum (X - \bar{X})^2}$
$\widehat{\mathrm{var}}(b_1) = \frac{\hat{\sigma}^2}{\sum (X - \bar{X})^2}$
The square roots of the estimated variances are the standard errors of $b_0$ and $b_1$.
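These variance formulas and the resulting standard errors can be sketched with the same hypothetical data:

```python
# Estimated variances and standard errors of the OLS coefficients,
# using the illustrative rainfall/sales data from before.
from math import sqrt

X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / Sxx
b0 = y_bar - b1 * x_bar
e_hat = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
sigma2_hat = sum(e ** 2 for e in e_hat) / (n - 2)

# var(b0) = sigma2_hat * sum(X^2) / (n * Sxx);  var(b1) = sigma2_hat / Sxx
var_b0 = sigma2_hat * sum(x ** 2 for x in X) / (n * Sxx)
var_b1 = sigma2_hat / Sxx
se_b0, se_b1 = sqrt(var_b0), sqrt(var_b1)  # standard errors
```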
Covariance
Earlier in the lecture we defined
$\mathrm{var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$
By extension we may define the covariance between two random variables X and Y as:
$\mathrm{cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$
Positive covariance: When X is above (below) its mean, Y is likely to be above (below) its mean, and vice versa.
Negative covariance: When X is above (below) its mean, Y is likely to be below (above) its mean, and vice versa.
Coefficient of correlation
However, interpreting $\mathrm{cov}(X, Y)$ is difficult because it may arbitrarily increase or decrease depending on units of measurement. We may therefore scale the covariance by the standard deviations of the variables and define the coefficient of correlation as:
$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$
Its limits are $-1 \le \rho_{XY} \le 1$, where $\rho_{XY} = \pm 1$ indicates a perfect linear relationship between X and Y.
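A sample analogue of this coefficient can be computed for the hypothetical data; the common 1/n factors cancel, leaving ratios of deviation sums. In simple regression its square equals $R^2$:

```python
# Sample correlation: covariance scaled by the standard deviations.
# Data are the same illustrative rainfall/sales numbers.
from math import sqrt

X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
Sxx = sum((x - x_bar) ** 2 for x in X)
Syy = sum((y - y_bar) ** 2 for y in Y)
r = Sxy / (sqrt(Sxx) * sqrt(Syy))  # unit-free, bounded by -1 and 1
```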
Covariance
Covariance between $b_0$ and $b_1$ is also a measure of the association between the two variables:
$\mathrm{cov}(b_0, b_1) = E[(b_0 - E(b_0))(b_1 - E(b_1))]$
It can then be shown that:
$\mathrm{cov}(b_0, b_1) = \frac{-\bar{X} \sigma^2}{\sum (X - \bar{X})^2}$
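Replacing $\sigma^2$ with its estimate $\hat{\sigma}^2$, the formula can be evaluated on the same made-up data; the result is negative here because $\bar{X} > 0$:

```python
# Estimated covariance between the OLS intercept and slope, using the
# illustrative data from earlier and sigma2_hat in place of sigma^2.
X = [10, 15, 20, 25, 30]
Y = [25, 33, 46, 55, 67]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
Sxx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / Sxx
b0 = y_bar - b1 * x_bar
sigma2_hat = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y)) / (n - 2)

# cov(b0, b1) = -Xbar * sigma^2 / sum((X - Xbar)^2)
cov_b0_b1 = -x_bar * sigma2_hat / Sxx
```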
Now that we have explored the theoretical background required to appreciate OLS, let's start working with an actual data set.