
Econ 8740

Note #3
The Simple Regression Model
Shif Gurmu
1. Definition of the Model
The simple regression model can be used to study the relationship between two variables, say y and x. In particular, we are interested in explaining y in terms of x, or in studying how y varies with changes in x. For example, if y is the hourly wage rate and x is education, we might be interested in how changes in years of schooling affect the hourly wage rate. Another example is how the crime rate in a city (y) varies with changes in the number of police officers (x). Although the model is simple and has limitations in empirical analysis, learning how to interpret the simple regression model is good practice for studying multiple regression later on.
There are three important issues in specifying a simple regression model that will explain y in terms of x.

1. Since there is never an exact relationship between two variables, how do we allow for other factors to affect y?

2. What is the functional relationship between y and x (e.g., between y = wage and x = years of schooling)?

3. How can we be sure we are capturing a ceteris paribus relationship between y and x, if that is desired?
These issues can be resolved by specifying an equation relating y to x. A linear equation relating y to x takes the form

$$y = \beta_0 + \beta_1 x + u, \qquad (1)$$

where the various terms are defined as follows: y is the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand; x is the independent variable, the explanatory variable, the control variable, the predictor variable, the regressor, or the covariate. Here $\beta_0$ and $\beta_1$ are unknown parameters; $\beta_0$ is the intercept parameter (or constant parameter) and $\beta_1$ is the slope parameter. Finally, the variable $u$, called the error term or the disturbance term in the relationship, represents factors other than x that affect y. Both x, which is observed, and the unobserved factor $u$ affect y.
Table 1. Some Examples

Dependent Variable (y)       Independent Variable (x)
Salary                       Experience
Hourly wage                  Education
Crime rate in cities         Number of police officers
Expenditure on clothing      Disposable income
Corn output                  Amount of fertilizer
If the other factors in $u$ are held constant, so that the change in $u$ is zero ($\Delta u = 0$), then from equation (1), x has a linear effect on y:

$$\Delta y = \beta_1 \Delta x. \qquad (2)$$

That is, holding the other factors in $u$ fixed, the change in y is simply the slope $\beta_1$ times the change in x. For example, a simple linear regression model relating a person's hourly wage in dollars (wage) to observed years of education (educ) and other unobserved factors ($u$) is

$$wage = \beta_0 + \beta_1 educ + u. \qquad (3)$$

Here $\beta_1$ measures the change in the hourly wage given another year of education, holding all other factors fixed.
In empirical applications, we are interested in the ceteris paribus effect of x on y. To achieve this, we need to make assumptions about the relationship between the random variables x and $u$. We cannot estimate the causal effect of x on y without taking the relationship between x and $u$ into account; simply assuming that the factors in $u$ are held constant is not helpful for estimating the causal effect of x on y.

One simple assumption is that the average value of the error term in the population is zero:

$$E(u) = 0. \qquad (4)$$
However, this zero mean assumption does not say anything about how x and $u$ are related. Since x and $u$ are random variables, we can define the conditional distribution of $u$ given x, and hence obtain the conditional mean of $u$ given x. The crucial assumption of the simple linear regression model is the zero conditional mean assumption:

$$E(u \mid x) = 0. \qquad (5)$$

This says that knowing something about x does not give us any information about $u$. Assumption (5) follows from $E(u \mid x) = E(u)$, which says that $u$ is mean independent of x, together with the zero mean assumption (4).
The zero conditional mean assumption (5) implies that

$$E(y \mid x) = \beta_0 + \beta_1 x. \qquad (6)$$

This shows that the population regression function is a linear function of x. For any given value of x, the distribution of y is centered about $E(y \mid x)$. The linearity means that a one-unit increase in x changes the expected value of y by the amount $\beta_1$. That is,

$$\frac{\Delta E(y \mid x)}{\Delta x} = \beta_1. \qquad (7)$$

Note that the change in the conditional mean of y, given x, is simply $\beta_1$ times the change in x:

$$\Delta E(y \mid x) = \beta_1 \Delta x \qquad (8)$$

for the population.
2. Estimation: Ordinary Least Squares Estimates
The basic idea of regression is to estimate the population parameters using a sample. Let $\{(y_i, x_i) : i = 1, 2, \ldots, n\}$ denote a random sample of size n from the population. Since the data come from (1), the regression model for each i is

$$y_i = \beta_0 + \beta_1 x_i + u_i. \qquad (9)$$

We need to estimate the unknown regression parameters $\beta_0$ and $\beta_1$. There are different ways of estimating the intercept and slope parameters, including the method of moments and the ordinary least squares (OLS) approach. The focus here is on the OLS approach; see Wooldridge (2013), page 28 in Section 2.2, for an explanation of the method of moments approach.
In the OLS approach, we choose the estimates $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals. Define the predicted value from the regression line for the i-th observation as

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i. \qquad (10)$$

Then the residual for the i-th observation is

$$\hat{u}_i = y_i - \hat{y}_i, \quad \text{or} \qquad (11)$$
$$\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i.$$
Figure 1 of Problems for Class Discussion #1 gives the
scatter diagram of salary against experience. In the con-
text of regression of y on x, the fitted values and residuals
are depicted in Figure 2.4 in your textbook.
In the method of ordinary least squares, we choose $\hat\beta_0$ and $\hat\beta_1$ simultaneously to make

$$\sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2 \qquad (12)$$

as small as possible. This leads to two equations in two unknowns:

$$\sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0 \qquad (13)$$
$$\sum_{i=1}^{n} x_i \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0.$$
Rearranging, we get:

$$n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \qquad (14)$$
$$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i.$$

These are called the normal equations or first order conditions.
Solving the normal equations, we obtain the OLS estimates:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (15)$$

and

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. \qquad (16)$$

The ensuing predicted regression line is

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i.$$
It follows that

$$\frac{\Delta \hat{y}_i}{\Delta x_i} = \hat\beta_1, \qquad (17)$$

showing that the slope estimate $\hat\beta_1$ is the amount by which $\hat{y}_i$ changes when $x_i$ increases by one unit. For any change in $x_i$, the predicted change in $\hat{y}_i$ is given by

$$\Delta \hat{y}_i = \hat\beta_1 \Delta x_i. \qquad (18)$$
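For completeness, here is the algebra behind (15) and (16), which the note states without derivation. Dividing the first normal equation in (14) by n gives (16); substituting (16) into the second normal equation and rearranging yields

$$\hat\beta_1 \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y},$$

and using the identities $\sum_i x_i^2 - n\bar{x}^2 = \sum_i (x_i - \bar{x})^2$ and $\sum_i x_i y_i - n\bar{x}\bar{y} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ gives (15).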
Let us consider two examples using output from Stata.

Example 1 - From Problem for Class Discussion #1:
In this illustration, we examine the relationship between experience and salary for 32 economists from the University of Michigan using data from 1983. Let salary = salary in thousands of dollars (y) and exper = years of experience, defined as years since receiving the Ph.D. (x). A model relating an individual's salary to observed years of experience and other unobserved factors is

$$salary_i = \beta_0 + \beta_1 exper_i + u_i. \qquad (19)$$

Using Stata's regress command, the estimated regression line for salary is

$$\widehat{salary} = 39.314831 + 0.439907\, exper. \qquad (20)$$

How do we interpret the equation? First, if experience is zero, then the predicted salary is the intercept, 39.31483 in thousands of dollars, or $39,314.83. The predicted change in salary is given by

$$\Delta \widehat{salary} = 0.439907\, \Delta exper,$$

so that

$$\frac{\Delta \widehat{salary}}{\Delta exper} = 0.439907.$$

This says that if experience increases by one year, then salary is predicted to increase by 0.439907 (in 1000s of dollars), or about $439.91.

What is the predicted salary for a person with 10 years of experience? Using (20), the predicted salary is

$$\widehat{salary} = 39.314831 + 0.439907 \times 10 = 43.713901,$$

or $43,713.90 in 1983 dollars.
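A minimal Stata sketch of how these numbers can be reproduced, assuming the class discussion data are in memory with variables named salary and exper (the variable names are an assumption here):

* simple regression of salary on experience
regress salary exper

* predicted salary (in 1000s of dollars) at 10 years of experience,
* computed from the stored coefficient estimates
display _b[_cons] + _b[exper]*10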
Example 2 - From Problem for Class Discussion #2
This illustration is based on ZIP code-level data on prices for various items at fast-food restaurants, along with characteristics of the ZIP code population, in New Jersey and Pennsylvania. The purpose is to see whether fast-food restaurants charge higher prices in areas with a large concentration of blacks. In specification 2.1 (see the do file), we estimate a simple linear model:

$$psoda_i = \beta_0 + \beta_1 prpblck_i + u_i, \qquad (21)$$

where $psoda_i$ is the price of soda and $prpblck_i$ is the proportion black in the i-th ZIP code.
The estimated soda price equation is

$$\widehat{psoda}_i = 1.037399 + 0.0649269\, prpblck_i,$$

so that the predicted change in the price of soda is

$$\Delta \widehat{psoda}_i = 0.0649269\, \Delta prpblck_i.$$

This means that if, say, prpblck increases by 0.10 (ten percentage points), the price of soda is estimated to increase by about 0.0065 dollars (0.0649 × 0.10), or about 0.7 cents.
3. Algebraic Properties of OLS and Goodness-of-Fit
Algebraic Properties

The following are the three most important algebraic properties of the OLS estimates and their associated statistics.

1. The point $(\bar{x}, \bar{y})$ is always on the OLS regression line. That is, if we plug in $\bar{x}$ for $x_i$ in equation (10), the predicted value is $\bar{y}$.

2. The sum of the residuals, and therefore the sample average of the residuals, is zero. That is,
$$\sum_{i=1}^{n} \hat{u}_i = 0. \qquad (22)$$
This follows from the first line of the first order conditions given in the simultaneous equation system (13).

3. The sample covariance between the regressor and the OLS residuals is zero:
$$\sum_{i=1}^{n} x_i \hat{u}_i = 0. \qquad (23)$$
This follows from the second line of the first order conditions given in the simultaneous equation system (13); namely,
$$\sum_{i=1}^{n} x_i \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0.$$
A Stata check of properties 2 and 3 is sketched after this list.
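A quick numerical check of properties 2 and 3 in Stata, using the salary regression from Example 1 (again, the variable names are an assumption):

regress salary exper
predict uhat, residuals
* the mean of the residuals is (numerically) zero
summarize uhat
* the residuals are uncorrelated with the regressor
correlate exper uhat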
Decomposition of the Total Sum of Squares in the Dependent Variable, y

We can view OLS as decomposing each $y_i$ into two parts, a fitted value and a residual:

$$y_i = \hat{y}_i + \hat{u}_i. \qquad (24)$$

Since the average of the residuals is zero ($\bar{\hat{u}} = 0$), the average of the fitted values $\hat{y}_i$ is the same as the average of the $y_i$; that is, $\bar{\hat{y}} = \bar{y}$.

Next, define three sums of squares associated with the regression. First, the total sum of squares (SST) is a measure of the total variation in the dependent variable y. This is given as

$$SST = \sum_{i} (y_i - \bar{y})^2. \qquad (25)$$
Observe that $SST/(n-1)$ is the sample variance of y, $s_y^2 = \frac{1}{n-1}\sum_i (y_i - \bar{y})^2$. Second, the explained sum of squares (SSE) is

$$SSE = \sum_i (\hat{y}_i - \bar{y})^2, \qquad (26)$$

where we use $\bar{\hat{y}} = \bar{y}$. SSE measures the sample variation in the fitted values $\hat{y}_i$. Finally, the residual sum of squares (SSR) measures the sample variation in the residuals $\hat{u}_i$, and is given as

$$SSR = \sum_i \hat{u}_i^2. \qquad (27)$$

The total sum of squares in y can be expressed as the sum of the explained variation and the residual sum of squares (the unexplained variation):

$$SST = SSE + SSR. \qquad (28)$$
This follows by squaring (24), or equivalently

$$(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + \hat{u}_i,$$

and then summing and simplifying, to get

$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat{u}_i^2.$$

The cross term $2\sum_i (\hat{y}_i - \bar{y})\hat{u}_i$ can be shown to be equal to zero.
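A short argument for this last claim, using the algebraic properties (22) and (23):

$$\sum_i (\hat{y}_i - \bar{y})\hat{u}_i = \sum_i (\hat\beta_0 + \hat\beta_1 x_i)\hat{u}_i - \bar{y}\sum_i \hat{u}_i = \hat\beta_0 \sum_i \hat{u}_i + \hat\beta_1 \sum_i x_i \hat{u}_i - \bar{y}\sum_i \hat{u}_i = 0,$$

since $\sum_i \hat{u}_i = 0$ by (22) and $\sum_i x_i \hat{u}_i = 0$ by (23).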
Example 3: In the output from step 7 of the Problems for Class Discussion #1, we have SSE = 425.201684, SSR = 1880.12047, and SST = 2305.32216, so that 2305.32216 = 425.201684 + 1880.12047.
Goodness-of-Fit

We want a measure of how well the independent variable, x, explains the dependent variable, y. To do this, divide both sides of equation (28) by SST to get

$$1 = \frac{SSE}{SST} + \frac{SSR}{SST}.$$

Consequently, the R-squared of the regression, or the coefficient of determination, is defined as the ratio of the explained variation to the total variation:

$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}. \qquad (29)$$

$R^2$ is the proportion of the sample variation in y that is explained by x. The value of $R^2$ is always between 0 and 1. An R-squared close to zero indicates a poor fit of the regression line. OLS gives a perfect fit to the data if all the points lie on the same straight line, in which case $R^2 = 1$. It can be shown that $R^2$ is the square of the sample correlation between $y_i$ and $\hat{y}_i$.
Example 4: In Problems for Class Discussion #1, $R^2 = 0.1844$ from the Stata regression output. This implies that 18.44% of the total variation in salary is explained by experience, so about 81.56% of the salary variation for the economists at Michigan remains unexplained. This is not surprising, since other important determinants of salary are not included in the regression.
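This matches the sums of squares reported in Example 3:

$$R^2 = \frac{SSE}{SST} = \frac{425.201684}{2305.32216} \approx 0.1844.$$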
4. Nonlinear Models - Incorporating
Nonlinearities in Simple Regression
Introduction
So far, we have assumed a linear relationship between
the dependent and independent variables. The linearity
assumption may be restrictive for some economic appli-
cations. For example, we can expect nonlinear relation-
ships in the following cases: (a) output and labor input,
(b) salary and experience, (c) number of doctor visits and
age. Why? Here, we focus on nonlinear models that
can be transformed to linear models by suitably dening
the dependent and independent variables. For now, we
cover transformed models involving logarithms, where y
and/or x appear in logarithmic form.
The simple linear regression model we considered earlier, namely

$$y = \beta_0 + \beta_1 x + u,$$

is said to be in level-level form. As we saw earlier, a level-level model implies that a unit increase in x leads to a change in the expected value of y equal to $\beta_1$, a constant amount irrespective of the value of x.

We consider three functional forms where the dependent variable and/or the explanatory variable appear in natural logarithmic form.
1. Double-Log (Log-Log or Constant Elasticity) Model - In the constant elasticity model, both y and x are in logs. The model is
$$\log y = \beta_0 + \beta_1 \log x + u, \qquad (30)$$
where log stands for the natural logarithm and $\beta_1$ is the elasticity of y with respect to x. That is, if $\Delta u = 0$, then
$$\frac{\%\Delta y}{\%\Delta x} = \beta_1. \qquad (31)$$
The unknown parameters can be estimated from a simple linear regression of $\log y$ on $\log x$, including the intercept term.
The double-log model is derived from a Cobb-Douglas type nonlinear equation relating y to x:
$$y = \alpha_0 x^{\beta_1} e^{u}. \qquad (32)$$
Taking the log of both sides of (32) gives the double-log model (30), where $\beta_0 = \log \alpha_0$.
2. Log-Level (Semi-log 1) Model - The dependent variable is in log form while the explanatory variable is in level form. The log-level model takes the form
$$\log y = \beta_0 + \beta_1 x + u. \qquad (33)$$
The intercept and slope parameters can be obtained from an OLS regression of $\log y$ on a constant term and x. In this model, a unit increase in x changes y by a constant percentage. That is, if the change in $u$ is zero, then
$$\frac{\%\Delta y}{\Delta x} = 100 \beta_1. \qquad (34)$$
The quantity $100\beta_1$ is called the semi-elasticity of y with respect to x.
3. Level-Log (Semi-log 2) Model - Here x is logged, but y is not. The level-log model is
$$y = \beta_0 + \beta_1 \log x + u. \qquad (35)$$
The unknown parameters are obtained from the OLS regression of y on a constant and $\log x$. Assuming $\Delta u = 0$, the slope parameter divided by 100, $\beta_1/100$, is the change in y resulting from a 1% increase in x. That is,
$$\frac{\Delta y}{\%\Delta x} = \beta_1/100. \qquad (36)$$

Table 2 gives a summary of the functional forms considered above.
Table 2. Summary of Functional Forms Involving Logarithms

Model        Depen. Var.   Indep. Var.   Interpretation of $\beta_1$            Alternative Interp. of $\beta_1$
level-level  $y$           $x$           $\Delta y = \beta_1 \Delta x$          $\Delta y/\Delta x = \beta_1$
log-log      $\log y$      $\log x$      $\%\Delta y = \beta_1 \%\Delta x$      $\%\Delta y/\%\Delta x = \beta_1$
log-level    $\log y$      $x$           $\%\Delta y = (100\beta_1)\Delta x$    $\%\Delta y/\Delta x = 100\beta_1$
level-log    $y$           $\log x$      $\Delta y = (\beta_1/100)\%\Delta x$   $\Delta y/\%\Delta x = \beta_1/100$
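In Stata, the logged variables used in these specifications are created before running the regressions. A sketch, assuming the underlying variables are named psoda and income as in the class discussion data (the exact variable names are an assumption):

* create the logged variables
generate lpsoda = ln(psoda)
generate lincome = ln(income)

* log-level (semi-elasticity) model, as in specification 3.1
regress lpsoda prpblck

* constant elasticity (log-log) model, as in specification 3.3
regress lpsoda lincome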
Example 5 - From Problem for Class Discussion #2
In specification 3.1, we estimate a semi-elasticity model of the log of the price of soda on the proportion black:
$$lpsoda = \beta_0 + \beta_1 prpblck + u$$
to get the predicted equation:
$$\widehat{lpsoda} = 0.0331 + 0.0625\, prpblck.$$
This means that if the proportion black increases by 0.20, the price of soda is estimated to increase by about 1.25% (0.0625 × 100% times 0.20), which is a log-level interpretation.
Example 6 - From Problem for Class Discussion #2
Specification 3.3 is based on the constant elasticity model, where the coefficient estimates are obtained from the OLS regression of the log of the price of soda on log income. The estimated equation is
$$\widehat{lpsoda} = -0.3614 + 0.0375\, lincome.$$
The slope estimate is $\hat\beta_1 = 0.0375$. If median income in the ZIP code increases by 10%, then the price of soda is predicted to increase by 0.375% (0.0375 × 10).
5. The Means and Variances of the OLS Estimators
We now consider the statistical properties of the OLS estimators, where we view $\hat\beta_0$ and $\hat\beta_1$ as estimators of the parameters $\beta_0$ and $\beta_1$ in model (1). In doing so, we need to formally state the assumptions of the simple linear regression (SLR) model along the way. Following Wooldridge (2013), we number the assumptions using the prefix "SLR".
5.1 Unbiasedness of OLS
We start with the assumption that defines the population model using a linear functional form.

Assumption SLR.1 (Linear in Parameters)
The model relating the dependent variable y to the explanatory variable x and the disturbance $u$ is
$$y = \beta_0 + \beta_1 x + u, \qquad (37)$$
where $\beta_0$ and $\beta_1$ are the population intercept and slope parameters, respectively.

Since we are interested in estimating the unknown parameters $\beta_0$ and $\beta_1$ using data on y and x, we assume that our data are obtained as a random sample.

Assumption SLR.2 (Random Sampling)
We have a random sample of size n, $\{(x_i, y_i) : i = 1, 2, \ldots, n\}$, from the population.
The model can now be written in terms of the random sample:
$$y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, 2, \ldots, n, \qquad (38)$$
where the subscript i refers to observation i, such as a person, country, city, and so on.

The next assumption requires that we have sample variation in the independent variable x.

Assumption SLR.3 (Sample Variation in the Explanatory Variable)
The sample observations on x, $\{x_i : i = 1, \ldots, n\}$, are not all the same value. That is,
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0.$$
If there is no variability in x, we cannot estimate the model. In any case, there is no point in doing regression if the explanatory variable is constant for all observations.

To establish the unbiasedness of the OLS estimators, we need the zero conditional mean assumption we considered earlier (see equation (5)).
Assumption SLR.4 (Zero Conditional Mean)
The error term $u$ has an expected value of zero given any value of the independent variable. That is,
$$E(u \mid x) = 0.$$
Assumption SLR.4 implies that
$$E(y \mid x) = \beta_0 + \beta_1 x,$$
which is the population regression line. For a random sample, assumption SLR.4 implies that
$$E(u_i \mid x_i) = 0$$
and
$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_i.$$
Now we turn to the unbiasedness property of the OLS estimators:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(x, y)}{Var(x)},$$
which is the slope estimator of $\beta_1$, and
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x},$$
the intercept estimator.
Main Result on Unbiasedness of OLS
It can be shown that, under assumptions SLR.1 through SLR.4, $\hat\beta_0$ is an unbiased estimator of $\beta_0$ and $\hat\beta_1$ is an unbiased estimator of $\beta_1$. In other words,
$$E(\hat\beta_0) = \beta_0 \quad \text{and} \quad E(\hat\beta_1) = \beta_1. \qquad (39)$$
The unbiasedness property for $\hat\beta_1$, for example, says that, although $\hat\beta_1$ could be different from the true parameter value $\beta_1$, the expected value (mean) of $\hat\beta_1$ is equal to the true parameter $\beta_1$. We say that the distribution of $\hat\beta_1$ is centered around $\beta_1$. Unbiasedness generally fails if any of the four assumptions fails. For example, if the data we use are nonrandom (i.e., assumption SLR.2 fails), then the OLS estimators are biased.
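A sketch of the standard argument for $E(\hat\beta_1) = \beta_1$, which the note does not spell out: substituting $y_i = \beta_0 + \beta_1 x_i + u_i$ into (15) gives

$$\hat\beta_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

and taking expectations conditional on the sample values of x, assumption SLR.4 makes the second term zero, so $E(\hat\beta_1) = \beta_1$.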
5.2 Variances of the OLS Estimators
In addition to knowing that $\hat\beta_1$ is unbiased, it is useful to know how far we can expect $\hat\beta_1$ to be from $\beta_1$ on average. One measure of the spread of $\hat\beta_1$ about its mean $\beta_1$ is the variance of $\hat\beta_1$. The square root of the variance of an estimator is called the standard deviation of the estimator; its estimated counterpart is called the standard error. We need one more assumption, the "constant variance" assumption, to obtain the variances of the OLS estimators.
Assumption SLR.5 (Homoskedasticity)
The error term $u$ has the same variance given any value of the explanatory variable. That is,
$$Var(u \mid x) = \sigma^2.$$
The assumption of constant variance of $u$ implies that the conditional variance of y is constant:
$$Var(y \mid x) = \sigma^2. \qquad (40)$$
If $Var(u \mid x)$ is not constant, say $Var(u \mid x)$ varies with x, then the error term is said to exhibit heteroskedasticity, or nonconstant variance. Since $Var(u \mid x) = Var(y \mid x)$, heteroskedasticity is present whenever $Var(y \mid x)$ is a function of x.
Example 7 - Suppose we want to obtain an unbiased estimator of the ceteris paribus effect of household income on saving using the model
$$saving = \beta_0 + \beta_1 income + u.$$
According to SLR.4, we must assume that $E(u \mid income) = 0$, which in turn implies that $E(saving \mid income) = \beta_0 + \beta_1 income$. If, in addition, we impose the homoskedasticity assumption, then $Var(u \mid income) = \sigma^2$, which is the same as assuming $Var(saving \mid income) = \sigma^2$. Thus, while average saving is allowed to change with income, the variability of saving about its mean is assumed to be constant across all income levels.
Question - In Example 7, is the assumption of homoskedasticity realistic?
Main Result on Variances of the OLS Estimators
Under assumptions SLR.1 through SLR.5, we have the following sampling variances and standard deviations of the OLS estimators.

Variance and Standard Deviation of $\hat\beta_1$:
$$Var(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (41)$$
$$SD(\hat\beta_1) = \sigma \Big/ \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}. \qquad (42)$$

Variance and Standard Deviation of $\hat\beta_0$:
$$Var(\hat\beta_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right) \qquad (43)$$
$$SD(\hat\beta_0) = \sigma \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)^{1/2}. \qquad (44)$$

The covariance between $\hat\beta_0$ and $\hat\beta_1$ is
$$Cov(\hat\beta_0, \hat\beta_1) = \frac{-\bar{x}\,\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \qquad (45)$$
5.3 Estimating the Error Variance
The variances and standard deviations given in equations (41) through (45) are unknown, since the variance of the error term, $\sigma^2$, is unknown. In empirical implementation, we need to estimate $\sigma^2$, and subsequently find the estimators of the variances of the OLS estimators.
It can be shown that an unbiased estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{SSR}{n-2} = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n-2}. \qquad (46)$$
$MSR = \frac{SSR}{n-2}$ is called the mean squared residual (or mean squared error). Its square root, $\hat\sigma = \left( \frac{SSR}{n-2} \right)^{1/2}$, is called the standard error of the regression (or root MSR).
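As an illustration using the sums of squares from Example 3 (where n = 32):

$$\hat\sigma^2 = \frac{SSR}{n-2} = \frac{1880.12047}{30} \approx 62.67, \qquad \hat\sigma \approx 7.92,$$

which is what Stata reports as "Root MSE" for that regression.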
Given the estimator of the error variance, we can estimate the variances and standard deviations of the OLS estimators. For example, the estimator of the standard deviation of the slope estimator $\hat\beta_1$, its standard error, is given by
$$SE(\hat\beta_1) = \hat\sigma \Big/ \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{1/2}, \qquad (47)$$
where $\hat\sigma = \sqrt{\sum_{i=1}^{n} \hat{u}_i^2 / (n-2)}$. Similarly,
$$SE(\hat\beta_0) = \hat\sigma \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)^{1/2}. \qquad (48)$$
Standard errors of regression estimates play an important role in constructing confidence intervals and testing hypotheses.
Example 8
In order to display the standard errors of the coefficient estimates, we reconsider Examples 1, 2, and 5, which were based on Problems for Class Discussion #1 and #2. The results are summarized as follows.

Stata output from step 7 of the Problems for Class Discussion #1 - The estimated salary equation is
$$\widehat{salary} = \underset{(3.3994)}{39.3148} + \underset{(0.1689)}{0.4399}\, exper \qquad (49)$$
$$n = 32, \quad R^2 = 0.1844,$$
where standard errors are enclosed within parentheses. Standard errors in Stata results are given in the table of coefficient estimates under the column title "Std. Err.".

Stata output from step 2.1 of the Problems for Class Discussion #2 - Estimation results for the soda price:
$$\widehat{psoda} = \underset{(0.0052)}{1.0374} + \underset{(0.0240)}{0.0649}\, prpblck \qquad (50)$$
$$n = 401, \quad R^2 = 0.0181,$$
where standard errors are again enclosed within parentheses.

Stata output from step 3.1 of the Problems for Class Discussion #2 - The results relating the log of the price of soda to the fraction black in the ZIP code are
$$\widehat{lpsoda} = \underset{(0.0050)}{0.0331} + \underset{(0.0229)}{0.0625}\, prpblck \qquad (51)$$
$$n = 401, \quad R^2 = 0.01831.$$

Later on, we will discuss the use of standard errors in inference about the regression parameters.
6. Effects of Scaling and Units of Measurement on OLS Statistics
In Problem for Class Discussion #1, salary was measured in thousands of dollars and experience in years since receiving the Ph.D. The predicted equation (to six decimal places) is
$$\widehat{salary} = \underset{(3.399436)}{39.314831} + \underset{(0.1688867)}{0.439907}\, exper \qquad (52)$$
$$n = 32, \quad R^2 = 0.1844.$$
Suppose we choose to measure salary in dollars. How do the OLS estimates and their standard errors change when salary is measured in dollars? We would like to know how the regression statistics change without running another regression.

To answer this question, let salarydol be salary measured in dollars, so that "salary in dollars" equals "salary in 1000s of dollars" times 1000. That is, salarydol = salary*1000, or salary = salarydol/1000. It is not difficult to see that the estimated equation for salary in dollars is
$$\widehat{salarydol} = \underset{(3399.436)}{39314.83} + \underset{(168.8867)}{439.907}\, exper \qquad (53)$$
$$n = 32, \quad R^2 = 0.1844.$$
We obtain the intercept and slope in (53) by multiplying the intercept and slope in (52) by 1000. The standard errors are also multiplied by 1000, but the coefficient of determination does not change. Note that a change in the units of measurement of the dependent variable does not affect the estimated effect of experience on salary: whether salary is measured in dollars or in thousands of dollars, for an additional year of experience, salary is predicted to increase by about $439.91, or by 0.439907 in 1000s of dollars.

Generally, if the dependent variable is multiplied by a constant c, then the OLS intercept and slope estimates are also multiplied by c. This gives the new estimates of the intercept and slope parameters. The standard errors are also multiplied by the constant c, but $R^2$ is not affected by scaling the dependent variable.
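A quick Stata check of this rule, assuming the Example 1 data are in memory with variables named salary and exper (a sketch, not the course do file):

* rescale the dependent variable from 1000s of dollars to dollars
generate salarydol = salary*1000

* intercept, slope, and standard errors are all 1000 times larger;
* R-squared is unchanged
regress salarydol exper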
It is also useful to know what happens to the OLS statistics when the unit of measurement of the explanatory variable x changes. We can use the price of soda example from Problems for Class Discussion #2 to see what happens when the unit of measurement of the explanatory variable changes. Recall that the predicted soda price is given as
$$\widehat{psoda} = \underset{(0.0052)}{1.0374} + \underset{(0.0240)}{0.0649}\, prpblck \qquad (54)$$
$$n = 401, \quad R^2 = 0.0181.$$
Suppose now that the proportion black in the ZIP code is measured in percent. Let pctblck be the percent black in the ZIP code, so that pctblck = prpblck*100. For the regression of psoda on pctblck, we get
$$\widehat{psoda} = \underset{(0.0052)}{1.0374} + \underset{(0.000240)}{0.000649}\, pctblck \qquad (55)$$
$$n = 401, \quad R^2 = 0.0181.$$
In going from (54), where the independent variable is measured as a proportion, to (55), where it is measured in percent, the new slope is obtained by dividing the slope associated with prpblck by 100. The standard error associated with the slope is also divided by 100. The intercept is not affected by scaling the explanatory variable, and neither is the R-squared.

Generally, if the independent variable is multiplied or divided by some nonzero constant c, then the OLS slope coefficient is divided or multiplied by c, respectively. Likewise, the standard error of the slope estimate is divided or multiplied by c, respectively. Changing the unit of measurement of only the independent variable does not affect the intercept. The coefficient of determination is also not affected by scaling the independent variable.
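One way to see the rule for the slope: if the rescaled regressor is $x^{new} = c\,x$, the same fitted line can be written as

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x = \hat\beta_0 + \left( \frac{\hat\beta_1}{c} \right) (c\,x) = \hat\beta_0 + \left( \frac{\hat\beta_1}{c} \right) x^{new},$$

so the intercept and the fitted values are unchanged, while the slope (and its standard error) is divided by c.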
7. Regression through the Origin
In some rare cases, we want to estimate a linear regression model without an intercept term. Regression without an intercept means that, when x = 0, the expected value of y is zero; the regression line passes through the origin. For example, if income (x) is zero, then income tax revenue (y) must also be zero. We will also see examples later on in which a model with an intercept is transformed into another model without an intercept.
A linear regression model without an intercept ($\beta_0 = 0$) can be specified as
$$y = \beta_1 x + u, \qquad (56)$$
where $\beta_1$ is the slope parameter associated with a model without an intercept. We can use the method of ordinary least squares to obtain the estimator (say $\tilde\beta_1$) of $\beta_1$. The OLS method minimizes the sum of squared residuals:
$$\sum_{i=1}^{n} \left( y_i - \tilde\beta_1 x_i \right)^2. \qquad (57)$$
This gives the first order condition:
$$\sum_{i=1}^{n} \left( y_i - \tilde\beta_1 x_i \right) x_i = 0. \qquad (58)$$
Solving this equation for $\tilde\beta_1$ gives the OLS estimator
$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2} \qquad (59)$$
pertaining to a regression through the origin.
In Stata, we use the "noconstant" option to request esti-
mation of a linear model without an intercept term. For
example, Stata command:
reg y x,noconstant
suppresses the constant term (intercept) in the model.
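For instance, for the income tax example mentioned above, with hypothetical variable names taxrev and income, one would type:

* regression through the origin: only the slope is estimated
regress taxrev income, noconstant
display _b[income]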