
Goldsman ISyE 6739 Linear Regression

REGRESSION

12.1 Simple Linear Regression Model
12.2 Fitting the Regression Line
12.3 Inferences on the Slope Parameter
Goldsman ISyE 6739 12.1 Simple Linear Regression Model

Suppose we have a data set with the following paired observations:

  (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

Example:
  x_i = height of person i
  y_i = weight of person i

Can we make a model expressing y_i as a function of x_i?
We want to estimate y_i for fixed x_i. Let's model this with the simple
linear regression equation

  y_i = \beta_0 + \beta_1 x_i + \epsilon_i,

where \beta_0 and \beta_1 are unknown constants and the error terms are
usually assumed to be

  \epsilon_1, ..., \epsilon_n  iid  N(0, \sigma^2),

so that

  y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2).
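The model above can be simulated directly. This is a minimal sketch with made-up illustrative parameter values (\beta_0 = 2.0, \beta_1 = 0.5, \sigma = 0.3 are assumptions, not from the lecture): each response is the line value at x_i plus an independent normal error.

```python
import random

# Simulate y_i = beta_0 + beta_1 * x_i + eps_i with eps_i iid N(0, sigma^2).
# The parameter values here are illustrative assumptions.
random.seed(42)
b0, b1, sigma = 2.0, 0.5, 0.3
xs = [3.5 + 0.25 * i for i in range(12)]                  # fixed x values
ys = [b0 + b1 * x + random.gauss(0, sigma) for x in xs]   # noisy responses
for x, y in zip(xs[:3], ys[:3]):
    print(f"x = {x:.2f}, y = {y:.3f}")
```

Each y_i is drawn from N(\beta_0 + \beta_1 x_i, \sigma^2), matching the distributional statement above.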

[Figure: the line y = \beta_0 + \beta_1 x shown twice, once with high
\sigma^2 (points widely scattered about the line) and once with low
\sigma^2 (points tight about the line).]
Warning! Look at the data before you fit a line to it:

[Figure: a scatter plot of data that doesn't look very linear!]
Example data (car plant): monthly production x_i and electric usage y_i.

  Month   x_i: Production   y_i: Electric Usage
          ($ million)       (million kWh)
  Jan     4.5               2.5
  Feb     3.6               2.3
  Mar     4.3               2.5
  Apr     5.1               2.8
  May     5.6               3.0
  Jun     5.0               3.1
  Jul     5.3               3.2
  Aug     5.8               3.5
  Sep     4.7               3.0
  Oct     5.6               3.3
  Nov     4.9               2.7
  Dec     4.2               2.5
[Figure: scatter plot of y_i (2.2 to 3.4) versus x_i (3.5 to 6.0) for
the car plant data.]

Great... but how do you fit the line?
Goldsman ISyE 6739 12.2 Fitting the Regression Line

Fit the regression line y = \beta_0 + \beta_1 x to the data

  (x_1, y_1), ..., (x_n, y_n)

by finding the best match between the line and the data. The best
choice of \beta_0, \beta_1 is the one that minimizes

  Q = \sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2 = \sum_{i=1}^n \epsilon_i^2.
This is called the least squares fit. Let's solve:

  \partial Q / \partial \beta_0 = -2 \sum (y_i - (\beta_0 + \beta_1 x_i)) = 0
  \partial Q / \partial \beta_1 = -2 \sum x_i (y_i - (\beta_0 + \beta_1 x_i)) = 0

This gives the normal equations

  \sum y_i = n \beta_0 + \beta_1 \sum x_i
  \sum x_i y_i = \beta_0 \sum x_i + \beta_1 \sum x_i^2

After a little algebra, we get

  \hat\beta_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}

  \hat\beta_0 = \bar y - \hat\beta_1 \bar x,
  where \bar y = \frac{1}{n} \sum y_i and \bar x = \frac{1}{n} \sum x_i.
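The closed-form solution can be sanity-checked numerically: Q evaluated at the closed-form (\hat\beta_0, \hat\beta_1) should be no larger than at any nearby perturbed values. A sketch using the car plant table above (note the table values are rounded to one decimal, so the estimates may differ slightly from the slide's, which use more precise sums):

```python
# Check that the closed-form least-squares solution minimizes Q,
# using the (rounded) car plant data from the table above.
xs = [4.5, 3.6, 4.3, 5.1, 5.6, 5.0, 5.3, 5.8, 4.7, 5.6, 4.9, 4.2]
ys = [2.5, 2.3, 2.5, 2.8, 3.0, 3.1, 3.2, 3.5, 3.0, 3.3, 2.7, 2.5]
n = len(xs)

def Q(b0, b1):
    """Sum of squared deviations between the data and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form solution from the normal equations.
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b0 = sy / n - b1 * sx / n

# Perturbing the solution in any direction should never decrease Q.
for db0 in (-0.01, 0.0, 0.01):
    for db1 in (-0.01, 0.0, 0.01):
        assert Q(b0 + db0, b1 + db1) >= Q(b0, b1)
print(f"b0_hat = {b0:.4f}, b1_hat = {b1:.4f}")
```

The nested-loop check works because Q is a strictly convex function of (\beta_0, \beta_1) when the x_i are not all equal, so the stationary point of the normal equations is the unique minimizer.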
Let's introduce some more notation:

  S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - n \bar x^2
         = \sum x_i^2 - \frac{(\sum x_i)^2}{n}

  S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - n \bar x \bar y
         = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}

These are called sums of squares.
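The deviation form and the computational shortcut form of each sum of squares are algebraically identical, which is easy to verify numerically. A quick check on the car plant data (table values, rounded to one decimal):

```python
import math

# Verify that the two expressions for Sxx and Sxy agree on the car plant data.
xs = [4.5, 3.6, 4.3, 5.1, 5.6, 5.0, 5.3, 5.8, 4.7, 5.6, 4.9, 4.2]
ys = [2.5, 2.3, 2.5, 2.8, 3.0, 3.1, 3.2, 3.5, 3.0, 3.3, 2.7, 2.5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx_dev = sum((x - xbar) ** 2 for x in xs)           # deviation form
sxx_sum = sum(x * x for x in xs) - sum(xs) ** 2 / n  # shortcut form
sxy_dev = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxy_sum = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

assert math.isclose(sxx_dev, sxx_sum)
assert math.isclose(sxy_dev, sxy_sum)
print(f"Sxx = {sxx_dev:.4f}, Sxy = {sxy_dev:.4f}")
```

In hand computation the shortcut forms are preferred because they only need the raw sums \sum x_i, \sum y_i, \sum x_i^2, \sum x_i y_i.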
Then, after a little more algebra, we can write

  \hat\beta_1 = \frac{S_{xy}}{S_{xx}}.

Fact: If the \epsilon_i's are iid N(0, \sigma^2), it can be shown that
\hat\beta_0 and \hat\beta_1 are the MLEs for \beta_0 and \beta_1,
respectively. (See the text for an easy proof.)

Anyhow, the fitted regression line is

  \hat y = \hat\beta_0 + \hat\beta_1 x.
Fix a specific value of the explanatory variable x; the fitted equation
gives a fitted value \hat y | x = \hat\beta_0 + \hat\beta_1 x for the
dependent variable y.

[Figure: the fitted line \hat y = \hat\beta_0 + \hat\beta_1 x, with the
fitted value \hat y | x marked at the chosen x.]

For the actual data points x_i, the fitted values are
\hat y_i = \hat\beta_0 + \hat\beta_1 x_i.

  observed values: y_i = \beta_0 + \beta_1 x_i + \epsilon_i
  fitted values:   \hat y_i = \hat\beta_0 + \hat\beta_1 x_i

Let's estimate the error variation \sigma^2 by considering the
deviations between the y_i and \hat y_i:

  SSE = \sum (y_i - \hat y_i)^2 = \sum (y_i - (\hat\beta_0 + \hat\beta_1 x_i))^2
      = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i.
It turns out that \hat\sigma^2 = \frac{SSE}{n-2} is a good estimator
for \sigma^2.

Example (car plant energy usage): n = 12, \sum x_i = 58.62,
\sum y_i = 34.15, \sum x_i^2 = 291.231, \sum y_i^2 = 98.697,
\sum x_i y_i = 169.253.

  \hat\beta_1 = 0.49883,  \hat\beta_0 = 0.4090

The fitted regression line is

  \hat y = 0.409 + 0.499 x,  so  \hat y | 5.5 = 3.1535.

What about something like \hat y | 10.0? (Careful: x = 10.0 lies far
outside the range of the observed data, so the fitted line shouldn't
be trusted there.)
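The worked example can be reproduced directly from the reported sums; the results below match the slide's values up to rounding.

```python
# Reproduce the car plant worked example from the reported sums.
n = 12
sx, sy = 58.62, 34.15
sxx_raw, syy_raw, sxy_raw = 291.231, 98.697, 169.253

Sxx = sxx_raw - sx ** 2 / n
Sxy = sxy_raw - sx * sy / n
b1 = Sxy / Sxx                  # slope estimate:     ~0.4988
b0 = sy / n - b1 * sx / n       # intercept estimate: ~0.4092

sse = syy_raw - b0 * sy - b1 * sxy_raw
sigma2_hat = sse / (n - 2)      # error variance estimate SSE/(n-2)

yhat_55 = b0 + b1 * 5.5         # fitted value at x = 5.5: ~3.153
print(f"b1 = {b1:.5f}, b0 = {b0:.4f}")
print(f"sigma2_hat = {sigma2_hat:.4f}, yhat|5.5 = {yhat_55:.4f}")
```

Note the estimate \hat\sigma^2 divides SSE by n - 2 rather than n, since two parameters were estimated from the data.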
Goldsman ISyE 6739 12.3 Inferences on the Slope Parameter \beta_1

Recall that \hat\beta_1 = \frac{S_{xy}}{S_{xx}}, where
S_{xx} = \sum (x_i - \bar x)^2 and

  S_{xy} = \sum (x_i - \bar x)(y_i - \bar y)
         = \sum (x_i - \bar x) y_i - \bar y \sum (x_i - \bar x)
         = \sum (x_i - \bar x) y_i,

since \sum (x_i - \bar x) = 0.
Since the y_i's are independent with y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)
(and the x_i's are constants), we have

  E[\hat\beta_1] = \frac{1}{S_{xx}} E[S_{xy}]
                 = \frac{1}{S_{xx}} \sum (x_i - \bar x) E[y_i]
                 = \frac{1}{S_{xx}} \sum (x_i - \bar x)(\beta_0 + \beta_1 x_i)
                 = \frac{1}{S_{xx}} [ \beta_0 \sum (x_i - \bar x) + \beta_1 \sum (x_i - \bar x) x_i ]
                   (the first sum is 0)
                 = \frac{\beta_1}{S_{xx}} ( \sum x_i^2 - n \bar x^2 )
                 = \frac{\beta_1}{S_{xx}} S_{xx} = \beta_1.

So \hat\beta_1 is an unbiased estimator of \beta_1.
Further, since \hat\beta_1 is a linear combination of independent
normals, \hat\beta_1 is itself normal. We can also derive

  Var(\hat\beta_1) = \frac{1}{S_{xx}^2} Var(S_{xy})
                   = \frac{1}{S_{xx}^2} \sum (x_i - \bar x)^2 Var(y_i)
                   = \frac{\sigma^2}{S_{xx}}.

Thus, \hat\beta_1 \sim N(\beta_1, \sigma^2 / S_{xx}).
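Both facts, E[\hat\beta_1] = \beta_1 and Var(\hat\beta_1) = \sigma^2 / S_{xx}, can be checked by Monte Carlo. A sketch with illustrative (assumed) parameter values, using the car plant x's as the fixed design:

```python
import random

# Monte Carlo check that b1_hat is unbiased with variance sigma^2 / Sxx.
# The parameter values b0_true, b1_true, sigma are illustrative assumptions.
random.seed(0)
b0_true, b1_true, sigma = 0.4, 0.5, 0.2
xs = [4.5, 3.6, 4.3, 5.1, 5.6, 5.0, 5.3, 5.8, 4.7, 5.6, 4.9, 4.2]
n = len(xs)
xbar = sum(xs) / n
Sxx = sum((x - xbar) ** 2 for x in xs)

reps = 10_000
est = []
for _ in range(reps):
    ys = [b0_true + b1_true * x + random.gauss(0, sigma) for x in xs]
    Sxy = sum((x - xbar) * y for x, y in zip(xs, ys))
    est.append(Sxy / Sxx)          # slope estimate for this replication

mean = sum(est) / reps
var = sum((e - mean) ** 2 for e in est) / reps
print(f"mean of b1_hat = {mean:.4f}  (true b1 = {b1_true})")
print(f"var of b1_hat  = {var:.5f}  (sigma^2/Sxx = {sigma**2 / Sxx:.5f})")
```

The empirical mean and variance of the replicated slope estimates should land close to \beta_1 and \sigma^2 / S_{xx}, respectively.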
While we're at it, we can do the same kind of thing with the intercept
parameter \beta_0:

  \hat\beta_0 = \bar y - \hat\beta_1 \bar x.

Thus,

  E[\hat\beta_0] = E[\bar y] - \bar x E[\hat\beta_1]
                 = \beta_0 + \beta_1 \bar x - \bar x \beta_1 = \beta_0.

Similar to before, since \hat\beta_0 is a linear combination of
independent normals, it is also normal. Finally,

  Var(\hat\beta_0) = \frac{\sum x_i^2}{n S_{xx}} \sigma^2.
Goldsman ISyE 6739 12.3 Inferences on Slope Parameter 1

Proof:
Cov(y, 1) = 1 Cov(y, P(x x)y )
Sxx i i
P
(xix
= Sxx )Cov(y, yi)
P
(xix) 2
= Sxx n = 0

Var(0) = Var(y 1x)


= Var(y) + x2Var1 2x Cov(y,
| {z
1)}
0
2
= n + x2 Sxx 2
 
2
= 2 Sxx nx .
nSxx
P 2
xi 2
Thus, 0 N(0, nSxx ).
19
Back to \hat\beta_1 \sim N(\beta_1, \sigma^2 / S_{xx}) ...

  \frac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0, 1)

It turns out that:

  (1) \frac{SSE}{\sigma^2} = \frac{(n-2) \hat\sigma^2}{\sigma^2} \sim \chi^2(n-2);
  (2) \hat\sigma^2 is independent of \hat\beta_1.
Goldsman ISyE 6739 12.3 Inferences on Slope Parameter 1

1
1
/ Sxx N(0, 1)
s t(n 2)
/ 2(n2)
n2

1 1
t(n 2).
/ Sxx

21
[Figure: the t(n-2) density, with critical values -t_{\alpha/2, n-2} and
t_{\alpha/2, n-2} cutting off probability \alpha/2 in each tail.]
Two-sided confidence intervals for \beta_1:

  1 - \alpha = Pr(-t_{\alpha/2, n-2} \le \frac{\hat\beta_1 - \beta_1}{\hat\sigma / \sqrt{S_{xx}}} \le t_{\alpha/2, n-2})
             = Pr(\hat\beta_1 - t_{\alpha/2, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}})

One-sided CIs for \beta_1:

  \beta_1 \in (-\infty, \hat\beta_1 + t_{\alpha, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}})

  \beta_1 \in (\hat\beta_1 - t_{\alpha, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}}, \infty)
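Putting the pieces together, a sketch of the two-sided 95% CI for \beta_1 on the car plant example. The critical value t_{0.025, 10} = 2.228 is taken from a standard t table (the example itself doesn't appear on these slides, so the resulting interval is an illustration, not a quoted result):

```python
import math

# Two-sided 95% CI for the slope, using the car plant example's sums.
n = 12
sx, sy = 58.62, 34.15
sxx_raw, syy_raw, sxy_raw = 291.231, 98.697, 169.253

Sxx = sxx_raw - sx ** 2 / n
Sxy = sxy_raw - sx * sy / n
b1 = Sxy / Sxx
b0 = sy / n - b1 * sx / n

sse = syy_raw - b0 * sy - b1 * sxy_raw
sigma_hat = math.sqrt(sse / (n - 2))       # sqrt of SSE/(n-2)

t_crit = 2.228                             # t_{0.025, 10} from a t table
half_width = t_crit * sigma_hat / math.sqrt(Sxx)
print(f"95% CI for beta_1: ({b1 - half_width:.4f}, {b1 + half_width:.4f})")
```

Since the resulting interval lies entirely above 0, the data give evidence (at the 5% level) that electric usage really does increase with production.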