
Goldsman ISyE 6739 Linear Regression

REGRESSION

12.1 Simple Linear Regression Model
12.2 Fitting the Regression Line
12.3 Inferences on the Slope Parameter
Goldsman ISyE 6739 12.1 Simple Linear Regression Model

Suppose we have a data set with the following paired observations:

  (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

Example:
  x_i = height of person i
  y_i = weight of person i

Can we make a model expressing y_i as a function of x_i?
We want to estimate y_i for fixed x_i. Let's model this with the simple
linear regression equation

  y_i = \beta_0 + \beta_1 x_i + \epsilon_i,

where \beta_0 and \beta_1 are unknown constants and the error terms are
usually assumed to be

  \epsilon_1, ..., \epsilon_n  iid  N(0, \sigma^2),

so that

  y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2).
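The model above can be simulated directly. This is a minimal sketch with made-up illustrative parameter values (\beta_0 = 2.0, \beta_1 = 0.5, \sigma = 0.3 are assumptions, not from the lecture): each response is the line value at x_i plus an independent normal error.

```python
import random

# Simulate y_i = beta_0 + beta_1 * x_i + eps_i with eps_i iid N(0, sigma^2).
# The parameter values here are illustrative assumptions.
random.seed(42)
b0, b1, sigma = 2.0, 0.5, 0.3
xs = [3.5 + 0.25 * i for i in range(12)]                  # fixed x values
ys = [b0 + b1 * x + random.gauss(0, sigma) for x in xs]   # noisy responses
for x, y in zip(xs[:3], ys[:3]):
    print(f"x = {x:.2f}, y = {y:.3f}")
```

Each y_i is drawn from N(\beta_0 + \beta_1 x_i, \sigma^2), matching the distributional statement above.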

[Figure: the line y = \beta_0 + \beta_1 x shown twice, once with high
\sigma^2 (points widely scattered about the line) and once with low
\sigma^2 (points tight about the line).]
Warning! Look at the data before you fit a line to it:

[Figure: a scatter plot of data that doesn't look very linear!]
Example data (car plant): monthly production x_i and electric usage y_i.

  Month   x_i: Production   y_i: Electric Usage
          ($ million)       (million kWh)
  Jan     4.5               2.5
  Feb     3.6               2.3
  Mar     4.3               2.5
  Apr     5.1               2.8
  May     5.6               3.0
  Jun     5.0               3.1
  Jul     5.3               3.2
  Aug     5.8               3.5
  Sep     4.7               3.0
  Oct     5.6               3.3
  Nov     4.9               2.7
  Dec     4.2               2.5
[Figure: scatter plot of y_i (2.2 to 3.4) versus x_i (3.5 to 6.0) for
the car plant data.]

Great... but how do you fit the line?
Goldsman ISyE 6739 12.2 Fitting the Regression Line

Fit the regression line y = \beta_0 + \beta_1 x to the data

  (x_1, y_1), ..., (x_n, y_n)

by finding the best match between the line and the data. The best
choice of \beta_0, \beta_1 is the one that minimizes

  Q = \sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2 = \sum_{i=1}^n \epsilon_i^2.
This is called the least squares fit. Let's solve:

  \partial Q / \partial \beta_0 = -2 \sum (y_i - (\beta_0 + \beta_1 x_i)) = 0
  \partial Q / \partial \beta_1 = -2 \sum x_i (y_i - (\beta_0 + \beta_1 x_i)) = 0

This gives the normal equations

  \sum y_i = n \beta_0 + \beta_1 \sum x_i
  \sum x_i y_i = \beta_0 \sum x_i + \beta_1 \sum x_i^2

After a little algebra, we get

  \hat\beta_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}

  \hat\beta_0 = \bar y - \hat\beta_1 \bar x,
  where \bar y = \frac{1}{n} \sum y_i and \bar x = \frac{1}{n} \sum x_i.
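The closed-form solution can be sanity-checked numerically: Q evaluated at the closed-form (\hat\beta_0, \hat\beta_1) should be no larger than at any nearby perturbed values. A sketch using the car plant table above (note the table values are rounded to one decimal, so the estimates may differ slightly from the slide's, which use more precise sums):

```python
# Check that the closed-form least-squares solution minimizes Q,
# using the (rounded) car plant data from the table above.
xs = [4.5, 3.6, 4.3, 5.1, 5.6, 5.0, 5.3, 5.8, 4.7, 5.6, 4.9, 4.2]
ys = [2.5, 2.3, 2.5, 2.8, 3.0, 3.1, 3.2, 3.5, 3.0, 3.3, 2.7, 2.5]
n = len(xs)

def Q(b0, b1):
    """Sum of squared deviations between the data and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form solution from the normal equations.
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b0 = sy / n - b1 * sx / n

# Perturbing the solution in any direction should never decrease Q.
for db0 in (-0.01, 0.0, 0.01):
    for db1 in (-0.01, 0.0, 0.01):
        assert Q(b0 + db0, b1 + db1) >= Q(b0, b1)
print(f"b0_hat = {b0:.4f}, b1_hat = {b1:.4f}")
```

The nested-loop check works because Q is a strictly convex function of (\beta_0, \beta_1) when the x_i are not all equal, so the stationary point of the normal equations is the unique minimizer.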
Let's introduce some more notation:

  S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - n \bar x^2
         = \sum x_i^2 - \frac{(\sum x_i)^2}{n}

  S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - n \bar x \bar y
         = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}

These are called sums of squares.
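The deviation form and the computational shortcut form of each sum of squares are algebraically identical, which is easy to verify numerically. A quick check on the car plant data (table values, rounded to one decimal):

```python
import math

# Verify that the two expressions for Sxx and Sxy agree on the car plant data.
xs = [4.5, 3.6, 4.3, 5.1, 5.6, 5.0, 5.3, 5.8, 4.7, 5.6, 4.9, 4.2]
ys = [2.5, 2.3, 2.5, 2.8, 3.0, 3.1, 3.2, 3.5, 3.0, 3.3, 2.7, 2.5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx_dev = sum((x - xbar) ** 2 for x in xs)           # deviation form
sxx_sum = sum(x * x for x in xs) - sum(xs) ** 2 / n  # shortcut form
sxy_dev = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxy_sum = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

assert math.isclose(sxx_dev, sxx_sum)
assert math.isclose(sxy_dev, sxy_sum)
print(f"Sxx = {sxx_dev:.4f}, Sxy = {sxy_dev:.4f}")
```

In hand computation the shortcut forms are preferred because they only need the raw sums \sum x_i, \sum y_i, \sum x_i^2, \sum x_i y_i.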
Then, after a little more algebra, we can write

  \hat\beta_1 = \frac{S_{xy}}{S_{xx}}.

Fact: If the \epsilon_i's are iid N(0, \sigma^2), it can be shown that
\hat\beta_0 and \hat\beta_1 are the MLEs for \beta_0 and \beta_1,
respectively. (See the text for an easy proof.)

Anyhow, the fitted regression line is

  \hat y = \hat\beta_0 + \hat\beta_1 x.
Fix a specific value of the explanatory variable x; the fitted equation
gives a fitted value \hat y | x = \hat\beta_0 + \hat\beta_1 x for the
dependent variable y.

[Figure: the fitted line \hat y = \hat\beta_0 + \hat\beta_1 x, with the
fitted value \hat y | x marked at the chosen x.]

For the actual data points x_i, the fitted values are
\hat y_i = \hat\beta_0 + \hat\beta_1 x_i.

  observed values: y_i = \beta_0 + \beta_1 x_i + \epsilon_i
  fitted values:   \hat y_i = \hat\beta_0 + \hat\beta_1 x_i

Let's estimate the error variation \sigma^2 by considering the
deviations between the y_i and \hat y_i:

  SSE = \sum (y_i - \hat y_i)^2 = \sum (y_i - (\hat\beta_0 + \hat\beta_1 x_i))^2
      = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i.
It turns out that \hat\sigma^2 = \frac{SSE}{n-2} is a good estimator
for \sigma^2.

Example (car plant energy usage): n = 12, \sum x_i = 58.62,
\sum y_i = 34.15, \sum x_i^2 = 291.231, \sum y_i^2 = 98.697,
\sum x_i y_i = 169.253.

  \hat\beta_1 = 0.49883,  \hat\beta_0 = 0.4090

The fitted regression line is

  \hat y = 0.409 + 0.499 x,  so  \hat y | 5.5 = 3.1535.

What about something like \hat y | 10.0? (Careful: x = 10.0 lies far
outside the range of the observed data, so the fitted line shouldn't
be trusted there.)
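The worked example can be reproduced directly from the reported sums; the results below match the slide's values up to rounding.

```python
# Reproduce the car plant worked example from the reported sums.
n = 12
sx, sy = 58.62, 34.15
sxx_raw, syy_raw, sxy_raw = 291.231, 98.697, 169.253

Sxx = sxx_raw - sx ** 2 / n
Sxy = sxy_raw - sx * sy / n
b1 = Sxy / Sxx                  # slope estimate:     ~0.4988
b0 = sy / n - b1 * sx / n       # intercept estimate: ~0.4092

sse = syy_raw - b0 * sy - b1 * sxy_raw
sigma2_hat = sse / (n - 2)      # error variance estimate SSE/(n-2)

yhat_55 = b0 + b1 * 5.5         # fitted value at x = 5.5: ~3.153
print(f"b1 = {b1:.5f}, b0 = {b0:.4f}")
print(f"sigma2_hat = {sigma2_hat:.4f}, yhat|5.5 = {yhat_55:.4f}")
```

Note the estimate \hat\sigma^2 divides SSE by n - 2 rather than n, since two parameters were estimated from the data.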
Goldsman ISyE 6739 12.3 Inferences on the Slope Parameter \beta_1

Recall that \hat\beta_1 = \frac{S_{xy}}{S_{xx}}, where
S_{xx} = \sum (x_i - \bar x)^2 and

  S_{xy} = \sum (x_i - \bar x)(y_i - \bar y)
         = \sum (x_i - \bar x) y_i - \bar y \sum (x_i - \bar x)
         = \sum (x_i - \bar x) y_i,

since \sum (x_i - \bar x) = 0.
Since the y_i's are independent with y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)
(and the x_i's are constants), we have

  E[\hat\beta_1] = \frac{1}{S_{xx}} E[S_{xy}]
                 = \frac{1}{S_{xx}} \sum (x_i - \bar x) E[y_i]
                 = \frac{1}{S_{xx}} \sum (x_i - \bar x)(\beta_0 + \beta_1 x_i)
                 = \frac{1}{S_{xx}} [ \beta_0 \sum (x_i - \bar x) + \beta_1 \sum (x_i - \bar x) x_i ]
                   (the first sum is 0)
                 = \frac{\beta_1}{S_{xx}} ( \sum x_i^2 - n \bar x^2 )
                 = \frac{\beta_1}{S_{xx}} S_{xx} = \beta_1.

So \hat\beta_1 is an unbiased estimator of \beta_1.
Further, since \hat\beta_1 is a linear combination of independent
normals, \hat\beta_1 is itself normal. We can also derive

  Var(\hat\beta_1) = \frac{1}{S_{xx}^2} Var(S_{xy})
                   = \frac{1}{S_{xx}^2} \sum (x_i - \bar x)^2 Var(y_i)
                   = \frac{\sigma^2}{S_{xx}}.

Thus, \hat\beta_1 \sim N(\beta_1, \sigma^2 / S_{xx}).
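Both facts, E[\hat\beta_1] = \beta_1 and Var(\hat\beta_1) = \sigma^2 / S_{xx}, can be checked by Monte Carlo. A sketch with illustrative (assumed) parameter values, using the car plant x's as the fixed design:

```python
import random

# Monte Carlo check that b1_hat is unbiased with variance sigma^2 / Sxx.
# The parameter values b0_true, b1_true, sigma are illustrative assumptions.
random.seed(0)
b0_true, b1_true, sigma = 0.4, 0.5, 0.2
xs = [4.5, 3.6, 4.3, 5.1, 5.6, 5.0, 5.3, 5.8, 4.7, 5.6, 4.9, 4.2]
n = len(xs)
xbar = sum(xs) / n
Sxx = sum((x - xbar) ** 2 for x in xs)

reps = 10_000
est = []
for _ in range(reps):
    ys = [b0_true + b1_true * x + random.gauss(0, sigma) for x in xs]
    Sxy = sum((x - xbar) * y for x, y in zip(xs, ys))
    est.append(Sxy / Sxx)          # slope estimate for this replication

mean = sum(est) / reps
var = sum((e - mean) ** 2 for e in est) / reps
print(f"mean of b1_hat = {mean:.4f}  (true b1 = {b1_true})")
print(f"var of b1_hat  = {var:.5f}  (sigma^2/Sxx = {sigma**2 / Sxx:.5f})")
```

The empirical mean and variance of the replicated slope estimates should land close to \beta_1 and \sigma^2 / S_{xx}, respectively.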
While we're at it, we can do the same kind of thing with the intercept
parameter \beta_0:

  \hat\beta_0 = \bar y - \hat\beta_1 \bar x.

Thus,

  E[\hat\beta_0] = E[\bar y] - \bar x E[\hat\beta_1]
                 = \beta_0 + \beta_1 \bar x - \bar x \beta_1 = \beta_0.

Similar to before, since \hat\beta_0 is a linear combination of
independent normals, it is also normal. Finally,

  Var(\hat\beta_0) = \frac{\sum x_i^2}{n S_{xx}} \sigma^2.
Goldsman ISyE 6739 12.3 Inferences on Slope Parameter 1

Proof:
Cov(y, 1) = 1 Cov(y, P(x x)y )
Sxx i i
P
(xix
= Sxx )Cov(y, yi)
P
(xix) 2
= Sxx n = 0

Var(0) = Var(y 1x)


= Var(y) + x2Var1 2x Cov(y,
| {z
1)}
0
2
= n + x2 Sxx 2
 
2
= 2 Sxx nx .
nSxx
P 2
xi 2
Thus, 0 N(0, nSxx ).
19
Back to \hat\beta_1 \sim N(\beta_1, \sigma^2 / S_{xx}) ...

  \frac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0, 1)

It turns out that:

  (1) \frac{SSE}{\sigma^2} = \frac{(n-2) \hat\sigma^2}{\sigma^2} \sim \chi^2(n-2);
  (2) \hat\sigma^2 is independent of \hat\beta_1.
Goldsman ISyE 6739 12.3 Inferences on Slope Parameter 1

1
1
/ Sxx N(0, 1)
s t(n 2)
/ 2(n2)
n2

1 1
t(n 2).
/ Sxx

21
[Figure: the t(n-2) density, with critical values -t_{\alpha/2, n-2} and
t_{\alpha/2, n-2} cutting off probability \alpha/2 in each tail.]
Two-sided confidence intervals for \beta_1:

  1 - \alpha = Pr(-t_{\alpha/2, n-2} \le \frac{\hat\beta_1 - \beta_1}{\hat\sigma / \sqrt{S_{xx}}} \le t_{\alpha/2, n-2})
             = Pr(\hat\beta_1 - t_{\alpha/2, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}})

One-sided CIs for \beta_1:

  \beta_1 \in (-\infty, \hat\beta_1 + t_{\alpha, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}})

  \beta_1 \in (\hat\beta_1 - t_{\alpha, n-2} \frac{\hat\sigma}{\sqrt{S_{xx}}}, \infty)
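Putting the pieces together, a sketch of the two-sided 95% CI for \beta_1 on the car plant example. The critical value t_{0.025, 10} = 2.228 is taken from a standard t table (the example itself doesn't appear on these slides, so the resulting interval is an illustration, not a quoted result):

```python
import math

# Two-sided 95% CI for the slope, using the car plant example's sums.
n = 12
sx, sy = 58.62, 34.15
sxx_raw, syy_raw, sxy_raw = 291.231, 98.697, 169.253

Sxx = sxx_raw - sx ** 2 / n
Sxy = sxy_raw - sx * sy / n
b1 = Sxy / Sxx
b0 = sy / n - b1 * sx / n

sse = syy_raw - b0 * sy - b1 * sxy_raw
sigma_hat = math.sqrt(sse / (n - 2))       # sqrt of SSE/(n-2)

t_crit = 2.228                             # t_{0.025, 10} from a t table
half_width = t_crit * sigma_hat / math.sqrt(Sxx)
print(f"95% CI for beta_1: ({b1 - half_width:.4f}, {b1 + half_width:.4f})")
```

Since the resulting interval lies entirely above 0, the data give evidence (at the 5% level) that electric usage really does increase with production.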