Review
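Linear regression refers to fitting a best-fit line
$y = a + bx$ to the bivariate data $(x, y)$, where
$a = m_y - b\,m_x$
$b = \mathrm{cov}(x, y)/s_x^2$
Regressing x on y instead fits the line $x = a + by$, where now
$a = m_x - b\,m_y$
$b = \mathrm{cov}(x, y)/s_y^2$
In both cases the correlation coefficient is
$r = \mathrm{cov}(x, y)/(s_x s_y)$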
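As a quick numerical check of these definitions, here is a minimal Python sketch. It assumes divide-by-N variances and covariance, and it borrows the nine height/weight pairs from the worked example later in these notes. The function name `fit_line` is made up for illustration.

```python
from math import sqrt

def fit_line(x, y):
    """Return (a, b, r) for the line y = a + b*x, using divide-by-N moments."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    sx2 = sum((xi - mx) ** 2 for xi in x) / n
    sy2 = sum((yi - my) ** 2 for yi in y) / n
    b = cov / sx2                 # slope
    a = my - b * mx               # intercept
    r = cov / sqrt(sx2 * sy2)     # correlation coefficient
    return a, b, r

# Height (inches) and weight (lbs) from the example table in these notes.
heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

a, b, r = fit_line(heights, weights)
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.3f}")
```

For this data the slope comes out at about 5 and the intercept at about -200, with r roughly 0.76.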
[Figure: scatter plots of weight (lbs) against height (inches), with panels labeled r = 0.71 and r = 0.62.]
Outline
Relationship between correlation and
regression, along with notes on the
correlation coefficient
Effect size, and the meaning of r
Other kinds of correlation coefficients
Confidence intervals on the parameters of
correlation and regression
The meaning of r
Effect size
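The value predicted by the regression line is
$$y_i' = m_y + \frac{\mathrm{cov}(x, y)}{s_x^2}(x_i - m_x)$$
The variance of the prediction errors is
$$s_{y'}^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - y_i')^2 = \frac{1}{N}\sum_{i=1}^{N}\left((y_i - m_y) - \frac{\mathrm{cov}(x, y)}{s_x^2}(x_i - m_x)\right)^2$$
$$= \frac{1}{N}\sum_{i=1}^{N}(y_i - m_y)^2 + \left(\frac{\mathrm{cov}(x, y)}{s_x^2}\right)^2 \frac{1}{N}\sum_{i=1}^{N}(x_i - m_x)^2 - 2\,\frac{\mathrm{cov}(x, y)}{s_x^2}\,\frac{1}{N}\sum_{i=1}^{N}(x_i - m_x)(y_i - m_y)$$
$$= s_y^2 + \frac{\mathrm{cov}(x, y)^2}{s_x^2} - 2\,\frac{\mathrm{cov}(x, y)^2}{s_x^2} = s_y^2 - \frac{\mathrm{cov}(x, y)^2}{s_x^2}$$
So the proportion of the variance in y accounted for by the line is
$$\frac{s_y^2 - s_{y'}^2}{s_y^2} = \frac{s_y^2 - s_y^2 + \mathrm{cov}(x, y)^2/s_x^2}{s_y^2} = \frac{\mathrm{cov}(x, y)^2}{s_x^2 s_y^2} = r^2 \;!!$$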
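The identity (s_y^2 - s_y'^2)/s_y^2 = r^2 can also be checked numerically. The sketch below is one way to do that, assuming divide-by-N variances throughout and using the height/weight pairs from the worked example in these notes; the variable names are illustrative only.

```python
# Height (inches) and weight (lbs) from the example table in these notes.
heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

n = len(heights)
mx = sum(heights) / n
my = sum(weights) / n
cov = sum((x - mx) * (y - my) for x, y in zip(heights, weights)) / n
sx2 = sum((x - mx) ** 2 for x in heights) / n
sy2 = sum((y - my) ** 2 for y in weights) / n

# Predicted values y' = my + (cov / sx2) * (x - mx)
predicted = [my + (cov / sx2) * (x - mx) for x in heights]

# Variance of the prediction errors, (1/N) * sum of (y - y')^2
s2_err = sum((y - yp) ** 2 for y, yp in zip(weights, predicted)) / n

explained = (sy2 - s2_err) / sy2      # proportion of variance accounted for
r2 = cov ** 2 / (sx2 * sy2)           # squared correlation coefficient

print(f"explained = {explained:.4f}, r^2 = {r2:.4f}")
```

The two printed numbers should agree up to floating-point rounding (about 0.575 for this data).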
$r^2$
[Figure: scatter plot of weight against height (in.) with fitted line W = 4H - 117, $r^2$ = 0.23.]
The linear relationship accounts for 23% of the variation in the data.
[Figure: scatter plot of height (in.) against weight (lbs.) with fitted line H = 0.06W + 58, $r^2$ = 0.23.]
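Both regressions report the same r^2 because r is symmetric in x and y. A related fact follows directly from the slope formulas cov(x, y)/s_x^2 and cov(x, y)/s_y^2: the product of the two slopes equals r^2. For the fitted lines quoted here, 4 x 0.06 = 0.24, which agrees with r^2 = 0.23 up to rounding of the printed slopes. The sketch below checks the same fact on the height/weight pairs from the worked example in these notes (not the data behind the plots); the helper `slope` is a made-up name.

```python
def slope(u, v):
    """Slope of the regression of v on u: cov(u, v) / var(u), divide-by-N."""
    n = len(u)
    mu = sum(u) / n
    mv = sum(v) / n
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / n
    var_u = sum((ui - mu) ** 2 for ui in u) / n
    return cov / var_u

heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

b_weight_on_height = slope(heights, weights)   # predict weight from height
b_height_on_weight = slope(weights, heights)   # predict height from weight
product = b_weight_on_height * b_height_on_weight

print(f"slopes: {b_weight_on_height:.3f}, {b_height_on_weight:.4f}")
print(f"product of slopes (= r^2): {product:.4f}")
```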
[Figure: three scatter plots, labeled R = 0.75, R = 0.9231, and R = 0.]
The correlation/regression associated with $r_{pb}$
[Figure: scores on a 0 to 10 scale plotted against the dichotomous variable "Significant other?" (No / Yes).]
$r_{pb}$ (point-biserial correlation)
Used when one variable is dichotomous
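One way to see what r_pb measures: code the dichotomous variable as 0/1 and compute the ordinary correlation coefficient. The result matches the group-means form (M1 - M0)/s_y * sqrt(p*q), where p and q are the proportions in each group and s_y divides by N. The sketch below illustrates this; the data and the function names are hypothetical, invented only for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Ordinary correlation coefficient, cov(x, y) / (s_x * s_y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    sx = sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = sqrt(sum((yi - my) ** 2 for yi in y) / n)
    return cov / (sx * sy)

def point_biserial(group, score):
    """r_pb from group means: (M1 - M0) / s_y * sqrt(p * q), s_y divide-by-N."""
    n = len(score)
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    p = len(ones) / n
    q = 1 - p
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    my = sum(score) / n
    sy = sqrt(sum((s - my) ** 2 for s in score) / n)
    return (m1 - m0) / sy * sqrt(p * q)

# Hypothetical data: 0 = "No", 1 = "Yes"; scores on a 0 to 10 scale.
group = [0, 0, 0, 0, 1, 1, 1, 1, 1]
score = [3, 5, 4, 6, 7, 6, 8, 9, 7]

r_from_pearson = pearson_r(group, score)
r_from_means = point_biserial(group, score)
print(f"{r_from_pearson:.4f} {r_from_means:.4f}")
```

The two values agree, so r_pb can be read as an ordinary r with one variable coded 0/1.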
$r_s$ (Spearman rank-order correlation)
Confidence intervals and hypothesis testing on a, b, y', and r
$y = \alpha + \beta x + \varepsilon$
An estimator, $s_e$, for $\sigma(\varepsilon)$
$s_e = \sqrt{\sum_{i=1}^{N}(y_i - y_i')^2/(N-2)}$
Why N-2? We used up two degrees of
freedom in calculating a and b, leaving N-2
independent pieces of information to
estimate $\sigma(\varepsilon)$
An alternative computational formula:
$s_e = \sqrt{(ss_{yy} - b\,ss_{xy})/(N-2)}$
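The two formulas for s_e can be compared directly. The sketch below computes both on the height/weight pairs from the worked example in these notes; the variable names are illustrative, and the fitted a and b are recomputed here so the block stands alone.

```python
from math import sqrt

heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

n = len(heights)
mx = sum(heights) / n
my = sum(weights) / n
ss_xx = sum((x - mx) ** 2 for x in heights)
ss_yy = sum((y - my) ** 2 for y in weights)
ss_xy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))

b = ss_xy / ss_xx
a = my - b * mx

# Definition: root of the summed squared residuals over N - 2
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))
se_direct = sqrt(residual_ss / (n - 2))

# Computational shortcut: no need to form the residuals
se_shortcut = sqrt((ss_yy - b * ss_xy) / (n - 2))

print(f"se (direct) = {se_direct:.3f}, se (shortcut) = {se_shortcut:.3f}")
```

Both routes give about 25.1 lbs for this data, since the residual sum of squares equals ss_yy - b*ss_xy = 10426 - 5(1200) = 4426.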
SE($y_{new}$)
These variances add, so
$SE(y_{new})^2 = SE(y')^2 + s_e^2$
$SE(y_{new}) = s_e \sqrt{1/N + (x_{new} - m_x)^2/ss_{xx} + 1}$
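To make the SE(y_new) formula concrete, the sketch below evaluates it on the height/weight pairs from the worked example in these notes. The function name `se_new` is hypothetical. The 95% interval at the end is an added illustration: it assumes the usual t-based interval with N-2 = 7 degrees of freedom, and the critical value 2.365 is taken from a standard t table rather than from these notes.

```python
from math import sqrt

heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

n = len(heights)
mx = sum(heights) / n
my = sum(weights) / n
ss_xx = sum((x - mx) ** 2 for x in heights)
ss_yy = sum((y - my) ** 2 for y in weights)
ss_xy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))

b = ss_xy / ss_xx
a = my - b * mx
se = sqrt((ss_yy - b * ss_xy) / (n - 2))

def se_new(x_new):
    """Standard error for predicting a single new y at x_new."""
    return se * sqrt(1 / n + (x_new - mx) ** 2 / ss_xx + 1)

T_CRIT_95_DF7 = 2.365  # assumption: two-tailed 95% value from a t table, 7 df

for x_new in (68, 76):
    y_hat = a + b * x_new
    half_width = T_CRIT_95_DF7 * se_new(x_new)
    print(f"x = {x_new}: predicted {y_hat:.1f}, SE = {se_new(x_new):.2f}, "
          f"95% interval = ({y_hat - half_width:.1f}, {y_hat + half_width:.1f})")
```

The standard error is smallest at x_new = m_x and grows as x_new moves away from the mean, which is the effect of the (x_new - m_x)^2/ss_xx term.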
Homework notes
Hypothesis testing on r
from height
Examples
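| $x_i$ | $y_i$ | $x_i - m_x$ | $y_i - m_y$ | $(x_i - m_x)^2$ | $(y_i - m_y)^2$ | $(x_i - m_x)(y_i - m_y)$ |
|---|---|---|---|---|---|---|
| 60 | 84 | -8 | -56 | 64 | 3136 | 448 |
| 62 | 95 | -6 | -45 | 36 | 2025 | 270 |
| 64 | 140 | -4 | 0 | 16 | 0 | 0 |
| 66 | 155 | -2 | 15 | 4 | 225 | -30 |
| 68 | 119 | 0 | -21 | 0 | 441 | 0 |
| 70 | 175 | 2 | 35 | 4 | 1225 | 70 |
| 72 | 145 | 4 | 5 | 16 | 25 | 20 |
| 74 | 197 | 6 | 57 | 36 | 3249 | 342 |
| 76 | 150 | 8 | 10 | 64 | 100 | 80 |
| Sum = 612 | 1260 | | | $ss_{xx}$ = 240 | $ss_{yy}$ = 10426 | $ss_{xy}$ = 1200 |

$m_x = 68$, $m_y = 140$
$b = ss_{xy}/ss_{xx} = 1200/240 = 5$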
A last note
The test for $\beta \neq 0$ is actually the same as the test for
$\rho \neq 0$.
This should make sense to you, since if $\rho \neq 0$, you
would also expect the slope to be $\neq 0$
Want an extra point added to your midterm score?
Prove that the two tests are the same.
In general, not just for this example
Submit it with your next homework
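A numerical check on one data set is not a proof, but it can show what the claim looks like in practice. The sketch below computes both t statistics for the height/weight example in these notes. It assumes two standard formulas that are not stated in this section: SE(b) = s_e/sqrt(ss_xx) for the slope, and t = r*sqrt(N-2)/sqrt(1-r^2) for the correlation.

```python
from math import sqrt

heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

n = len(heights)
mx = sum(heights) / n
my = sum(weights) / n
ss_xx = sum((x - mx) ** 2 for x in heights)
ss_yy = sum((y - my) ** 2 for y in weights)
ss_xy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))

b = ss_xy / ss_xx
r = ss_xy / sqrt(ss_xx * ss_yy)
se = sqrt((ss_yy - b * ss_xy) / (n - 2))

# t statistic for the slope (assumed formula: SE(b) = se / sqrt(ss_xx))
t_slope = b / (se / sqrt(ss_xx))

# t statistic for the correlation (assumed formula)
t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)

print(f"t for slope = {t_slope:.4f}, t for r = {t_corr:.4f}")
```

Both statistics come out at about 3.08 with 7 degrees of freedom, so the two tests reach the same decision on this data.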