Simple Linear Regression & GLM
Chpts. 16 & 17 W&S

Introduction
Least Squares Trend Line Fitting
Model Testing
Inference and Regression
Regression Diagnostics
Correlation
Introduction
Recall, to date, have focused on statistics examining one
variable from either one or two samples.
Regression is a method
used to predict the value Regression
of one numerical value
from another.
[Scatterplot: Y vs. X]
X is the independent or predictor variable: independent because it is assigned by the investigator and independent of measurement error.
Y is the dependent or response variable.
Simple Linear Regression
- Example -
A simple initial scatterplot suggests that Y responds to X.
[Scatterplot: Y vs. X]
The previous scatterplot is misleading.
[Scatterplot: Blood Pressure (mm Hg) vs. Age (years)]
The sample trend line: $y = bx + a$
The population model: $y = \alpha + \beta x + \varepsilon$
Where: $\alpha$ = the y-intercept, $\beta$ = the slope, $\varepsilon$ = error (deviation from the mean)
10
of points. 86.4
82.1
We use a procedure
known as Least Squares 77.9
69.3
We attempt to minimize
the s. 65.0
20.0 28.0 36.0 44.0 52.0 60.0
Age (yrs)
11
12
Least Squares Trend Line Fitting
Equation 3 is very notable, because it means that the least squares trend line MUST run through the point $(\bar{x}, \bar{y})$.
[Scatterplot: Blood Pressure (mm Hg) vs. Age (yrs), trend line through the joint mean]
1. Determine $\sum x$, $\sum x^2$, $\sum y$, $\sum xy$
Least Squares Trend Line Fitting
- Example -

  x     y      x²      xy
 31    7.8    961    241.8
 32    8.3   1024    265.6
 33    7.6   1089    250.8
 34    9.1   1156    309.4
 35    9.6   1225    336.0
 35    9.8   1225    343.0
 40   11.8   1600    472.0
 41   12.1   1681    496.1
 42   14.7   1764    617.4
 46   13.0   2116    598.0
Σ   369   103.8  13841  3930.1

Mean x = 36.90; Mean y = 10.38

Lastly, we need to calculate:
Sxx: sum of squared-x deviations
Sxy: sum of xy-product deviations
$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{N}$          Sxx = 224.9
$S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{N}$     Sxy = 99.88
$b = \dfrac{S_{xy}}{S_{xx}}$                         b = 0.444
$a = \bar{y} - b\bar{x}$                             a = -6.0076
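As a quick check, this sums-of-squares arithmetic can be reproduced in R. A minimal sketch using the example data from the table (variable names are illustrative):

# Example data from the table above
x <- c(31, 32, 33, 34, 35, 35, 40, 41, 42, 46)
y <- c(7.8, 8.3, 7.6, 9.1, 9.6, 9.8, 11.8, 12.1, 14.7, 13.0)
n <- length(x)

Sxx <- sum(x^2) - sum(x)^2 / n            # 224.9
Sxy <- sum(x * y) - sum(x) * sum(y) / n   # 99.88
b   <- Sxy / Sxx                          # 0.444
a   <- mean(y) - b * mean(x)              # -6.0076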
$\hat{Y} = 0.444X - 6.007$
[Scatterplot: Y vs. X with the fitted trend line]
Least Squares Trend Line Fitting
- Caveat -
You have just fitted your first regression line! You have also created a linear model with which you could predict any value of y from x. But predictions are valid only within the range of x values actually examined.
[Figure: Hypothetical data; predicted response via fitted line, with the valid prediction range marked]
OLS using R

Call:
lm(formula = X ~ Y)

Coefficients:
(Intercept)            Y
    -46.377        1.078
[Scatterplot: Y vs. X with the fitted OLS line]
> fitted(model1)
1 2 3 4 5
71.01666 67.92322 85.86517 79.67829 70.39797
6 7 8 9 10
71.63535 80.29698 74.72879 78.44092 71.01666
> resid(model1)
1 2 3 4 5
-1.01665799 0.07678293 4.13482561 -4.67829256 -2.39796981
6 7 8 9 10
8.36465383 -2.29698074 -4.72878709 1.55908381 0.98334201
> segments(X, fitted(model1), X, Y)
[Scatterplot: Y vs. X with vertical residual segments drawn between observed and fitted values]
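Putting the workflow above together, a minimal self-contained sketch (the data vectors here are hypothetical; it is written as a regression of Y on X to match the residual-segment plot, whereas the slides' printed call lm(X ~ Y) regresses X on Y):

# Hypothetical data: X = age (yrs), Y = blood pressure (mm Hg)
X <- c(27, 30, 34, 38, 41, 44, 46, 35, 29, 43)
Y <- c(70, 73, 77, 80, 84, 86, 88, 78, 72, 85)

model1 <- lm(Y ~ X)                 # ordinary least squares fit
fitted(model1)                      # predicted values at each X
resid(model1)                       # residuals = observed - fitted

plot(X, Y)
abline(model1)                      # add the fitted trend line
segments(X, fitted(model1), X, Y)   # draw each residual as a vertical segment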
Model Testing
To meet the conditions for the regression of y on x, four assumptions must hold: linearity, normality of the residuals, equality of variance (homoscedasticity), and independence.
Model Testing
Once again, recall that the $\varepsilon$'s, or residuals, are represented by the difference between the observed values and the fitted trend line.
[Scatterplot: Blood Pressure (mm Hg) vs. Age, residuals shown as deviations from the line]
Model Testing
Thus, the assumptions of linear regression are largely tied
to the behavior of the residuals!
Normality
You may subject the residuals to the same measures of skewness, kurtosis, and tests of normality that we have previously used in univariate analysis.
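For instance, a sketch of those checks in R (assuming model1 from the earlier fit):

r <- resid(model1)
hist(r)                  # inspect shape (skewness, kurtosis)
qqnorm(r); qqline(r)     # points near the line suggest normality
shapiro.test(r)          # Shapiro-Wilk test of normality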
Linearity
Equality of Variance
Variances that are independent of x (i.e., homogeneous) will result in a horizontal band of points around $\varepsilon = 0$.
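A residuals-versus-fitted plot makes this easy to see; a minimal sketch (model1 as before):

plot(fitted(model1), resid(model1),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)   # homogeneous variance: a horizontal band of points around 0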
Independence
Model Testing
- Beta -
$S_{yy} = \sum y^2 - (\sum y)^2 / n$
$S_{xy} = \sum xy - (\sum x)(\sum y) / n$
NB: df = N - 2
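Assuming this slide develops the usual t-test of the slope (consistent with the df = N - 2 note above), a sketch continuing the earlier x, y, Sxx, Sxy, and b variables:

Syy   <- sum(y^2) - sum(y)^2 / n         # sum of squared-y deviations
s2_yx <- (Syy - Sxy^2 / Sxx) / (n - 2)   # residual mean square, df = n - 2
se_b  <- sqrt(s2_yx / Sxx)               # standard error of the slope b
t_b   <- b / se_b                        # t statistic for H0: beta = 0
p_b   <- 2 * pt(-abs(t_b), df = n - 2)   # two-sided p-value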
Model Testing
- Alpha, Beta, y-hat -
Call:
lm(formula = X ~ Y)
Residuals:
Min 1Q Median 3Q Max
-10.88342 -2.71830 0.08607 4.00782 7.50782
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -46.3765 20.3033 -2.284 0.05173 .
Y 1.0782 0.2693 4.004 0.00393 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Diagnostic plots for model1: Residuals vs Fitted, Normal Q-Q (standardized residuals), Scale-Location, and Cook's distance, with high-influence points flagged by observation number]
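The panels summarized above are R's standard regression diagnostics; a sketch (the which = 1:4 selection matching the panels listed is an assumption):

par(mfrow = c(2, 2))        # 2 x 2 panel layout
plot(model1, which = 1:4)   # Residuals vs Fitted, Normal Q-Q,
                            # Scale-Location, Cook's distance
par(mfrow = c(1, 1))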
Model Testing
- Confidence Intervals -
Often, we wish to place 95% CIs around our best fit trend
line.
Model Testing
- Confidence Intervals -
$CI_{0.95}:\ \hat{y} \pm t_{\alpha/2,\,n-2}\, s_{y \cdot x} \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}$

$PI_{0.95}:\ \hat{y} \pm t_{\alpha/2,\,n-2}\, s_{y \cdot x} \sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}$

The added 1 in the PI lessens the influence of the means, hence less flare on the plot.
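In R, predict() returns both intervals directly; a minimal sketch (assuming model1 <- lm(Y ~ X) as above, so the new-data column must be named X):

newx <- data.frame(X = seq(min(X), max(X), length.out = 50))
ci <- predict(model1, newdata = newx, interval = "confidence")  # mean response
pi <- predict(model1, newdata = newx, interval = "prediction")  # new observation

plot(X, Y); abline(model1)
lines(newx$X, ci[, "lwr"], lty = 2); lines(newx$X, ci[, "upr"], lty = 2)
lines(newx$X, pi[, "lwr"], lty = 3); lines(newx$X, pi[, "upr"], lty = 3)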
[Scatterplot: Y vs. X with the fitted line and 95% confidence and prediction bands]
Nonparametric Regression
Kendall's Robust Line-fit Method
- Procedure -
Calculate the $S_{ji}$'s:

Data Set:
   X      Y
  0.0   8.98
 12.0   8.14
 29.5   6.67
 43.0   6.08
 53.0   5.90
 62.5   5.83
 75.5   4.68
 85.0   4.20
 93.0   3.72

S21 = (8.14 - 8.98) / (12 - 0)    = -0.07000
S32 = (6.67 - 8.14) / (29.5 - 12) = -0.08400
...
S31 = (6.67 - 8.98) / (29.5 - 0)  = -0.07831
...
S91 = (3.72 - 8.98) / (93.0 - 0)  = -0.05656

Median of the 36 slopes: b = -0.05436
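A sketch of the median-of-pairwise-slopes computation in R, using the data set above (the median-based intercept at the end is one common convention, an assumption here):

X <- c(0.0, 12.0, 29.5, 43.0, 53.0, 62.5, 75.5, 85.0, 93.0)
Y <- c(8.98, 8.14, 6.67, 6.08, 5.90, 5.83, 4.68, 4.20, 3.72)
n <- length(X)

# All pairwise slopes Sji = (Yj - Yi) / (Xj - Xi), i < j: choose(9, 2) = 36
slopes <- combn(n, 2, function(p) (Y[p[2]] - Y[p[1]]) / (X[p[2]] - X[p[1]]))
b <- median(slopes)              # robust slope estimate
a <- median(Y) - b * median(X)   # intercept convention (assumption)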
Kendall's Robust Line-fit Method
- Example -
Correlation
There are many purposes to regression, but the main one is prediction: the goal of regression is to determine the NATURE of the relationship between two variables. Correlation, by contrast, measures the STRENGTH of that relationship.
Correlation
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
In all cases, $-1 \le r \le +1$:
r = 0 is no relationship; |r| = 1 is a perfect relationship (pos. or neg.)
Coefficient of Determination
$R^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}}$
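Both quantities are one line each in R; a sketch continuing the earlier example variables (x, y, Sxx, Sxy, Syy):

r <- Sxy / sqrt(Sxx * Syy)   # Pearson correlation from the sums of squares
r^2                          # coefficient of determination R^2
cor(x, y)                    # the same r, computed directly
cor.test(x, y)               # test of H0: rho = 0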
Regression Model vs. Correlation Model
Nonparametric Correlation
$r_s = 1 - \dfrac{6 \sum d^2}{N(N^2 - 1)}$, with $d = r_x - r_y$ (the difference in x and y ranks)

Test statistic: $z = r_s \sqrt{n - 1}$
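A sketch of Spearman's rank correlation in R, reusing the x, y example data (cor.test's exact method may warn when ranks are tied, as with the repeated x = 35 here):

rs <- cor(x, y, method = "spearman")   # rank correlation r_s
z  <- rs * sqrt(length(x) - 1)         # large-sample test statistic above
cor.test(x, y, method = "spearman")    # built-in test of rho_s = 0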