Linear regression inappropriate because:
[Figure: response (y-axis, ticks from -0.2 to 1.0) plotted against age (x-axis, 10 to 90)]
Université d’Ottawa - Bio 4518 - Biostatistiques appliquées
© Antoine Morin et Scott Findlay 4
21-12-08 10:48
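Dependent variable: logit(p)

$$\operatorname{logit}(p) = y = \ln\left(\frac{p}{1 - p}\right)$$

$$p = \frac{e^{y}}{1 + e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}}$$

[Figure: p (y-axis, ticks 20 to 100) plotted against logit (x-axis, -4 to 4)]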
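The two transformations on this slide can be checked numerically. The following is a minimal Python sketch; the function names `logit` and `inv_logit` are chosen here for illustration and do not come from the slides.

```python
import math

def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Probability that corresponds to a log-odds value y."""
    return math.exp(y) / (1.0 + math.exp(y))

# Round trip: p -> logit(p) -> p
for p in (0.1, 0.5, 0.9):
    y = logit(p)
    print(f"p = {p:.1f}  logit(p) = {y:+.3f}  back-transformed p = {inv_logit(y):.3f}")
```

The logit is symmetric about p = 0.5, where it equals zero, and it is unbounded as p approaches 0 or 1.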
Logistic regression: model coefficients
• The magnitude of the regression coefficient depends on how abruptly p changes with X, with large values indicating abrupt change.
[Figure: two panels of Y (0 to 1) against X, labelled "coefficient > 0, small" and "coefficient > 0, large"]
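To illustrate how the coefficient controls the abruptness of the change, here is a short Python sketch with two hypothetical slopes (0.05 and 0.5, not taken from the slides). Both curves are constructed to cross p = 0.5 at X = 50, and the rise in p between X = 45 and X = 55 is compared.

```python
import math

def p_at(x, intercept, slope):
    """Predicted probability from a logistic model with one predictor."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical slopes; each intercept is set so that p = 0.5 at x = 50
slopes = {"small slope": 0.05, "large slope": 0.5}
rises = {}
for label, slope in slopes.items():
    intercept = -slope * 50.0
    rises[label] = p_at(55.0, intercept, slope) - p_at(45.0, intercept, slope)
    print(f"{label}: p rises by {rises[label]:.3f} between x = 45 and x = 55")
```

With the larger slope, most of the change from low to high probability happens inside this narrow window; with the smaller slope, the same window covers only a small part of the transition.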
Least squares estimation (LSE)
• An ordinary least squares (OLS) estimate of a model parameter is that which minimizes the sum of squared differences between observed and predicted values:

$$SS_R = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

• Predicted values are derived from some model whose parameters we wish to estimate:

$$\hat{y} = f(x, \theta)$$

[Figure: SS_R plotted against the parameter value, with the OLS estimate marked]
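As a sketch of least squares estimation, the Python example below fits a straight line to a small hypothetical data set (the values are invented for illustration) using the standard closed-form OLS solution, then checks that moving the slope away from the estimate increases the residual sum of squares.

```python
def ss_r(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def ols_line(xs, ys):
    """Closed-form OLS estimates (intercept, slope) for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_line(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, SS_R = {ss_r(xs, ys, b0, b1):.4f}")

# Any other slope gives a larger SS_R
for delta in (-0.2, 0.2):
    print(f"slope {b1 + delta:.3f}: SS_R = {ss_r(xs, ys, b0, b1 + delta):.4f}")
```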
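• A maximum likelihood estimate (MLE) of a model parameter for a given distribution is that which maximizes the probability of generating the observed sample data.
• MLEs are obtained by maximizing the likelihood function

$$L = \prod_{i=1}^{n} \phi(x_i; \theta)$$

where $\phi(x_i; \theta)$ is the probability of observation $x_i$ under parameter value $\theta$,
• …or equivalently, by minimizing the negative log likelihood function

$$-\log L = -\sum_{i=1}^{n} \ln\left(\phi(x_i; \theta)\right)$$

[Figure: L or -log L plotted against the parameter value, with the MLE marked]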
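A minimal Python sketch of maximum likelihood estimation follows. It assumes a Bernoulli model for a hypothetical sample of ten binary observations and finds the parameter value that minimizes the negative log likelihood by a simple grid search. The grid search is used only for transparency; it is not the Fisher scoring procedure that glm uses.

```python
import math

def neg_log_likelihood(theta, data):
    """-log L for binary data under a Bernoulli model with probability theta."""
    return -sum(math.log(theta if x == 1 else 1.0 - theta) for x in data)

# Hypothetical sample: 6 events out of 10 observations
data = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]

# Evaluate -log L on a grid of candidate values and keep the minimizer
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = min(grid, key=lambda theta: neg_log_likelihood(theta, data))

print(f"MLE from grid search: {theta_hat:.3f}")
print(f"sample proportion:    {sum(data) / len(data):.3f}")
```

For this model the MLE is known analytically to be the sample proportion, which gives a direct check on the numerical result.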
• Likelihood
– Deviance = -2 log L
– Is approximately distributed as chi-square
– Measures the variation unexplained by the fitted model, analogous to the residual sum of squares.
• Model comparison
– Change in deviance when model terms are added (or deleted) is also approximately distributed as chi-square, so hypotheses relating to individual model terms can be tested.
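The deviance comparison can be sketched in Python with a small hypothetical example: binary outcomes in two groups, comparing a null model (one common probability) with a model that includes a group term. For a model with a single two-level factor, the fitted probabilities are the group proportions, so no iterative fitting is needed here. The counts are invented for illustration, and 3.841 is the 5% critical value of chi-square with 1 degree of freedom.

```python
import math

def binomial_deviance(events, trials, p):
    """-2 log L for `events` events among `trials` binary observations, each with probability p."""
    return -2.0 * (events * math.log(p) + (trials - events) * math.log(1.0 - p))

# Hypothetical data: (events, trials) in each group
group_a = (4, 20)
group_b = (12, 20)

# Null model: one probability shared by all observations
total_events = group_a[0] + group_b[0]
total_trials = group_a[1] + group_b[1]
null_deviance = binomial_deviance(total_events, total_trials, total_events / total_trials)

# Model with a group term: each group has its own probability
residual_deviance = sum(binomial_deviance(e, n, e / n) for e, n in (group_a, group_b))

# Adding the group term uses 1 degree of freedom
change = null_deviance - residual_deviance
CHI2_CRIT_1DF = 3.841

print(f"null deviance = {null_deviance:.3f}, residual deviance = {residual_deviance:.3f}")
print(f"change in deviance = {change:.3f}, exceeds critical value: {change > CHI2_CRIT_1DF}")
```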
Call: glm(formula = cardiaque ~ age, family = binomial(link = logit), data = SDF12,
    na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))

Deviance Residuals:
       Min         1Q    Median         3Q      Max
 -1.545637 -0.5732664 -0.272312 -0.1404323 2.679875

Coefficients:
                  Value  Std. Error   t value
(Intercept) -7.76838060 0.376403465 -20.63844
        age  0.09557905 0.005097055  18.75182

(Dispersion Parameter for Binomial family taken to be 1)
Null Deviance: 2050.515 on 1999 degrees of freedom
Residual Deviance: 1490.001 on 1998 degrees of freedom
Number of Fisher Scoring Iterations: 4
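The numbers in this printout can be read as follows. The Python sketch below uses only values copied from the output: it recomputes each t value as Value divided by Std. Error, and tests the age term by the drop from null to residual deviance. For 1 degree of freedom, the upper tail probability of chi-square at x equals erfc(sqrt(x / 2)), which is the identity used for the p-value.

```python
import math

# Values copied from the printed glm output
coefficients = {
    "(Intercept)": (-7.76838060, 0.376403465),
    "age": (0.09557905, 0.005097055),
}
null_deviance, null_df = 2050.515, 1999
residual_deviance, residual_df = 1490.001, 1998

# The "t value" column is the estimate divided by its standard error
ratios = {name: value / se for name, (value, se) in coefficients.items()}
for name, ratio in ratios.items():
    print(f"{name}: Value / Std. Error = {ratio:.5f}")

# Change in deviance when age is added to the null model
drop = null_deviance - residual_deviance
df = null_df - residual_df

# Upper-tail chi-square probability, valid for 1 degree of freedom only
p_value = math.erfc(math.sqrt(drop / 2.0))
print(f"change in deviance = {drop:.3f} on {df} df, p = {p_value:.3g}")
```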
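$$y = \operatorname{logit}(p) = -7.77 + 0.096\,\text{Age}$$

$$p = \frac{e^{y}}{1 + e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}} = \frac{e^{-7.77 + 0.096\,\text{Age}}}{1 + e^{-7.77 + 0.096\,\text{Age}}}$$

[Figure: cardiaque (y-axis, -0.1 to 0.9) plotted against age (x-axis, 30 to 90)]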
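As a worked example of using the fitted model, the Python sketch below plugs the coefficients from the glm output (at full printed precision) into the inverse logit. The function name `predicted_probability` and the chosen ages are illustrative assumptions. The form 1 / (1 + e^(-y)) used in the code is algebraically the same as e^y / (1 + e^y).

```python
import math

# Coefficients copied from the glm output
INTERCEPT = -7.76838060
SLOPE_AGE = 0.09557905

def predicted_probability(age):
    """Predicted probability of cardiaque = 1 at a given age."""
    y = INTERCEPT + SLOPE_AGE * age  # y = logit(p)
    return 1.0 / (1.0 + math.exp(-y))

for age in (30, 50, 70, 90):
    print(f"age {age}: predicted p = {predicted_probability(age):.3f}")

# Age at which logit(p) = 0, so that p = 0.5
age_at_half = -INTERCEPT / SLOPE_AGE
print(f"p = 0.5 at age {age_at_half:.1f}")

# Multiplicative change in the odds for each additional year of age
odds_ratio_per_year = math.exp(SLOPE_AGE)
print(f"odds ratio per year of age = {odds_ratio_per_year:.4f}")
```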