
Logistic regression

Université d’Ottawa - Bio 4518 - Biostatistiques appliquées


© Antoine Morin et Scott Findlay 1
21-12-08 10:48
Logistic regression

• A member of the generalized linear model (GLM) family.
• Unlike standard linear regression, the dependent variable is binary, so that each case's value is either 0 or 1.
• Normally, 0 is taken to mean the absence of some attribute, 1 its presence.
• Logistic regression can be extended to the case where there are more than two possible values for the dependent variable (e.g. low, medium, high: multinomial regression).
Example: incidence of heart attacks in relation to age

[Figure: cardiaque (heart attack, 0 or 1; vertical axis from -0.2 to 1.0) plotted against age (horizontal axis from 10 to 90).]

Linear regression is inappropriate because:
• Residuals are not normal
• Residuals are heteroscedastic
• Predicted values are nonsense (e.g. what does a predicted value of 0.3 mean when the outcome can only be 0 or 1?)
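The third point can be illustrated with a minimal sketch: fitting a straight line by least squares to 0/1 outcomes. The ages and outcomes below are invented for illustration (they are not the course data), and `ols_line` is a hypothetical helper name.

```python
# Hypothetical 0/1 outcomes by age; these values are invented for illustration only.
ages = [20, 30, 40, 50, 60, 70, 80, 90]
attack = [0, 0, 0, 0, 0, 1, 1, 1]

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = ols_line(ages, attack)
fitted = [intercept + slope * a for a in ages]

# The fitted values are not restricted to 0 or 1, nor even to the 0..1 range.
for a, f in zip(ages, fitted):
    print(f"age {a}: fitted value {f:.2f}")
```

With these invented data the fitted value for the youngest age is negative, which cannot be read as a probability.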

Logistic regression: dependent variable
• The variable of interest is the probability p of obtaining a one, as a function of the predictor variables.
• The magnitude of the regression coefficients in the model depends on the distribution of the predictor variables in the two groups, Y = 0 and Y = 1.

[Figure: two panels of Y (0 or 1) plotted against X.]
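Dependent variable: logit(p)

$$\operatorname{logit}(p) = y = \ln\left(\frac{p}{1-p}\right)$$

$$p = \frac{e^{y}}{1+e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1+e^{\operatorname{logit}(p)}}$$

[Figure: p (vertical axis, ticks from 20 to 100) plotted against logit (horizontal axis, -4 to 4).]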
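The two transformations on this slide can be sketched in a few lines. The function names `logit` and `inv_logit` are chosen here for illustration and do not come from the slides.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Probability corresponding to a logit value y: e^y / (1 + e^y)."""
    return math.exp(y) / (1.0 + math.exp(y))

# Probabilities at the logit values shown on the figure's horizontal axis.
for y in (-4, -2, 0, 2, 4):
    print(f"logit = {y:+d}  ->  p = {inv_logit(y):.3f}")
```

Applying `logit` to the result of `inv_logit` returns the original value (up to rounding), which is a quick check that the two formulas are inverses of each other.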

Logistic regression: model coefficients
• A positive regression coefficient (β > 0) means the probability of success increases with increasing value of the predictor.
• A negative regression coefficient (β < 0) means the probability of success decreases with increasing value of the predictor.

[Figure: two panels of Y (0 to 1) plotted against X, one for β > 0 and one for β < 0.]
Logistic regression: model coefficients
• The magnitude of the regression coefficient β depends on how abruptly p changes with X, with large values indicating abrupt change.

[Figure: two panels of Y (0 to 1) plotted against X, one for β > 0, small, and one for β > 0, large.]
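A small sketch of how the sign and magnitude of the regression coefficient shape the curve. The intercept of 0 and the coefficient values 0.5, 3.0 and -0.5 are arbitrary choices made for illustration, and `prob` is a hypothetical helper name.

```python
import math

def prob(x, intercept, beta):
    """Probability of success when logit(p) = intercept + beta * x."""
    y = intercept + beta * x
    return 1.0 / (1.0 + math.exp(-y))

xs = [-4, -2, 0, 2, 4]
curves = {
    "beta = +0.5 (small, positive)": 0.5,
    "beta = +3.0 (large, positive)": 3.0,
    "beta = -0.5 (small, negative)": -0.5,
}

# Each row shows p at the same X values; compare direction and steepness.
for label, beta in curves.items():
    values = [round(prob(x, 0.0, beta), 3) for x in xs]
    print(label, values)
```

The positive coefficients give probabilities that rise with X, the negative one gives probabilities that fall, and the larger coefficient moves from near 0 to near 1 over a much shorter range of X.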
Least squares estimation (LSE)
• An ordinary least squares (OLS) estimate of a model parameter is that which minimizes the sum of squared differences between observed and predicted values:

$$SS_R = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

• Predicted values are derived from some model whose parameters θ we wish to estimate:

$$\hat{y} = f(x, \theta)$$

[Figure: SS_R plotted against the parameter value, with the OLS estimate at the minimum.]
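As a minimal sketch of the least squares idea, the sum of squared residuals can be evaluated over a grid of candidate parameter values and compared with the closed-form estimate. The data and the one-parameter model (a line through the origin) are assumptions made here for illustration; `ss_r` is a hypothetical helper name.

```python
# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def ss_r(slope):
    """Sum of squared residuals for the one-parameter model y_hat = slope * x."""
    return sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))

# Closed-form OLS estimate for a line through the origin.
slope_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Coarse grid search: the grid value with the smallest SS_R should sit next to slope_ols.
grid = [1.5 + 0.01 * k for k in range(101)]  # 1.50, 1.51, ..., 2.50
slope_grid = min(grid, key=ss_r)

print(f"closed form: {slope_ols:.3f}, grid search: {slope_grid:.2f}")
```

Plotting `ss_r` over the grid would give the bowl-shaped curve sketched on the slide, with its lowest point at the OLS estimate.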

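Maximum likelihood estimation (MLE)
• A maximum likelihood estimate (MLE) of a model parameter for a given distribution is that which maximizes the probability of generating the observed sample data.
• MLEs are obtained by maximizing the likelihood function

$$L = \prod_{i=1}^{n} \phi(x_i; \theta)$$

• …or equivalently, by minimizing the negative log likelihood function

$$-\log L = -\sum_{i=1}^{n} \ln\left(\phi(x_i; \theta)\right)$$

where φ(x_i; θ) is the probability of observation x_i given the parameter θ.

[Figure: L and -log L plotted against the parameter value; the MLE sits at the maximum of L, which is also the minimum of -log L.]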

How are the model parameters estimated?
• Estimated not by least squares, but rather by maximum likelihood
– Based on the likelihood of obtaining the observed results, evaluated for different values of the model parameters
– In principle, parameter estimates should converge to those maximizing the log-likelihood, or equivalently minimizing -log L
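One common way to reach the maximum is Newton-Raphson iteration, which for logistic regression coincides with the Fisher scoring reported by many GLM routines. The sketch below is an illustration under stated assumptions, not the algorithm as implemented in any particular package: the data are invented, there is a single predictor, and `fit_logistic` and `neg_log_lik` are hypothetical names.

```python
import math

# Hypothetical data, invented for illustration; the outcomes overlap so the MLE is finite.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

def neg_log_lik(b0, b1):
    """Negative log-likelihood of the model logit(p) = b0 + b1 * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total -= math.log(p if yi == 1 else 1.0 - p)
    return total

def fit_logistic(max_iter=50, tol=1e-8):
    """Newton-Raphson updates starting from b0 = b1 = 0."""
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of log L with respect to b0
            g1 += (yi - p) * xi     # gradient of log L with respect to b1
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 system by hand
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1, iteration

b0, b1, n_iter = fit_logistic()
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, iterations = {n_iter}")
print(f"-log L at start: {neg_log_lik(0.0, 0.0):.4f}, at estimate: {neg_log_lik(b0, b1):.4f}")
```

The stopping rule mirrors the idea of convergence on the slide: iteration ends once further updates no longer change the estimates appreciably.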

Hypothesis testing

• Likelihood
– Deviance = -2 log L
– Is approximately distributed as chi-square
– Measures the variation unexplained by the fitted model, analogous to the residual sum of squares.
• Model comparison
– The change in deviance when model terms are added (or deleted) is also approximately distributed as chi-square, so hypotheses relating to individual model terms can be tested.
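A minimal sketch of a change-in-deviance test, using a case where the fitted probabilities are known in closed form: a model with a single two-level group term, whose fitted probabilities are the group proportions, compared with a null model that uses one common proportion. The counts are invented for illustration, and `deviance` and `chi2_sf_1df` are hypothetical helper names.

```python
import math

def deviance(successes, trials, p):
    """-2 log L for `successes` ones among `trials` 0/1 observations sharing probability p."""
    failures = trials - successes
    return -2.0 * (successes * math.log(p) + failures * math.log(1.0 - p))

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical counts, invented for illustration: (successes, trials) in two groups.
group_a = (4, 20)
group_b = (14, 20)

# Null model: one common probability for everyone.
p_all = (group_a[0] + group_b[0]) / (group_a[1] + group_b[1])
dev_null = deviance(*group_a, p_all) + deviance(*group_b, p_all)

# Model with a group term: each group gets its own probability (its observed proportion).
dev_group = (deviance(*group_a, group_a[0] / group_a[1])
             + deviance(*group_b, group_b[0] / group_b[1]))

# Adding the group term uses one extra parameter, hence 1 degree of freedom.
change = dev_null - dev_group
print(f"change in deviance = {change:.2f} on 1 df, p = {chi2_sf_1df(change):.4f}")
```

For tests with more than one degree of freedom a general chi-square routine would be needed; the `erfc` shortcut used here is valid only for 1 df.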

Model assumptions

• Observations are independent


• Dependent variable has a binomial
distribution
• Little error in measurement of the independent (predictor) variables.

Logistic regression in S-Plus

*** Generalized Linear Model ***

Call: glm(formula = cardiaque ~ age, family = binomial(link = logit), data = SDF12, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))

Deviance Residuals:
       Min         1Q    Median         3Q      Max
 -1.545637 -0.5732664 -0.272312 -0.1404323 2.679875

Coefficients:
                  Value  Std. Error   t value
(Intercept) -7.76838060 0.376403465 -20.63844
        age  0.09557905 0.005097055  18.75182

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 2050.515 on 1999 degrees of freedom
Residual Deviance: 1490.001 on 1998 degrees of freedom
Number of Fisher Scoring Iterations: 4
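The quantities in this output can be cross-checked by hand. The sketch below copies the numbers from the S-Plus output and recomputes the t values, the change in deviance from adding age, and the odds ratio for age. The variable names are chosen here for illustration, and the interpretation "per year" assumes age is recorded in years, which the output itself does not state.

```python
import math

# Values copied from the S-Plus output.
intercept, se_intercept = -7.76838060, 0.376403465
slope_age, se_age = 0.09557905, 0.005097055
null_deviance, residual_deviance = 2050.515, 1490.001
null_df, residual_df = 1999, 1998

# The "t value" column is the estimate divided by its standard error.
t_intercept = intercept / se_intercept
t_age = slope_age / se_age

# Change in deviance from adding age to the intercept-only model.
change = null_deviance - residual_deviance
df_change = null_df - residual_df

# Upper-tail chi-square probability; this erfc shortcut is valid for 1 df only.
p_value = math.erfc(math.sqrt(change / 2.0))

# Multiplicative change in the odds for each additional unit of age (assumed: years).
odds_ratio_per_year = math.exp(slope_age)

print(f"t values: {t_intercept:.5f}, {t_age:.5f}")
print(f"change in deviance: {change:.3f} on {df_change} df, p = {p_value:.3g}")
print(f"odds ratio per unit of age: {odds_ratio_per_year:.4f}")
```

The change in deviance is very large relative to a chi-square distribution with 1 degree of freedom, consistent with the large t value for age.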

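Incidence of heart attack in relation to age

$$y = \operatorname{logit}(p) = -7.77 + 0.096\,\text{Age}$$

$$p = \frac{e^{y}}{1+e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1+e^{\operatorname{logit}(p)}} = \frac{e^{-7.77+0.096\,\text{Age}}}{1+e^{-7.77+0.096\,\text{Age}}}$$

[Figure: cardiaque (heart attack; vertical axis from -0.1 to 0.9) plotted against age (horizontal axis from 30 to 90).]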
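As a worked illustration, the fitted equation can be evaluated at a few ages using the unrounded coefficients from the S-Plus output earlier in these slides (intercept -7.76838060, age coefficient 0.09557905). The function name `p_heart_attack` and the chosen ages are assumptions made for this sketch.

```python
import math

# Coefficients copied from the "Value" column of the S-Plus output, not rounded.
INTERCEPT = -7.76838060
SLOPE_AGE = 0.09557905

def p_heart_attack(age):
    """Predicted probability from logit(p) = INTERCEPT + SLOPE_AGE * age."""
    y = INTERCEPT + SLOPE_AGE * age
    return math.exp(y) / (1.0 + math.exp(y))

for age in (30, 50, 70, 90):
    print(f"age {age}: p = {p_heart_attack(age):.3f}")

# Age at which logit(p) = 0, i.e. the predicted probability is exactly 0.5.
age_at_half = -INTERCEPT / SLOPE_AGE
print(f"p = 0.5 at age {age_at_half:.1f}")
```

Unlike a straight-line fit, every predicted value stays between 0 and 1, and the age at which p crosses 0.5 follows directly from the ratio of the two coefficients.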

Modelling the presence of post-operative kyphosis using logistic regression

• Kyphosis: a binary variable indicating the presence/absence of a postoperative spinal deformity called kyphosis.
• Age: the age of the child in months.
• Number: the number of vertebrae involved in the spinal operation.
• Start: the beginning of the range of vertebrae involved in the operation.

Evidence that the distribution of predictor
variables differs among levels of response
variable

The model

Testing hypotheses
