14 Logistic Regression

Logistic regression
Université d’Ottawa - Bio 4518 - Biostatistiques appliquées

© Antoine Morin and Scott Findlay
21-12-08 10:48
Logistic regression
• A member of the generalized linear model (GLM) family

• Unlike standard linear regression, the
dependent variable is binary (0,1): each
case's value is either 0 or 1.
• By convention, 0 denotes the absence
of some attribute and 1 its presence.
• Logistic regression can be extended to the
case where there are more than two
possible values for the dependent variable
(e.g. low, medium, high – multinomial
regression)
Example: incidence of heart attacks in relation to age
[Figure: heart attack (cardiaque, 0 or 1) plotted against age, 10 to 90 years]
Linear regression is inappropriate because:
• Residuals are not normal
• Residuals are heteroscedastic
• Predicted values are nonsensical (e.g. what does a predicted value of 0.3 mean?)

Logistic regression: dependent variable
• The variable of interest is the probability p of
obtaining a one (Y = 1) as a function of the predictor
variables.
• The magnitude of the regression coefficients in the
model depends on the distribution of the predictor
variables in the two groups, Y = 0 and Y = 1.
[Figure: two panels of Y (0 or 1) plotted against X]
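Dependent variable: logit(p)
$$\operatorname{logit}(p) = y = \ln\left(\frac{p}{1-p}\right)$$
$$p = \frac{e^{y}}{1+e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1+e^{\operatorname{logit}(p)}}$$
[Figure: p (%, 0 to 100) plotted against logit(p), -4 to 4]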
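The two formulas above are inverses of each other. The sketch below is a minimal illustration, assuming plain Python floats. The function names `logit` and `inv_logit` are chosen here for illustration only. It converts a probability to its log-odds and then back again.

```python
import math

def logit(p):
    """Log-odds ln(p / (1 - p)) of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Back-transform a logit y to a probability, e^y / (1 + e^y).
    Written as on the slide; fine for moderate |y| (math.exp overflows near y > 709)."""
    return math.exp(y) / (1.0 + math.exp(y))

# logit(0.5) = 0, and probabilities symmetric about 0.5 have logits of opposite sign
for p in (0.1, 0.5, 0.9):
    y = logit(p)
    print(f"p = {p:.1f}  logit = {y:+.3f}  back-transformed = {inv_logit(y):.3f}")
```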

Logistic regression: model coefficients
• A negative regression coefficient (β < 0) means the
probability of success decreases with increasing
value of the predictor.
• A positive regression coefficient (β > 0) means the
probability of success increases with increasing
value of the predictor.
[Figure: two panels of Y plotted against X, for β > 0 and β < 0]
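The sign rule can be checked numerically. The sketch below uses made-up coefficients (intercept 0, slope ±1.5), chosen only for illustration. It evaluates p = 1 / (1 + e^-(b0 + b1·x)) on a few x values for a positive and a negative slope.

```python
import math

def prob(x, b0, b1):
    """Probability of success p under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

xs = [-2, -1, 0, 1, 2]
# Hypothetical coefficients, chosen only to illustrate the sign of the slope
p_pos = [prob(x, 0.0, 1.5) for x in xs]   # beta > 0: p rises with x
p_neg = [prob(x, 0.0, -1.5) for x in xs]  # beta < 0: p falls with x
print([round(p, 3) for p in p_pos])
print([round(p, 3) for p in p_neg])
```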
Logistic regression: model coefficients
• The magnitude of the regression coefficient
depends on how abruptly p changes with X,
with large values indicating abrupt change.
[Figure: Y plotted against X for β > 0, small, and β > 0, large]
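One way to make "abruptly" concrete is the slope of the curve. Under logit(p) = b0 + βx, the slope is dp/dx = β·p·(1 - p), which reaches its maximum of β/4 where p = 0.5. The sketch below checks this with a numerical derivative. The two β values are hypothetical.

```python
import math

def prob(x, b0, b1):
    """Probability of success p under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def slope_at(x, b0, b1, h=1e-6):
    """Numerical derivative dp/dx by central difference."""
    return (prob(x + h, b0, b1) - prob(x - h, b0, b1)) / (2 * h)

# Hypothetical slopes, small vs large beta, both centred on x = 0 (where p = 0.5)
for beta in (0.5, 5.0):
    s = slope_at(0.0, 0.0, beta)
    print(f"beta = {beta}: dp/dx at p = 0.5 is {s:.4f} (beta/4 = {beta / 4:.4f})")
```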
Least squares estimation (LSE)
• An ordinary least squares (OLS) estimate
of a model parameter is the value that
minimizes the sum of squared differences
between observed and predicted values:
$$SS_R = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
• Predicted values are derived from some
model whose parameters θ we wish to estimate:
$$\hat{y} = f(x, \theta)$$
[Figure: SSR plotted against the parameter value, with the OLS estimate at the minimum]
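For contrast with maximum likelihood, here is a minimal OLS sketch. It assumes a straight-line model ŷ = a + b·x, which is one specific choice of f(x, θ). The data are hypothetical. The code computes the closed-form estimates and then shows that moving either parameter away from them increases SS_R.

```python
def ols_line(x, y):
    """Closed-form OLS estimates (a, b) for y_hat = a + b * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def ss_r(x, y, a, b):
    """Residual sum of squares: sum of (y_i - y_hat_i)^2."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = ols_line(x, y)
print(a, b, ss_r(x, y, a, b))
# Nudging either parameter away from the OLS estimate increases SS_R
print(ss_r(x, y, a + 0.1, b), ss_r(x, y, a, b + 0.1))
```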

Maximum likelihood estimation (MLE)
• A maximum likelihood estimate (MLE) of a
model parameter for a given distribution is the
value that maximizes the probability of generating
the observed sample data.
• MLEs are obtained by maximizing the likelihood
function, where φ is the probability (density)
function of the assumed distribution:
$$L = \prod_{i=1}^{n} \phi(x_i; \theta)$$
• …or, equivalently, by minimizing the negative
log-likelihood function:
$$-\log L = -\sum_{i=1}^{n} \ln \phi(x_i; \theta)$$
[Figure: L or -log L plotted against the parameter value, with the MLE at the optimum]

How are the model parameters
estimated?
• Not by least squares, but by maximum likelihood
– Based on the likelihood of obtaining the observed
results under different values of the model parameters
– In principle, the iterative fitting procedure converges
to the parameter estimates that maximize the
log-likelihood or, equivalently, minimize -log L
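The iterative fitting can be sketched for one predictor. For the logit link, which is the canonical link of the binomial family, Fisher scoring is the same as Newton-Raphson. Each step solves I·δ = g, where g is the score (the gradient of log L) and I is the information matrix. This is a minimal sketch, not the S-Plus implementation. The toy data and the tolerance are hypothetical choices.

```python
import math

def fit_logistic(x, y, tol=1e-8, max_iter=50):
    """Fit logit(p) = b0 + b1 * x by Fisher scoring (= Newton-Raphson for the logit link)."""
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        g0 = g1 = 0.0            # score vector (gradient of log L)
        i00 = i01 = i11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Solve the 2x2 system I * delta = g (assumes x is not constant, so det != 0)
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("did not converge (e.g. perfectly separated data)")

# Hypothetical toy data (not the heart-attack data from the slides)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, n_iter = fit_logistic(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f} after {n_iter} iterations")
```

At convergence the score is (numerically) zero. That is the condition for a maximum of log L.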

Hypothesis testing
• Likelihood
– Deviance = -2 log L (for 0/1 data, where the saturated model has log L = 0)
– Is approximately distributed as chi-square
– Measures the variation unexplained by the fitted
model, analogous to the residual sum of squares.
• Model comparison
– The change in deviance when model terms are added
(or deleted) is also approximately distributed as
chi-square, so it can be used to test hypotheses about
individual model terms.
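The deviance of a fitted model can be computed directly from the 0/1 outcomes and the fitted probabilities. The null (intercept-only) model gives every case the overall proportion of ones. This is a minimal sketch. The outcomes and "fitted" probabilities in the usage lines are hypothetical and do not come from an actual fit.

```python
import math

def deviance(y, p):
    """Deviance -2 log L for 0/1 outcomes y with fitted probabilities p."""
    log_l = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    return -2.0 * log_l

def null_deviance(y):
    """Deviance of the intercept-only model: every fitted p is the overall proportion."""
    p_bar = sum(y) / len(y)
    return deviance(y, [p_bar] * len(y))

# Hypothetical outcomes and fitted probabilities
y = [0, 0, 1, 0, 1, 1]
p_fit = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
print(null_deviance(y), deviance(y, p_fit), null_deviance(y) - deviance(y, p_fit))
```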

Model assumptions
• Observations are independent

• Dependent variable has a binomial
distribution
• Little error in measurement of dependent
variables.

Logistic regression in SPlus
*** Generalized Linear Model ***
Call: glm(formula = cardiaque ~ age, family = binomial(link = logit), data = SDF12,
    na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.545637 -0.5732664 -0.272312 -0.1404323 2.679875
Coefficients:
Value Std. Error t value
(Intercept) -7.76838060 0.376403465 -20.63844
age 0.09557905 0.005097055 18.75182
(Dispersion Parameter for Binomial family taken to be 1 )
Null Deviance: 2050.515 on 1999 degrees of freedom
Residual Deviance: 1490.001 on 1998 degrees of freedom
Number of Fisher Scoring Iterations: 4
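The numbers in the S-Plus output above support a change-in-deviance test for age. This is a minimal sketch that copies those values by hand. It uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds only for 1 degree of freedom. It also checks that the "t value" column is simply Value / Std. Error.

```python
import math

# Values copied from the S-Plus output for cardiaque ~ age
null_dev, null_df = 2050.515, 1999
resid_dev, resid_df = 1490.001, 1998

# Likelihood-ratio (change in deviance) test for adding age
lr_stat = null_dev - resid_dev
df = null_df - resid_df
# Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lr_stat / 2.0))
print(f"change in deviance = {lr_stat:.3f} on {df} df, p = {p_value:.3g}")

# Coefficient / standard error reproduces the "t value" reported for age
wald_age = 0.09557905 / 0.005097055
print(round(wald_age, 5))
```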

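Incidence of heart attack in relation to age
$$y = \operatorname{logit}(p) = -7.77 + 0.0956\,\mathrm{Age}$$
$$p = \frac{e^{y}}{1+e^{y}} = \frac{e^{\operatorname{logit}(p)}}{1+e^{\operatorname{logit}(p)}} = \frac{e^{-7.77 + 0.0956\,\mathrm{Age}}}{1+e^{-7.77 + 0.0956\,\mathrm{Age}}}$$
[Figure: fitted probability of heart attack (cardiaque) plotted against age, 30 to 90 years]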
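The fitted equation can be evaluated at any age. This sketch plugs in the intercept and age coefficient from the S-Plus output. It also derives two summaries of the fit. The first is the age at which the fitted probability reaches 0.5, where logit(p) = 0, so age = -b0/b1. The second is the odds ratio for one additional year of age, e^b1. The constant and function names are illustrative.

```python
import math

# Coefficients from the S-Plus output for cardiaque ~ age
B0 = -7.76838060
B1 = 0.09557905

def p_heart_attack(age):
    """Fitted probability of a heart attack at a given age."""
    y = B0 + B1 * age  # logit(p)
    return math.exp(y) / (1.0 + math.exp(y))

for age in (30, 50, 70, 90):
    print(age, round(p_heart_attack(age), 3))

# Age at which the fitted probability reaches 0.5 (logit = 0)
age_half = -B0 / B1
# Odds ratio for a one-year increase in age
odds_ratio = math.exp(B1)
print(round(age_half, 2), round(odds_ratio, 4))
```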

Presence of post-operative kyphosis using
logistic regression
• Kyphosis: a binary variable indicating the presence or
absence of a postoperative spinal deformity called kyphosis.
• Age: the age of the child in months.
• Number: the number of vertebrae involved in the spinal
operation.
• Start: the beginning of the range of vertebrae
involved in the operation.

Evidence that the distribution of the predictor
variables differs between levels of the response
variable
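A simple first look for such evidence is to compare a summary of each predictor between the Y = 0 and Y = 1 groups. This sketch computes group means. The values in the usage lines are hypothetical and are not the kyphosis data. In practice one would also compare spread or plot the two distributions.

```python
def group_means(predictor, response):
    """Mean of a predictor within each level (0 and 1) of a binary response."""
    out = {}
    for level in (0, 1):
        values = [x for x, r in zip(predictor, response) if r == level]
        out[level] = sum(values) / len(values)
    return out

# Hypothetical values, not the kyphosis data
age = [10, 20, 30, 40, 50, 60]
kyph = [0, 0, 0, 1, 1, 1]
print(group_means(age, kyph))
```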

The model

Testing hypotheses
