
Chapter 11

Regression
with a Binary
Dependent
Variable

Copyright 2015 Pearson, Inc. All rights reserved.

Outline
1. The Linear Probability Model
2. Probit and Logit Regression
3. Estimation and Inference in Probit and
Logit
4. Application to Racial Discrimination in
Mortgage Lending


1. What was the question all pollees were asked?

2. If the dependent variable is your stand on the issue, and the two potential answers are SUPPORT IT and AGAINST IT, what type of variable is the dependent variable?

3. What factors do you think would explain the variation in the dependent variable? What could explain why some support it and some are against it?

Binary Dependent Variables: What's Different?
So far the dependent variable (Y) has been continuous:
district-wide average test score
traffic fatality rate
What if Y is binary?
Y = get into college, or not; X = high school grades, SAT
scores, demographic variables
Y = person smokes, or not; X = cigarette tax rate, income,
demographic variables
Y = mortgage application is accepted, or not; X = race,
income, house characteristics, marital status


Example: Mortgage Denial and Race


The Boston Fed HMDA Dataset
Individual applications for single-family mortgages made in
1990 in the greater Boston area
2380 observations, collected under Home Mortgage
Disclosure Act (HMDA)
Variables
Dependent variable:
Is the mortgage denied or accepted? YES/NO

Independent variables:
income, wealth, employment status
other loan, property characteristics
race of applicant


Binary Dependent Variables and the Linear Probability Model (SW Section 11.1)
A natural starting point is the linear regression model with a
single regressor:

Yi = β0 + β1Xi + ui

But:
What does β1 mean when Y is binary? Is β1 = ΔY/ΔX?
What does the line β0 + β1X mean when Y is binary?
What does the predicted value Ŷ mean when Y is binary? For example, what does Ŷ = 0.26 mean?


The Linear Probability Model


The linear probability model is simply OLS applied to a regression where the dependent variable is a dummy (i.e., binary) variable:

Di = β0 + β1X1i + β2X2i + ... + βkXki + ui    (13.1)

where Di is a dummy variable, and the Xs, βs, and ui are typical independent variables, regression coefficients, and an error term, respectively.
The term "linear probability model" comes from the fact that the right side of the equation is linear, while the expected value of the left side measures the probability that Di = 1.

The linear probability model, ctd.


In the linear probability model, the predicted value of Y is interpreted as the predicted probability that Y = 1, and β1 is the change in that predicted probability for a unit change in X.
Here's the math:
Linear probability model: Yi = β0 + β1Xi + ui
When Y is binary,
E(Y|X) = 1 × Pr(Y=1|X) + 0 × Pr(Y=0|X) = Pr(Y=1|X)
Under LS assumption #1, E(ui|Xi) = 0, so
E(Yi|Xi) = E(β0 + β1Xi + ui|Xi) = β0 + β1Xi,
so
Pr(Y=1|Xi) = β0 + β1Xi

The linear probability model, ctd.


When Y is binary, the linear regression model Yi = β0 + β1Xi + ui is called the linear probability model because
Pr(Y=1|X) = β0 + β1Xi

The predicted value is a probability:
E(Y|X=x) = Pr(Y=1|X=x) = probability that Y = 1 given x
Ŷ = the predicted probability that Yi = 1, given X

β1 = change in probability that Y = 1 for a unit change in x:
β1 = Pr(Y=1|X = x + Δx) - Pr(Y=1|X = x), for Δx = 1


Example: linear probability model, HMDA data


Mortgage denial v. ratio of debt payments to income
(P/I ratio) in a subset of the HMDA data set (n = 127)


Linear probability model: full HMDA data set

deny-hat = -.080 + .604 P/I ratio    (n = 2380)
           (.032)  (.098)

What is the predicted value when P/I ratio = .3?
Pr(deny = 1|P/I ratio = .3) = -.080 + .604 × .3 = .101

Calculating "effects": increase P/I ratio from .3 to .4:
Pr(deny = 1|P/I ratio = .4) = -.080 + .604 × .4 = .162

The effect on the probability of denial of an increase in P/I ratio from .3 to .4 is to increase the probability by .060, that is, by 6.0 percentage points (what?).
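These fitted-value calculations are plain arithmetic and can be checked directly; a minimal sketch in Python using the estimated coefficients above:

```python
# LPM estimates from the full HMDA data set: deny-hat = -.080 + .604 * (P/I ratio)
b0, b1 = -0.080, 0.604

def lpm_prob(pi_ratio):
    """Predicted probability of denial at a given P/I ratio."""
    return b0 + b1 * pi_ratio

p3 = lpm_prob(0.3)          # predicted probability at P/I ratio = .3
p4 = lpm_prob(0.4)          # predicted probability at P/I ratio = .4
print(round(p3, 3), round(p4, 3), round(p4 - p3, 3))
```

Note that the effect of a .1 increase, .604 × .1 ≈ .060, is the same at every starting value of X; that constant-effect property is a key limitation of the LPM discussed later.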


Linear probability model: HMDA data, ctd.

Next include black as a regressor:
deny-hat = -.091 + .559 P/I ratio + .177 black
           (.032)  (.098)           (.025)

Predicted probability of denial:
for a black applicant with P/I ratio = .3:
Pr(deny = 1) = -.091 + .559 × .3 + .177 × 1 = .254
for a white applicant with P/I ratio = .3:
Pr(deny = 1) = -.091 + .559 × .3 + .177 × 0 = .077
difference = .177 = 17.7 percentage points

The coefficient on black is significant at the 5% level.
Still plenty of room for omitted variable bias:
credit history, earnings potential, etc.


The linear probability model: Summary


The linear probability model models Pr(Y=1|X) as a linear
function of X
Advantages:
simple to estimate and to interpret
inference is the same as for multiple regression (need
heteroskedasticity-robust standard errors)

Disadvantages:
The LPM says that the change in the predicted probability for a given change in X is the same for all values of X, but that doesn't make sense (think about the HMDA example).
Also, LPM predicted probabilities can be < 0 or > 1!

These disadvantages can be addressed by using a nonlinear probability model: probit and logit regression.
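The out-of-range problem is easy to demonstrate with the fitted HMDA line from earlier; a quick sketch (the extreme P/I values are hypothetical):

```python
# LPM fitted line from the HMDA example: deny-hat = -.080 + .604 * (P/I ratio)
b0, b1 = -0.080, 0.604

def lpm_prob(pi_ratio):
    return b0 + b1 * pi_ratio

print(lpm_prob(0.0))   # a negative "probability"
print(lpm_prob(2.0))   # a "probability" greater than one
```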


Probit and Logit Regression


(SW Section 11.2)
The problem with the linear probability model is that it models the probability of Y = 1 as being linear:
Pr(Y = 1|X) = β0 + β1X
Instead, we want:
I. Pr(Y = 1|X) to be increasing in X for β1 > 0, and
II. 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
This requires using a nonlinear functional form for the probability. How about an S-curve?

A. The PROBIT
Regression

The probit model satisfies these conditions:
I. Pr(Y = 1|X) is increasing in X for β1 > 0, and
II. 0 ≤ Pr(Y = 1|X) ≤ 1 for all X


Probit regression models the probability that Y = 1 using the cumulative standard normal distribution function, Φ(z), evaluated at z = β0 + β1X.
The probit regression model is
Pr(Y = 1|X) = Φ(β0 + β1X)
where Φ is the cumulative normal distribution function and z = β0 + β1X is the "z-value" or "z-index" of the probit model.
Example: Suppose β0 = -2, β1 = 3, X = .4, so
Pr(Y = 1|X = .4) = Φ(-2 + 3 × .4) = Φ(-0.8)
Pr(Y = 1|X = .4) = area under the standard normal density to the left of z = -.8, which is...

Pr(z ≤ -0.8) = .2119
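The same number can be computed without tables; a self-contained check using Python's math.erf, since Φ(z) = ½(1 + erf(z/√2)):

```python
import math

def norm_cdf(z):
    # Cumulative standard normal distribution Phi(z), built from the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(norm_cdf(-0.8), 4))  # 0.2119
```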



Probit regression, ctd.

Why use the cumulative normal probability distribution?

The S-shape gives us what we want:
Pr(Y = 1|X) is increasing in X for β1 > 0
0 ≤ Pr(Y = 1|X) ≤ 1 for all X

Easy to use: the probabilities are tabulated in the cumulative normal tables (and are also easily computed using regression software)

Relatively straightforward interpretation:
β0 + β1X = z-value
β0 + β1X is the predicted z-value, given X
β1 is the change in the z-value for a unit change in X (note that this is NOT the usual interpretation that a 1-unit increase in X is associated with a β1 change in Y)

STATA Example: HMDA data


. probit deny p_irat, r;

Iteration 0:   log likelihood = -872.0853
Iteration 1:   log likelihood = -835.6633
Iteration 2:   log likelihood = -831.80534
Iteration 3:   log likelihood = -831.79234     (we'll discuss the iterations later)

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(1)    =      40.68
                                               Prob > chi2     =     0.0000
Log likelihood = -831.79234                    Pseudo R2       =     0.0462

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
       _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
------------------------------------------------------------------------------

Pr(deny = 1|P/I ratio) = Φ(-2.19 + 2.97 P/I ratio)
                             (.16)   (.47)

STATA Example: HMDA data, ctd.

Pr(deny = 1|P/I ratio) = Φ(-2.19 + 2.97 P/I ratio)
                             (.16)   (.47)

Positive coefficient, but with no direct numerical interpretation: does this make sense?
Standard errors have the usual interpretation.
Use probit to calculate:
Probability at a point in time
Change in probability when ONE regressor changes (all else constant)

Predicted probabilities:
Pr(deny = 1|P/I ratio = .3) = Φ(-2.19 + 2.97 × .3) = Φ(-1.30) = .097
Effect of a change in P/I ratio from .3 to .4:
Pr(deny = 1|P/I ratio = .4) = Φ(-2.19 + 2.97 × .4) = Φ(-1.00) = .159

Predicted probability of denial rises from .097 to .159
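A sketch reproducing these two predicted probabilities with the rounded coefficients (the small gap between Φ(-1.002) ≈ .158 and the slide's .159 comes from rounding the z-value to -1.00 first):

```python
import math

def norm_cdf(z):
    # Phi(z) via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b0, b1 = -2.19, 2.97               # rounded probit estimates from above

p3 = norm_cdf(b0 + b1 * 0.3)       # P/I ratio = .3
p4 = norm_cdf(b0 + b1 * 0.4)       # P/I ratio = .4
print(round(p3, 3), round(p4, 3))
```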

Probit regression with multiple regressors


Pr(Y = 1|X1, X2) = Φ(β0 + β1X1 + β2X2)
Φ is the cumulative normal distribution function.
z = β0 + β1X1 + β2X2 is the z-value or z-index of the probit model.
β1 is the effect on the z-score of a unit change in X1, holding constant X2.

STATA Example: HMDA data


. probit deny p_irat black, r;

Iteration 0:   log likelihood = -872.0853
Iteration 1:   log likelihood = -800.88504
Iteration 2:   log likelihood = -797.1478
Iteration 3:   log likelihood = -797.13604

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(2)    =     118.18
                                               Prob > chi2     =     0.0000
Log likelihood = -797.13604                    Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

We'll go through the estimation details later.


STATA: Calculate predicted probit probabilities

. probit deny p_irat black, r;

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(2)    =     118.18
                                               Prob > chi2     =     0.0000
Log likelihood = -797.13604                    Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0
. display "Pred prob, p_irat=.3, white: " normprob(z1)
Pred prob, p_irat=.3, white: .07546603

_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
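The same calculation can be replicated outside Stata; a sketch using the full-precision estimates above (Stata's normprob is the standard normal CDF):

```python
import math

def norm_cdf(z):
    # Phi(z): the quantity Stata's normprob() returns
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Full-precision probit estimates from the output above
b_cons, b_pirat, b_black = -2.258738, 2.741637, 0.7081579

z1 = b_cons + b_pirat * 0.3 + b_black * 0
print(norm_cdf(z1))   # matches Stata's .07546603 up to rounding
```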

Calculating Significance and Estimated Effects

Pr(deny = 1|P/I ratio, black) = Φ(-2.26 + 2.74 P/I ratio + .71 black)
                                    (.16)   (.44)          (.08)

Is the coefficient on black statistically significant?

Estimated effect of race for P/I ratio = .3:
Pr(deny = 1|.3, black = 1) = Φ(-2.26 + 2.74 × .3 + .71 × 1) = .233
Pr(deny = 1|.3, black = 0) = Φ(-2.26 + 2.74 × .3 + .71 × 0) = .075

Difference in rejection probabilities = .158 (15.8 percentage points)
Still plenty of room for omitted variable bias!
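These race-effect numbers can be verified with the rounded coefficients; a minimal sketch:

```python
import math

def norm_cdf(z):
    # Cumulative standard normal distribution Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b0, b_pi, b_black = -2.26, 2.74, 0.71   # rounded probit estimates from above

p_black = norm_cdf(b0 + b_pi * 0.3 + b_black * 1)
p_white = norm_cdf(b0 + b_pi * 0.3 + b_black * 0)
print(round(p_black, 3), round(p_white, 3), round(p_black - p_white, 3))
```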

More in class practice and examples


STATA Predicted probit probabilities


So far, by hand, we could only compute predicted
probabilities by introducing numerical values
for our regressors.

STATA COMES TO THE RESCUE


In our STATA problem set, we are going to learn
how to use STATA to give us the estimated
effect of a 1 unit increase in X, holding all else
constant, on the probability of Y=1.
(mfx command)

STATA Predicted probit probabilities

. mfx, predict(p)

Marginal effects after probit
      y  = Pr(deny) (predict, p)
         =  .10548556
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
  pi_rat |   .5001945      .07942    6.30   0.000   .34454  .655849   .330814
  black* |   .1716895      .02466    6.96   0.000  .123349   .22003   .142437
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
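For a continuous regressor, mfx reports dΦ(z)/dX = φ(z)·β evaluated at the sample means of the regressors (the X column above); for the dummy black* it reports the discrete change Φ(z|black=1) - Φ(z|black=0). A sketch reproducing both numbers from the probit estimates:

```python
import math

def norm_pdf(z):
    # Standard normal density phi(z)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Cumulative standard normal distribution Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probit estimates and sample means (the X column of the mfx output)
b_cons, b_pi, b_black = -2.258738, 2.741637, 0.7081579
mean_pi, mean_black = 0.330814, 0.142437

z_bar = b_cons + b_pi * mean_pi + b_black * mean_black
me_pi = norm_pdf(z_bar) * b_pi                     # continuous regressor: phi(z) * beta

z0 = b_cons + b_pi * mean_pi                       # dummy regressor: discrete 0 -> 1 change
me_black = norm_cdf(z0 + b_black) - norm_cdf(z0)

print(round(me_pi, 4), round(me_black, 4))
```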

1. How do we interpret these coefficients in words?
2. What does * mean in words?

B. The LOGIT
Regression

Logit Regression

Logit regression models the probability of Y = 1, given X, as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X:
Pr(Y = 1|X) = F(β0 + β1X)
where F is the cumulative logistic distribution function:
F(β0 + β1X) = 1 / (1 + e^-(β0 + β1X))

Because logit and probit use different probability functions, the coefficients (βs) are different in logit and probit.

11-30

Copyright 2015, Pearson, Inc. All rights reserved.

11-31
13-

Logit regression, ctd.

Pr(Y = 1|X) = F(β0 + β1X)
where F(β0 + β1X) = 1 / (1 + e^-(β0 + β1X))

Example:
Again, we need specific numerical values:
β0 = -3, β1 = 2, X = .4,
so β0 + β1X = -3 + 2 × .4 = -2.2, so
Pr(Y = 1|X = .4) = 1/(1 + e^2.2) = .0998
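The logistic CDF needs no tables at all; a one-line check of this example:

```python
import math

def logistic_cdf(z):
    # Cumulative standard logistic distribution: F(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

p = logistic_cdf(-3.0 + 2.0 * 0.4)   # z = -2.2
print(round(p, 4))  # 0.0998
```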


STATA Example: HMDA data

. logit deny p_irat black, r;

Iteration 0:   log likelihood = -872.0853
Iteration 1:   log likelihood = -806.3571
Iteration 2:   log likelihood = -795.74477
Iteration 3:   log likelihood = -795.69521
Iteration 4:   log likelihood = -795.69521     (more on the iterations later)

Logit estimates                                Number of obs   =       2380
                                               Wald chi2(2)    =     117.75
                                               Prob > chi2     =     0.0000
Log likelihood = -795.69521                    Pseudo R2       =     0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
------------------------------------------------------------------------------

. dis "Pred prob, p_irat=.3, white: " 1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
Pred prob, p_irat=.3, white: .07485143

NOTE: the probit predicted probability is .07546603
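The by-hand logit probability above is easy to replicate; a sketch using the full-precision estimates from the output:

```python
import math

def logistic_cdf(z):
    # Cumulative standard logistic distribution: F(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Full-precision logit estimates from the output above
b_cons, b_pirat, b_black = -4.125558, 5.370362, 1.272782

p_logit = logistic_cdf(b_cons + b_pirat * 0.3 + b_black * 0)
print(p_logit)   # matches Stata's .07485143 up to rounding
```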


Interpreting Estimated Logit Coefficients (cont.)

Probability at a point in time
Change in probability when ONE regressor changes (all else constant)

More in class practice and examples

We run the same model again, but with the logit regression:

. logit deny pi_rat black, r

Logistic regression                            Number of obs   =      2,380
                                               Wald chi2(2)    =     117.75
                                               Prob > chi2     =     0.0000
Log pseudolikelihood = -795.69521              Pseudo R2       =     0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      pi_rat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
------------------------------------------------------------------------------

1. Discuss the significance of the coefficients in the regression.
2. How do the coefficients differ from the probit regression?

The predicted probabilities from the probit and logit models are very close in these HMDA regressions:


More in class practice and examples

We run the same model again, but with the logit regression:

. logit deny pi_rat black, r
(output as shown above)

3. Why are the coefficients not the same as under the probit regression?
4. Please interpret the coefficients in words.
5. Calculate the predicted probability for a white person with a pi_rat of .38.

STATA Predicted logit probabilities


So far, by hand, we could only compute predicted
probabilities by introducing numerical values
for our regressors.

STATA COMES TO THE RESCUE


In our STATA problem set, we are going to learn
how to use STATA to give us the estimated
effect of a 1 unit increase in X, holding all else
constant, on the probability of Y=1.
(mfx command)

More in class practice and examples

. mfx, predict(p)

Marginal effects after logit
      y  = Pr(deny) (predict, p)
         =  .10269081
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
  pi_rat |   .4948542      .08524    5.81   0.000  .327794  .661914   .330814
  black* |   .1670805      .02454    6.81   0.000  .118974  .215187   .142437
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
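As with probit, these marginal effects can be reproduced by hand: for a continuous regressor, dy/dx = f(z)·β with logistic density f(z) = F(z)(1 - F(z)), evaluated at the sample means; for the dummy, the discrete 0-to-1 change. A sketch:

```python
import math

def logistic_cdf(z):
    # Cumulative standard logistic distribution: F(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Logit estimates and sample means (the X column of the mfx output)
b_cons, b_pi, b_black = -4.125558, 5.370362, 1.272782
mean_pi, mean_black = 0.330814, 0.142437

z_bar = b_cons + b_pi * mean_pi + b_black * mean_black
F = logistic_cdf(z_bar)
me_pi = F * (1.0 - F) * b_pi                       # logistic density f(z) = F(1 - F)

z0 = b_cons + b_pi * mean_pi                       # dummy regressor: discrete 0 -> 1 change
me_black = logistic_cdf(z0 + b_black) - logistic_cdf(z0)

print(round(me_pi, 4), round(me_black, 4))
```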


1. How do we interpret these coefficients in words?
2. What does * mean in words?
3. Are the marginal effects different from the probit regression?

Logit regression, ctd.


Why bother with logit if we have probit?
The main reason is historical: logit is computationally faster and easier, but that doesn't matter nowadays.
In practice, logit and probit are very similar. Since empirical results typically don't hinge on the logit/probit choice, both tend to be used in practice.


Measures of Fit for Logit and Probit


The R2 and adjusted R2 don't make sense here (why?). So, two other specialized measures are used:

1. The fraction correctly predicted = the fraction of Ys for which the predicted probability is >50% when Yi = 1, or is <50% when Yi = 0.
Not reported in our regression output, so we learn how to calculate it in our Stata Problem Set (estat command)

2. The pseudo-R2 measures the improvement in the value of the log likelihood, relative to having no Xs (see SW App. 11.2). The pseudo-R2 simplifies to the R2 in the linear model with normally distributed errors.
Reported in our regression output in Stata
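Both measures are easy to compute by hand. Pseudo-R2 = 1 - lnL/lnL0, where lnL0 is the log likelihood with no Xs (the "Iteration 0" value in the Stata output); plugging in the probit values reported earlier reproduces the 0.0859. A sketch (the y and p lists in the second part are hypothetical illustration data, not HMDA values):

```python
# Pseudo-R2 = 1 - lnL / lnL0, using the probit log likelihoods reported earlier
lnL0 = -872.0853      # log likelihood with no regressors (Iteration 0)
lnL = -797.13604      # log likelihood of the fitted probit
pseudo_r2 = 1.0 - lnL / lnL0
print(round(pseudo_r2, 4))  # 0.0859

# Fraction correctly predicted, using a 0.5 cutoff on predicted probabilities
def fraction_correct(y, p):
    return sum((pi > 0.5) == (yi == 1) for yi, pi in zip(y, p)) / len(y)

y = [1, 0, 0, 1, 0]            # hypothetical outcomes
p = [0.7, 0.2, 0.6, 0.4, 0.1]  # hypothetical predicted probabilities
print(fraction_correct(y, p))  # 3 of 5 classified correctly -> 0.6
```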


More in class practice and examples

. probit deny p_irat black, r;

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(2)    =     118.18   (same role as the overall F-test in linear regression)
                                               Prob > chi2     =     0.0000
Log likelihood = -797.13604                    Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

1. How would you interpret the coefficients in words?
2. Which coefficients are significant and which are not?

More in class practice and examples

We run the same model again, but with the logit regression:

. logit deny pi_rat black, r
(output as shown above)

Discuss the measures of fit of the logistic regression. Are any missing from the Stata output?

More in class examples and practice

1. Let's interpret every coefficient from models (1)-(7)
2. Can you tell without any additional info which of the models should not be reported in a paper?
3. What additional information do you need to decide which model is best?

Conclusion
(SW Section 11.5)
If Yi is binary, then E(Y|X) = Pr(Y=1|X)
Three models:
linear probability model (linear multiple regression)
probit (cumulative standard normal distribution)
logit (cumulative standard logistic distribution)

LPM, probit, and logit all produce predicted probabilities
The effect of ΔX is the change in the conditional probability that Y = 1. For logit and probit, this depends on the initial X
Probit and logit are estimated via maximum likelihood
Coefficients are normally distributed for large n
Large-n hypothesis testing and confidence intervals are as usual


CDF calculator online


http://www.danielsoper.com/statcalc3/calc.aspx?id=53

