
Chapter 11

Regression
with a Binary
Dependent
Variable

Copyright 2015 Pearson, Inc. All rights reserved.

Outline
1. The Linear Probability Model
2. Probit and Logit Regression
3. Estimation and Inference in Probit and
Logit
4. Application to Racial Discrimination in
Mortgage Lending


1. What was the question all pollees were asked?

2. If the dependent variable is your stand on the issue, and the two potential answers are SUPPORT IT and AGAINST IT, what type of variable is the dependent variable?

3. What factors do you think would explain the variation in the dependent variable? What could explain why some support it and some are against it?

Binary Dependent Variables: What's Different?
So far the dependent variable (Y) has been continuous:
district-wide average test score
traffic fatality rate
What if Y is binary?
Y = get into college, or not; X = high school grades, SAT
scores, demographic variables
Y = person smokes, or not; X = cigarette tax rate, income,
demographic variables
Y = mortgage application is accepted, or not; X = race,
income, house characteristics, marital status


Example: Mortgage Denial and Race


The Boston Fed HMDA Dataset
Individual applications for single-family mortgages made in
1990 in the greater Boston area
2380 observations, collected under Home Mortgage
Disclosure Act (HMDA)
Variables
Dependent variable:
Is the mortgage denied or accepted? YES/NO

Independent variables:
income, wealth, employment status
other loan, property characteristics
race of applicant


Binary Dependent Variables and the Linear Probability Model (SW Section 11.1)
A natural starting point is the linear regression model with a
single regressor:

Yi = β0 + β1Xi + ui

But:
What does β1 mean when Y is binary? Is β1 = ΔY/ΔX?
What does the line β0 + β1X mean when Y is binary?
What does the predicted value Ŷ mean when Y is binary? For example, what does Ŷ = 0.26 mean?


The Linear Probability Model


The linear probability model is simply OLS applied to a regression where the dependent variable is a dummy (i.e., binary) variable:

Di = β0 + β1X1i + β2X2i + ... + βkXki + ui    (13.1)

where Di is a dummy variable, and the Xs, βs, and ui are typical independent variables, regression coefficients, and an error term, respectively.
The term "linear probability model" comes from the fact that the right side of the equation is linear, while the expected value of the left side measures the probability that Di = 1.

The linear probability model, ctd.


In the linear probability model, the predicted value of Y is interpreted as the predicted probability that Y = 1, and β1 is the change in that predicted probability for a unit change in X.
Here's the math:
Linear probability model: Yi = β0 + β1Xi + ui
When Y is binary,
E(Y|X) = 1 × Pr(Y=1|X) + 0 × Pr(Y=0|X) = Pr(Y=1|X)
Under LS assumption #1, E(ui|Xi) = 0, so
E(Yi|Xi) = E(β0 + β1Xi + ui|Xi) = β0 + β1Xi,
so
Pr(Y=1|Xi) = β0 + β1Xi

The linear probability model, ctd.


When Y is binary, the linear regression model Yi = β0 + β1Xi + ui is called the linear probability model because
Pr(Y=1|X) = β0 + β1Xi

The predicted value is a probability:
E(Y|X=x) = Pr(Y=1|X=x) = probability that Y = 1 given x
Ŷ = the predicted probability that Yi = 1, given X

β1 = change in probability that Y = 1 for a unit change in x:
β1 = Pr(Y=1|X = x + Δx) - Pr(Y=1|X = x), for Δx = 1


Example: linear probability model, HMDA data


Mortgage denial v. ratio of debt payments to income
(P/I ratio) in a subset of the HMDA data set (n = 127)


Linear probability model: full HMDA data set

deny-hat = -.080 + .604 P/I ratio    (n = 2380)
           (.032)  (.098)

What is the predicted value when P/I ratio = .3?
Pr(deny = 1|P/I ratio = .3) = -.080 + .604 × .3 = .101

Calculating "effects": increase P/I ratio from .3 to .4:
Pr(deny = 1|P/I ratio = .4) = -.080 + .604 × .4 = .162

The effect on the probability of denial of an increase in P/I ratio from .3 to .4 is to increase the probability by .060, that is, by 6.0 percentage points (what?).
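These fitted-value calculations are plain arithmetic and can be checked directly; a minimal sketch in Python using the estimated coefficients above:

```python
# LPM estimates from the full HMDA data set: deny-hat = -.080 + .604 * (P/I ratio)
b0, b1 = -0.080, 0.604

def lpm_prob(pi_ratio):
    """Predicted probability of denial at a given P/I ratio."""
    return b0 + b1 * pi_ratio

p3 = lpm_prob(0.3)          # predicted probability at P/I ratio = .3
p4 = lpm_prob(0.4)          # predicted probability at P/I ratio = .4
print(round(p3, 3), round(p4, 3), round(p4 - p3, 3))
```

Note that the effect of a .1 increase, .604 × .1 ≈ .060, is the same at every starting value of X; that constant-effect property is a key limitation of the LPM discussed later.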


Linear probability model: HMDA data, ctd.

Next include black as a regressor:
deny-hat = -.091 + .559 P/I ratio + .177 black
           (.032)  (.098)           (.025)

Predicted probability of denial:
for a black applicant with P/I ratio = .3:
Pr(deny = 1) = -.091 + .559 × .3 + .177 × 1 = .254
for a white applicant with P/I ratio = .3:
Pr(deny = 1) = -.091 + .559 × .3 + .177 × 0 = .077
difference = .177 = 17.7 percentage points

The coefficient on black is significant at the 5% level.
Still plenty of room for omitted variable bias:
credit history, earnings potential, etc.


The linear probability model: Summary


The linear probability model models Pr(Y=1|X) as a linear
function of X
Advantages:
simple to estimate and to interpret
inference is the same as for multiple regression (need
heteroskedasticity-robust standard errors)

Disadvantages:
The LPM says that the change in the predicted probability for a given change in X is the same for all values of X, but that doesn't make sense (think about the HMDA example).
Also, LPM predicted probabilities can be < 0 or > 1!

These disadvantages can be addressed by using a nonlinear probability model: probit and logit regression.
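The out-of-range problem is easy to demonstrate with the fitted HMDA line from earlier; a quick sketch (the extreme P/I values are hypothetical):

```python
# LPM fitted line from the HMDA example: deny-hat = -.080 + .604 * (P/I ratio)
b0, b1 = -0.080, 0.604

def lpm_prob(pi_ratio):
    return b0 + b1 * pi_ratio

print(lpm_prob(0.0))   # a negative "probability"
print(lpm_prob(2.0))   # a "probability" greater than one
```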


Probit and Logit Regression


(SW Section 11.2)
The problem with the linear probability model is that it models the probability of Y = 1 as being linear:
Pr(Y = 1|X) = β0 + β1X
Instead, we want:
I. Pr(Y = 1|X) to be increasing in X for β1 > 0, and
II. 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
This requires using a nonlinear functional form for the probability. How about an S-curve?

A. The PROBIT
Regression

The probit model satisfies these conditions:
I. Pr(Y = 1|X) is increasing in X for β1 > 0, and
II. 0 ≤ Pr(Y = 1|X) ≤ 1 for all X


Probit regression models the probability that Y = 1 using the cumulative standard normal distribution function, Φ(z), evaluated at z = β0 + β1X.
The probit regression model is
Pr(Y = 1|X) = Φ(β0 + β1X)
where Φ is the cumulative normal distribution function and z = β0 + β1X is the "z-value" or "z-index" of the probit model.
Example: Suppose β0 = -2, β1 = 3, X = .4, so
Pr(Y = 1|X = .4) = Φ(-2 + 3 × .4) = Φ(-0.8)
Pr(Y = 1|X = .4) = area under the standard normal density to the left of z = -.8, which is...

Pr(z ≤ -0.8) = .2119
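The same number can be computed without tables; a self-contained check using Python's math.erf, since Φ(z) = ½(1 + erf(z/√2)):

```python
import math

def norm_cdf(z):
    # Cumulative standard normal distribution Phi(z), built from the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(norm_cdf(-0.8), 4))  # 0.2119
```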



Probit regression, ctd.

Why use the cumulative normal probability distribution?

The S-shape gives us what we want:
Pr(Y = 1|X) is increasing in X for β1 > 0
0 ≤ Pr(Y = 1|X) ≤ 1 for all X

Easy to use: the probabilities are tabulated in the cumulative normal tables (and are also easily computed using regression software)

Relatively straightforward interpretation:
β0 + β1X = z-value
β0 + β1X is the predicted z-value, given X
β1 is the change in the z-value for a unit change in X (note that this is NOT the usual interpretation that a 1-unit increase in X is associated with a β1 change in Y)

STATA Example: HMDA data


. probit deny p_irat, r;

Iteration 0:   log likelihood = -872.0853
Iteration 1:   log likelihood = -835.6633
Iteration 2:   log likelihood = -831.80534
Iteration 3:   log likelihood = -831.79234     (we'll discuss the iterations later)

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(1)    =      40.68
                                               Prob > chi2     =     0.0000
Log likelihood = -831.79234                    Pseudo R2       =     0.0462

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
       _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
------------------------------------------------------------------------------

Pr(deny = 1|P/I ratio) = Φ(-2.19 + 2.97 P/I ratio)
                             (.16)   (.47)

STATA Example: HMDA data, ctd.

Pr(deny = 1|P/I ratio) = Φ(-2.19 + 2.97 P/I ratio)
                             (.16)   (.47)

Positive coefficient, but with no direct numerical interpretation: does this make sense?
Standard errors have the usual interpretation.
Use probit to calculate:
Probability at a point in time
Change in probability when ONE regressor changes (all else constant)

Predicted probabilities:
Pr(deny = 1|P/I ratio = .3) = Φ(-2.19 + 2.97 × .3) = Φ(-1.30) = .097
Effect of a change in P/I ratio from .3 to .4:
Pr(deny = 1|P/I ratio = .4) = Φ(-2.19 + 2.97 × .4) = Φ(-1.00) = .159

Predicted probability of denial rises from .097 to .159
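A sketch reproducing these two predicted probabilities with the rounded coefficients (the small gap between Φ(-1.002) ≈ .158 and the slide's .159 comes from rounding the z-value to -1.00 first):

```python
import math

def norm_cdf(z):
    # Phi(z) via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b0, b1 = -2.19, 2.97               # rounded probit estimates from above

p3 = norm_cdf(b0 + b1 * 0.3)       # P/I ratio = .3
p4 = norm_cdf(b0 + b1 * 0.4)       # P/I ratio = .4
print(round(p3, 3), round(p4, 3))
```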

Probit regression with multiple regressors


Pr(Y = 1|X1, X2) = Φ(β0 + β1X1 + β2X2)
Φ is the cumulative normal distribution function.
z = β0 + β1X1 + β2X2 is the z-value or z-index of the probit model.
β1 is the effect on the z-score of a unit change in X1, holding constant X2.

STATA Example: HMDA data


. probit deny p_irat black, r;

Iteration 0:   log likelihood = -872.0853
Iteration 1:   log likelihood = -800.88504
Iteration 2:   log likelihood = -797.1478
Iteration 3:   log likelihood = -797.13604

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(2)    =     118.18
                                               Prob > chi2     =     0.0000
Log likelihood = -797.13604                    Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

We'll go through the estimation details later.


STATA: Calculate predicted probit probabilities

. probit deny p_irat black, r;

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(2)    =     118.18
                                               Prob > chi2     =     0.0000
Log likelihood = -797.13604                    Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0
. display "Pred prob, p_irat=.3, white: " normprob(z1)
Pred prob, p_irat=.3, white: .07546603

_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
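The same calculation can be replicated outside Stata; a sketch using the full-precision estimates above (Stata's normprob is the standard normal CDF):

```python
import math

def norm_cdf(z):
    # Phi(z): the quantity Stata's normprob() returns
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Full-precision probit estimates from the output above
b_cons, b_pirat, b_black = -2.258738, 2.741637, 0.7081579

z1 = b_cons + b_pirat * 0.3 + b_black * 0
print(norm_cdf(z1))   # matches Stata's .07546603 up to rounding
```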

Calculating Significance and Estimated Effects

Pr(deny = 1|P/I ratio, black) = Φ(-2.26 + 2.74 P/I ratio + .71 black)
                                    (.16)   (.44)          (.08)

Is the coefficient on black statistically significant?

Estimated effect of race for P/I ratio = .3:
Pr(deny = 1|.3, black = 1) = Φ(-2.26 + 2.74 × .3 + .71 × 1) = .233
Pr(deny = 1|.3, black = 0) = Φ(-2.26 + 2.74 × .3 + .71 × 0) = .075

Difference in rejection probabilities = .158 (15.8 percentage points)
Still plenty of room for omitted variable bias!
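These race-effect numbers can be verified with the rounded coefficients; a minimal sketch:

```python
import math

def norm_cdf(z):
    # Cumulative standard normal distribution Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b0, b_pi, b_black = -2.26, 2.74, 0.71   # rounded probit estimates from above

p_black = norm_cdf(b0 + b_pi * 0.3 + b_black * 1)
p_white = norm_cdf(b0 + b_pi * 0.3 + b_black * 0)
print(round(p_black, 3), round(p_white, 3), round(p_black - p_white, 3))
```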

More in class practice and examples


STATA Predicted probit probabilities


So far, by hand, we could only compute predicted
probabilities by introducing numerical values
for our regressors.

STATA COMES TO THE RESCUE


In our STATA problem set, we are going to learn
how to use STATA to give us the estimated
effect of a 1 unit increase in X, holding all else
constant, on the probability of Y=1.
(mfx command)

STATA Predicted probit probabilities

. mfx, predict(p)

Marginal effects after probit
      y  = Pr(deny) (predict, p)
         =  .10548556
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
  pi_rat |   .5001945      .07942    6.30   0.000   .34454  .655849   .330814
  black* |   .1716895      .02466    6.96   0.000  .123349   .22003   .142437
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
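For a continuous regressor, mfx reports dΦ(z)/dX = φ(z)·β evaluated at the sample means of the regressors (the X column above); for the dummy black* it reports the discrete change Φ(z|black=1) - Φ(z|black=0). A sketch reproducing both numbers from the probit estimates:

```python
import math

def norm_pdf(z):
    # Standard normal density phi(z)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Cumulative standard normal distribution Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probit estimates and sample means (the X column of the mfx output)
b_cons, b_pi, b_black = -2.258738, 2.741637, 0.7081579
mean_pi, mean_black = 0.330814, 0.142437

z_bar = b_cons + b_pi * mean_pi + b_black * mean_black
me_pi = norm_pdf(z_bar) * b_pi                     # continuous regressor: phi(z) * beta

z0 = b_cons + b_pi * mean_pi                       # dummy regressor: discrete 0 -> 1 change
me_black = norm_cdf(z0 + b_black) - norm_cdf(z0)

print(round(me_pi, 4), round(me_black, 4))
```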

1. How do we interpret these coefficients in words?
2. What does * mean in words?

B. The LOGIT
Regression

Logit Regression

Logit regression models the probability of Y = 1, given X, as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X:
Pr(Y = 1|X) = F(β0 + β1X)
where F is the cumulative logistic distribution function:
F(β0 + β1X) = 1 / (1 + e^-(β0 + β1X))

Because logit and probit use different probability functions, the coefficients (βs) are different in logit and probit.

11-30

Copyright 2015, Pearson, Inc. All rights reserved.

11-31
13-

Logit regression, ctd.

Pr(Y = 1|X) = F(β0 + β1X)
where F(β0 + β1X) = 1 / (1 + e^-(β0 + β1X))

Example:
Again, we need specific numerical values:
β0 = -3, β1 = 2, X = .4,
so β0 + β1X = -3 + 2 × .4 = -2.2, so
Pr(Y = 1|X = .4) = 1/(1 + e^2.2) = .0998
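The logistic CDF needs no tables at all; a one-line check of this example:

```python
import math

def logistic_cdf(z):
    # Cumulative standard logistic distribution: F(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

p = logistic_cdf(-3.0 + 2.0 * 0.4)   # z = -2.2
print(round(p, 4))  # 0.0998
```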


STATA Example: HMDA data

. logit deny p_irat black, r;

Iteration 0:   log likelihood = -872.0853
Iteration 1:   log likelihood = -806.3571
Iteration 2:   log likelihood = -795.74477
Iteration 3:   log likelihood = -795.69521
Iteration 4:   log likelihood = -795.69521     (more on the iterations later)

Logit estimates                                Number of obs   =       2380
                                               Wald chi2(2)    =     117.75
                                               Prob > chi2     =     0.0000
Log likelihood = -795.69521                    Pseudo R2       =     0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
------------------------------------------------------------------------------

. dis "Pred prob, p_irat=.3, white: " 1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
Pred prob, p_irat=.3, white: .07485143

NOTE: the probit predicted probability is .07546603
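The by-hand logit probability above is easy to replicate; a sketch using the full-precision estimates from the output:

```python
import math

def logistic_cdf(z):
    # Cumulative standard logistic distribution: F(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Full-precision logit estimates from the output above
b_cons, b_pirat, b_black = -4.125558, 5.370362, 1.272782

p_logit = logistic_cdf(b_cons + b_pirat * 0.3 + b_black * 0)
print(p_logit)   # matches Stata's .07485143 up to rounding
```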


Interpreting Estimated Logit Coefficients (cont.)

Probability at a point in time
Change in probability when ONE regressor changes (all else constant)

More in class practice and examples

We run the same model again, but with the logit regression:

. logit deny pi_rat black, r

Logistic regression                            Number of obs   =      2,380
                                               Wald chi2(2)    =     117.75
                                               Prob > chi2     =     0.0000
Log pseudolikelihood = -795.69521              Pseudo R2       =     0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      pi_rat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
------------------------------------------------------------------------------

1. Discuss the significance of the coefficients in the regression.
2. How do the coefficients differ from the probit regression?

The predicted probabilities from the probit and logit models are very close in these HMDA regressions:


More in class practice and examples

We run the same model again, but with the logit regression:

. logit deny pi_rat black, r
(output as shown above)

3. Why are the coefficients not the same as under the probit regression?
4. Please interpret the coefficients in words.
5. Calculate the predicted probability for a white person with a pi_rat of .38.

STATA Predicted logit probabilities


So far, by hand, we could only compute predicted
probabilities by introducing numerical values
for our regressors.

STATA COMES TO THE RESCUE


In our STATA problem set, we are going to learn
how to use STATA to give us the estimated
effect of a 1 unit increase in X, holding all else
constant, on the probability of Y=1.
(mfx command)

More in class practice and examples

. mfx, predict(p)

Marginal effects after logit
      y  = Pr(deny) (predict, p)
         =  .10269081
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
  pi_rat |   .4948542      .08524    5.81   0.000  .327794  .661914   .330814
  black* |   .1670805      .02454    6.81   0.000  .118974  .215187   .142437
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
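As with probit, these marginal effects can be reproduced by hand: for a continuous regressor, dy/dx = f(z)·β with logistic density f(z) = F(z)(1 - F(z)), evaluated at the sample means; for the dummy, the discrete 0-to-1 change. A sketch:

```python
import math

def logistic_cdf(z):
    # Cumulative standard logistic distribution: F(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Logit estimates and sample means (the X column of the mfx output)
b_cons, b_pi, b_black = -4.125558, 5.370362, 1.272782
mean_pi, mean_black = 0.330814, 0.142437

z_bar = b_cons + b_pi * mean_pi + b_black * mean_black
F = logistic_cdf(z_bar)
me_pi = F * (1.0 - F) * b_pi                       # logistic density f(z) = F(1 - F)

z0 = b_cons + b_pi * mean_pi                       # dummy regressor: discrete 0 -> 1 change
me_black = logistic_cdf(z0 + b_black) - logistic_cdf(z0)

print(round(me_pi, 4), round(me_black, 4))
```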


1. How do we interpret these coefficients in words?
2. What does * mean in words?
3. Are the marginal effects different from the probit regression?

Logit regression, ctd.


Why bother with logit if we have probit?
The main reason is historical: logit is computationally faster and easier, but that doesn't matter nowadays.
In practice, logit and probit are very similar. Since empirical results typically don't hinge on the logit/probit choice, both tend to be used in practice.


Measures of Fit for Logit and Probit


The R2 and adjusted R2 don't make sense here (why?). So, two other specialized measures are used:

1. The fraction correctly predicted = the fraction of Ys for which the predicted probability is >50% when Yi = 1, or is <50% when Yi = 0.
Not reported in our regression output, so we learn how to calculate it in our Stata Problem Set (estat command)

2. The pseudo-R2 measures the improvement in the value of the log likelihood, relative to having no Xs (see SW App. 11.2). The pseudo-R2 simplifies to the R2 in the linear model with normally distributed errors.
Reported in our regression output in Stata
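Both measures are easy to compute by hand. Pseudo-R2 = 1 - lnL/lnL0, where lnL0 is the log likelihood with no Xs (the "Iteration 0" value in the Stata output); plugging in the probit values reported earlier reproduces the 0.0859. A sketch (the y and p lists in the second part are hypothetical illustration data, not HMDA values):

```python
# Pseudo-R2 = 1 - lnL / lnL0, using the probit log likelihoods reported earlier
lnL0 = -872.0853      # log likelihood with no regressors (Iteration 0)
lnL = -797.13604      # log likelihood of the fitted probit
pseudo_r2 = 1.0 - lnL / lnL0
print(round(pseudo_r2, 4))  # 0.0859

# Fraction correctly predicted, using a 0.5 cutoff on predicted probabilities
def fraction_correct(y, p):
    return sum((pi > 0.5) == (yi == 1) for yi, pi in zip(y, p)) / len(y)

y = [1, 0, 0, 1, 0]            # hypothetical outcomes
p = [0.7, 0.2, 0.6, 0.4, 0.1]  # hypothetical predicted probabilities
print(fraction_correct(y, p))  # 3 of 5 classified correctly -> 0.6
```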


More in class practice and examples

. probit deny p_irat black, r;

Probit estimates                               Number of obs   =       2380
                                               Wald chi2(2)    =     118.18   (same role as the overall F-test in linear regression)
                                               Prob > chi2     =     0.0000
Log likelihood = -797.13604                    Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

1. How would you interpret the coefficients in words?
2. Which coefficients are significant and which are not?

More in class practice and examples

We run the same model again, but with the logit regression:

. logit deny pi_rat black, r
(output as shown above)

Discuss the measures of fit of the logistic regression. Are any missing from the Stata output?

More in class examples and practice

1. Let's interpret every coefficient from models (1)-(7)
2. Can you tell without any additional info which of the models should not be reported in a paper?
3. What additional information do you need to decide which model is best?

Conclusion
(SW Section 11.5)
If Yi is binary, then E(Y|X) = Pr(Y=1|X)
Three models:
linear probability model (linear multiple regression)
probit (cumulative standard normal distribution)
logit (cumulative standard logistic distribution)

LPM, probit, and logit all produce predicted probabilities
The effect of ΔX is the change in the conditional probability that Y = 1. For logit and probit, this depends on the initial X
Probit and logit are estimated via maximum likelihood
Coefficients are normally distributed for large n
Large-n hypothesis testing and confidence intervals are as usual


CDF calculator online


http://www.danielsoper.com/statcalc3/calc.aspx?id=53

