
Regression with a Binary Dependent Variable

(SW Ch. 9)
So far the dependent variable (Y) has been continuous:
district-wide average test score
traffic fatality rate
But we might want to understand the effect of X on a
binary variable:
Y = get into college, or not
Y = person smokes, or not
Y = mortgage application is accepted, or not
Example: Mortgage denial and race
The Boston Fed HMDA data set

Individual applications for single-family mortgages
made in 1990 in the greater Boston area
2380 observations, collected under the Home Mortgage
Disclosure Act (HMDA)

Variables
Dependent variable:
  o Is the mortgage denied or accepted?
Independent variables:
  o income, wealth, employment status
  o other loan, property characteristics
  o race of applicant
The Linear Probability Model
(SW Section 9.1)

A natural starting point is the linear regression model
with a single regressor:

    Yi = β0 + β1Xi + ui

But:
What does β1 mean when Y is binary? Is β1 = ΔY/ΔX?
What does the line β0 + β1X mean when Y is binary?
What does the predicted value Ŷ mean when Y is
binary? For example, what does Ŷ = 0.26 mean?
The linear probability model, ctd.

    Yi = β0 + β1Xi + ui

Recall assumption #1: E(ui|Xi) = 0, so

    E(Yi|Xi) = E(β0 + β1Xi + ui|Xi) = β0 + β1Xi

When Y is binary,

    E(Y) = 1×Pr(Y=1) + 0×Pr(Y=0) = Pr(Y=1)

so

    E(Y|X) = Pr(Y=1|X)
The linear probability model, ctd.

When Y is binary, the linear regression model

    Yi = β0 + β1Xi + ui

is called the linear probability model.
The predicted value is a probability:
  o E(Y|X=x) = Pr(Y=1|X=x) = prob. that Y = 1, given x
  o Ŷ = the predicted probability that Yi = 1, given X
β1 = change in probability that Y = 1 for a given Δx:

    β1 = [Pr(Y = 1|X = x + Δx) − Pr(Y = 1|X = x)]/Δx

Example: linear probability model, HMDA data

Mortgage denial v. ratio of debt payments to income
(P/I ratio) in the HMDA data set (subset)
[scatterplot figure not reproduced]
Linear probability model: HMDA data

    deny-hat = −.080 + .604×P/I ratio   (n = 2380)
               (.032)  (.098)

What is the predicted value for P/I ratio = .3?

    Pr-hat(deny = 1|P/I ratio = .3) = −.080 + .604×.3 = .101

Calculating "effects:" increase P/I ratio from .3 to .4:

    Pr-hat(deny = 1|P/I ratio = .4) = −.080 + .604×.4 = .161

The effect on the probability of denial of an increase
in P/I ratio from .3 to .4 is to increase the probability
by .060, that is, by 6.0 percentage points (what?).
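The same numbers can be reproduced in a few lines of Stata (a
sketch: it assumes the HMDA extract is already in memory, with
the variables deny and p_irat used in the output later in these
notes):

    * LPM: OLS with heteroskedasticity-robust standard errors
    regress deny p_irat, r
    display _b[_cons] + _b[p_irat]*.3   // predicted Pr(deny=1) at P/I ratio = .3
    display _b[_cons] + _b[p_irat]*.4   // predicted Pr(deny=1) at P/I ratio = .4
    display _b[p_irat]*(.4 - .3)        // effect of the increase, = .060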
Next include black as a regressor:

    deny-hat = −.091 + .559×P/I ratio + .177×black
               (.032)  (.098)           (.025)

Predicted probability of denial:

for black applicant with P/I ratio = .3:

    Pr-hat(deny = 1) = −.091 + .559×.3 + .177×1 = .254

for white applicant, P/I ratio = .3:

    Pr-hat(deny = 1) = −.091 + .559×.3 + .177×0 = .077

difference = .177 = 17.7 percentage points
Coefficient on black is significant at the 5% level
Still plenty of room for omitted variable bias!
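Again as a Stata sketch (same assumptions as above, plus the
0/1 indicator black):

    regress deny p_irat black, r
    display _b[_cons] + _b[p_irat]*.3 + _b[black]   // black applicant, P/I ratio = .3
    display _b[_cons] + _b[p_irat]*.3               // white applicant, P/I ratio = .3
    display _b[black]               // the difference is the coefficient on black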
The linear probability model: Summary

Models Pr(Y = 1|X) as a linear function of X
Advantages:
  o simple to estimate and to interpret
  o inference is the same as for multiple regression
    (need heteroskedasticity-robust standard errors)
Disadvantages:
  o Does it make sense that the probability should be
    linear in X?
  o Predicted probabilities can be <0 or >1!
These disadvantages can be solved by using a nonlinear
probability model: probit and logit regression

Probit and Logit Regression
(SW Section 9.2)

The problem with the linear probability model is that it
models the probability of Y = 1 as being linear:

    Pr(Y = 1|X) = β0 + β1X

Instead, we want:
  0 ≤ Pr(Y = 1|X) ≤ 1 for all X
  Pr(Y = 1|X) to be increasing in X (for β1 > 0)

This requires a nonlinear functional form for the
probability. How about an "S-curve"?
[S-curve figure not reproduced]
The probit model satisfies these conditions:
  0 ≤ Pr(Y = 1|X) ≤ 1 for all X
  Pr(Y = 1|X) is increasing in X (for β1 > 0)
[figure not reproduced]
Probit regression models the probability that Y = 1 using
the cumulative standard normal distribution function,
evaluated at z = β0 + β1X:

    Pr(Y = 1|X) = Φ(β0 + β1X)

Φ is the cumulative normal distribution function.
z = β0 + β1X is the "z-value" or "z-index" of the
probit model.

Example: Suppose β0 = −2, β1 = 3, X = .4, so

    Pr(Y = 1|X = .4) = Φ(−2 + 3×.4) = Φ(−0.8)

Pr(Y = 1|X = .4) = area under the standard normal density
to the left of z = −0.8, which is…

    Pr(Z ≤ −0.8) = .2119

[figure: standard normal density with the area to the left
of z = −0.8 shaded; not reproduced]
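This area is what Stata's cumulative standard normal function
returns, so the example is a one-line check (nothing assumed
beyond Stata itself; normal() is the current name of the older
normprob() used later in these notes):

    display normal(-2 + 3*.4)    // = Pr(Z <= -0.8) = .2119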
Probit regression, ctd.

Why use the cumulative normal probability distribution?
The "S-shape" gives us what we want:
  o 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
  o Pr(Y = 1|X) is increasing in X (for β1 > 0)
Easy to use – the probabilities are tabulated in the
cumulative normal tables
Relatively straightforward interpretation:
  o z-value = β0 + β1X
  o β̂0 + β̂1X is the predicted z-value, given X
  o β1 is the change in the z-value for a unit change
    in X
STATA Example: HMDA data

. probit deny p_irat, r;

Iteration 0:  log likelihood = -872.0853      <- We'll discuss this later
Iteration 1:  log likelihood = -835.6633
Iteration 2:  log likelihood = -831.80534
Iteration 3:  log likelihood = -831.79234

Probit estimates                            Number of obs =   2380
                                            Wald chi2(1)  =  40.68
                                            Prob > chi2   = 0.0000
Log likelihood = -831.79234                 Pseudo R2     = 0.0462

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
       _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
------------------------------------------------------------------------------

    Pr-hat(deny = 1|P/I ratio) = Φ(−2.19 + 2.97×P/I ratio)
                                   (.16)   (.47)
STATA Example: HMDA data, ctd.

    Pr-hat(deny = 1|P/I ratio) = Φ(−2.19 + 2.97×P/I ratio)
                                   (.16)   (.47)

Positive coefficient: does this make sense?
Standard errors have the usual interpretation
Predicted probabilities:

    Pr-hat(deny = 1|P/I ratio = .3) = Φ(−2.19 + 2.97×.3)
                                    = Φ(−1.30) = .097

Effect of change in P/I ratio from .3 to .4:

    Pr-hat(deny = 1|P/I ratio = .4) = Φ(−2.19 + 2.97×.4) = .159

Predicted probability of denial rises from .097 to .159
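A sketch of the same calculation using the coefficients that
probit saves in _b[] (the _b[] notation is explained with the
output a few slides below):

    probit deny p_irat, r
    display normal(_b[_cons] + _b[p_irat]*.3)   // ≈ .097, up to coefficient rounding
    display normal(_b[_cons] + _b[p_irat]*.4)   // ≈ .159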
Probit regression with multiple regressors

    Pr(Y = 1|X1, X2) = Φ(β0 + β1X1 + β2X2)

Φ is the cumulative normal distribution function.
z = β0 + β1X1 + β2X2 is the "z-value" or "z-index" of
the probit model.
β1 is the effect on the z-score of a unit change in X1,
holding constant X2
STATA Example: HMDA data

. probit deny p_irat black, r;

Iteration 0:  log likelihood = -872.0853
Iteration 1:  log likelihood = -800.88504
Iteration 2:  log likelihood = -797.1478
Iteration 3:  log likelihood = -797.13604

Probit estimates                            Number of obs =   2380
                                            Wald chi2(2)  = 118.18
                                            Prob > chi2   = 0.0000
Log likelihood = -797.13604                 Pseudo R2     = 0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

We'll go through the estimation details later!
STATA Example: predicted probit probabilities

. probit deny p_irat black, r;

Probit estimates                            Number of obs =   2380
                                            Wald chi2(2)  = 118.18
                                            Prob > chi2   = 0.0000
Log likelihood = -797.13604                 Pseudo R2     = 0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: " normprob(z1);
Pred prob, p_irat=.3, white: .07546603

NOTE
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
STATA Example: HMDA data, ctd.

    Pr-hat(deny = 1|P/I ratio, black)
        = Φ(−2.26 + 2.74×P/I ratio + .71×black)
            (.16)   (.44)            (.08)

Is the coefficient on black statistically significant?
Estimated effect of race for P/I ratio = .3:

    Pr-hat(deny = 1|.3, black = 1) = Φ(−2.26 + 2.74×.3 + .71×1) = .233
    Pr-hat(deny = 1|.3, black = 0) = Φ(−2.26 + 2.74×.3 + .71×0) = .075

Difference in rejection probabilities = .158 (15.8
percentage points)
Still plenty of room for omitted variable bias!
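The race difference can be computed the same way (a sketch;
variable and coefficient names as in the Stata output above):

    probit deny p_irat black, r
    display normal(_b[_cons] + _b[p_irat]*.3 + _b[black])   // black applicant: ≈ .233
    display normal(_b[_cons] + _b[p_irat]*.3)               // white applicant: ≈ .075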
Logit regression

Logit regression models the probability of Y = 1 as the
cumulative standard logistic distribution function,
evaluated at z = β0 + β1X:

    Pr(Y = 1|X) = F(β0 + β1X)

F is the cumulative logistic distribution function:

    F(β0 + β1X) = 1/[1 + e^(−(β0 + β1X))]
'ogisti" regression* "td.
7r(Y = !6X) = ((
#
,
!
X)
where ((
#
,
!
X) =
# !
( )
!
!
X
e
+
+
1
Example:
#
= -%,
!
= $, X = 13,
so
#
,
!
X = -% , $13 = -$1$ so
7r(Y = !6X=13) = !<(!,e
I(I$1$)
) = 1#&
-hy bother with logit if we have probit+
'istorically, numerically convenient
"n practice, very similar to probit
-$2
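The logit example is again a one-line arithmetic check in Stata
(built-in functions only):

    display 1/(1 + exp(-(-3 + 2*.4)))    // = 1/(1 + e^2.2) ≈ .0998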
STATA Example: HMDA data

. logit deny p_irat black, r;

Iteration 0:  log likelihood = -872.0853      <- Later…
Iteration 1:  log likelihood = -806.3571
Iteration 2:  log likelihood = -795.74477
Iteration 3:  log likelihood = -795.69521
Iteration 4:  log likelihood = -795.69521

Logit estimates                             Number of obs =   2380
                                            Wald chi2(2)  = 117.75
                                            Prob > chi2   = 0.0000
Log likelihood = -795.69521                 Pseudo R2     = 0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
------------------------------------------------------------------------------

. dis "Pred prob, p_irat=.3, white: "
>     1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
Pred prob, p_irat=.3, white: .07485143

NOTE: the probit predicted probability is .07546603
7redicted probabilities from estimated probit and logit
models usually are very close1
Estimation and .n,eren"e in (robit (and 'ogit)
!odels (SW Se"tion 9.1)
7robit model:
7r(Y = !6X) = (
#
,
!
X)
:stimation and inference
o'ow to estimate
#
and
!
+
o-hat is the sampling distribution of the estimators+
o-hy can we use the usual methods of inference+
/irst discuss nonlinear least s)uares (easier to e0plain)
-$
9hen discuss maximum likelihood estimation (what is
actually done in practice)
-%#
Probit estimation by nonlinear least squares

Recall OLS:

    min(b0,b1) Σi [Yi − (b0 + b1Xi)]²

The result is the OLS estimators β̂0 and β̂1

In probit, we have a different regression function – the
nonlinear probit model. So, we could estimate β0 and β1
by nonlinear least squares:

    min(b0,b1) Σi [Yi − Φ(b0 + b1Xi)]²

Solving this yields the nonlinear least squares estimator
of the probit coefficients.
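For concreteness, this minimization can actually be run with
Stata's nl command (a sketch using a substitutable expression;
it only illustrates the objective function, and is not how
probit is estimated in practice, as discussed next):

    nl (deny = normal({b0} + {b1}*p_irat))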
Nonlinear least squares, ctd.

    min(b0,b1) Σi [Yi − Φ(b0 + b1Xi)]²

How to solve this minimization problem?
  Calculus doesn't give an explicit solution.
  Must be solved numerically using the computer, e.g.
  by the "trial and error" method of trying one set of
  values for (b0, b1), then trying another, and another,…
  Better idea: use specialized minimization algorithms
In practice, nonlinear least squares isn't used because it
isn't efficient – an estimator with a smaller variance is…
Probit estimation by maximum likelihood

The likelihood function is the conditional density of
Y1,…,Yn given X1,…,Xn, treated as a function of the
unknown parameters β0 and β1.
  The maximum likelihood estimator (MLE) is the value
  of (β0, β1) that maximizes the likelihood function.
  The MLE is the value of (β0, β1) that best describes
  the full distribution of the data.
  In large samples, the MLE is:
    o consistent
    o normally distributed
    o efficient (has the smallest variance of all estimators)
Spe"ial "ase the probit !'E with no X
Y =
! with probability
# with probability !
p
p

'

(Bernoulli distribution)
)ata: Y
!
,H,Y
n
, i1i1d1
)erivation of the likelihood starts with the density of Y
!
:
7r(Y
!
= !) = p and 7r(Y
!
= #) = !Ip
so
7r(Y
!
= y
!
) =
! !
!
(! )
y y
p p

(erify this for y
*
+,- *.)
-%3
Joint density of (Y1, Y2):
Because Y1 and Y2 are independent,

    Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) × Pr(Y2 = y2)
        = [p^y1 × (1 − p)^(1−y1)] × [p^y2 × (1 − p)^(1−y2)]

Joint density of (Y1,…,Yn):

    Pr(Y1 = y1, Y2 = y2,…, Yn = yn)
        = [p^y1 × (1 − p)^(1−y1)] × … × [p^yn × (1 − p)^(1−yn)]
        = p^(Σi yi) × (1 − p)^(n − Σi yi)

The likelihood is the joint density, treated as a function of
the unknown parameters, which here is p:
    f(p; Y1,…,Yn) = p^(Σi Yi) × (1 − p)^(n − Σi Yi)

The MLE maximizes the likelihood. It's standard to work
with the log likelihood, ln[f(p; Y1,…,Yn)]:

    ln[f(p; Y1,…,Yn)] = (Σi Yi) ln(p) + (n − Σi Yi) ln(1 − p)

    d ln f(p; Y1,…,Yn)/dp = (Σi Yi)(1/p) − (n − Σi Yi)(1/(1 − p)) = 0

Solving for p yields the MLE; that is, p̂_MLE satisfies,
    (Σi Yi)(1/p̂_MLE) − (n − Σi Yi)(1/(1 − p̂_MLE)) = 0

or

    (Σi Yi)(1/p̂_MLE) = (n − Σi Yi)(1/(1 − p̂_MLE))

or

    Ȳ(1 − p̂_MLE) = p̂_MLE(1 − Ȳ)

or

    p̂_MLE = Ȳ = fraction of 1's
The MLE in the "no-X" case (Bernoulli distribution):

    p̂_MLE = Ȳ = fraction of 1's

For Yi i.i.d. Bernoulli, the MLE is the "natural"
estimator of p, the fraction of 1's, which is Ȳ
We already know the essentials of inference:
  o For large n, the sampling distribution of p̂_MLE = Ȳ
    is normally distributed
  o Thus inference is "as usual:" hypothesis testing via
    the t-statistic, confidence interval as Ȳ ± 1.96SE

STATA note: to emphasize the requirement of large n, the
printout calls the t-statistic the z-statistic; instead of
the F-statistic, the chi-squared statistic (= qF).
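A quick simulated check that the constant-only probit MLE
reproduces p̂ = Ȳ (a sketch; everything here is built-in Stata,
and the sample size and p = .2 are arbitrary choices):

    clear
    set seed 1
    set obs 1000
    generate y = (runiform() < .2)   // Y1,…,Yn i.i.d. Bernoulli, p = .2
    summarize y                      // Ybar = fraction of 1s
    probit y                         // probit with no X: just a constant
    display normal(_b[_cons])        // = Ybar: the MLE of p is the fraction of 1s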
The probit likelihood with one X

The derivation starts with the density of Y1, given X1:

    Pr(Y1 = 1|X1) = Φ(β0 + β1X1)
    Pr(Y1 = 0|X1) = 1 − Φ(β0 + β1X1)

so

    Pr(Y1 = y1|X1) = [Φ(β0 + β1X1)]^y1 × [1 − Φ(β0 + β1X1)]^(1−y1)

The probit likelihood function is the joint density of
Y1,…,Yn given X1,…,Xn, treated as a function of β0, β1:

    f(β0, β1; Y1,…,Yn|X1,…,Xn)
        = {[Φ(β0 + β1X1)]^Y1 × [1 − Φ(β0 + β1X1)]^(1−Y1)}
          × … × {[Φ(β0 + β1Xn)]^Yn × [1 − Φ(β0 + β1Xn)]^(1−Yn)}
#he probit li4elihood ,/n"tion:
f(
#
,
!
R Y
!
,H,Y
n
6X
!
,H,X
n
)
= S
! !
!
# ! ! # ! !
( ) M! ( )N
Y Y
X X

+ +
T
HS
!
# ! # !
( ) M! ( )N
n n
Y Y
n n
X X

+ +
T
=anPt solve for the ma0imum e0plicitly
(ust ma0imiOe using numerical methods
*s in the case of no X, in large samples:
o
#
.
/0E
,
!
.
/0E
are consistent
o
#
.
/0E
,
!
.
/0E
are normally distributed (more laterH)
o9heir standard errors can be computed
o9esting, confidence intervals proceeds as usual
/or multiple XPs, see S- *pp1 1$
-3#
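To see what the computer is maximizing, the probit log
likelihood can be evaluated by hand at the estimates (a sketch;
predict and the saved result e(ll) are standard after probit):

    probit deny p_irat, r
    predict phat, pr               // phat_i = Φ(β̂0 + β̂1*X_i)
    generate double lnf = deny*ln(phat) + (1-deny)*ln(1-phat)
    summarize lnf, meanonly
    display r(sum)                 // sum of ln f_i: matches e(ll) = -831.79234
    display e(ll)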
The logit likelihood with one X

  The only difference between probit and logit is the
  functional form used for the probability: Φ is
  replaced by the cumulative logistic function.
  Otherwise, the likelihood is similar; for details see
  SW App. 9.2
  As with probit:
    o β̂0_MLE, β̂1_MLE are consistent
    o β̂0_MLE, β̂1_MLE are normally distributed
    o Their standard errors can be computed
    o Testing, confidence intervals proceed as usual
Measures of fit

The R² and R̄² don't make sense here (why?). So, two
other specialized measures are used:

1. The fraction correctly predicted = fraction of Y's for
   which the predicted probability is >50% (if Yi = 1) or
   is <50% (if Yi = 0).
2. The pseudo-R² measures the fit using the likelihood
   function: it measures the improvement in the value of
   the log likelihood, relative to having no X's (see SW
   App. 9.2). It simplifies to the R² in the linear model
   with normally distributed errors.
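The pseudo-R² printed in the Stata output above can be
reproduced from the saved results (a sketch; e(ll) and e(ll_0)
are the log likelihoods with the X's and with a constant only):

    probit deny p_irat black, r
    display 1 - e(ll)/e(ll_0)    // = 1 - (-797.136)/(-872.085) ≈ .0859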
Large-n distribution of the MLE (not in SW)

  This is the foundation of mathematical statistics.
  We'll do this for the "no-X" special case, for which p is
  the only unknown parameter. Here are the steps:

1. Derive the log likelihood ("L(p)") (done).
2. The MLE is found by setting its derivative to zero;
   that requires solving a nonlinear equation.
3. For large n, p̂_MLE will be near the true p (p_true) so
   this nonlinear equation can be approximated (locally) by
   a linear equation (Taylor series around p_true).
4. This can be solved for p̂_MLE − p_true.
5. By the Law of Large Numbers and the CLT, for n large,
   √n(p̂_MLE − p_true) is normally distributed.
1. Derive the log likelihood

Recall: the density for observation #1 is:

    Pr(Y1 = y1) = p^y1 × (1 − p)^(1−y1)    (density)

so

    f(p; Y1) = p^Y1 × (1 − p)^(1−Y1)    (likelihood)

The likelihood for Y1,…,Yn is,

    f(p; Y1,…,Yn) = f(p; Y1) × … × f(p; Yn)

so the log likelihood is,

    L(p) = ln f(p; Y1,…,Yn)
         = ln[f(p; Y1) × … × f(p; Yn)]
         = Σi ln f(p; Yi)

2. Set the derivative of L(p) to zero to define the MLE:
    ∂L(p̂_MLE)/∂p = Σi ∂ln f(p̂_MLE; Yi)/∂p = 0

3. Use a Taylor series expansion around p_true to
   approximate this as a linear function of p̂_MLE:

    0 = ∂L(p̂_MLE)/∂p
      ≈ ∂L(p_true)/∂p + [∂²L(p_true)/∂p²] × (p̂_MLE − p_true)
4. Solve this linear approximation for (p̂_MLE − p_true):

    ∂L(p_true)/∂p + [∂²L(p_true)/∂p²] × (p̂_MLE − p_true) ≈ 0

so

    [∂²L(p_true)/∂p²] × (p̂_MLE − p_true) ≈ −∂L(p_true)/∂p

or

    (p̂_MLE − p_true) ≈ −[∂²L(p_true)/∂p²]⁻¹ × [∂L(p_true)/∂p]
5. Substitute things in and apply the LLN and CLT.

    L(p) = Σi ln f(p; Yi)

    ∂L(p_true)/∂p = Σi ∂ln f(p_true; Yi)/∂p

    ∂²L(p_true)/∂p² = Σi ∂²ln f(p_true; Yi)/∂p²

so

    (p̂_MLE − p_true)
      ≈ −[Σi ∂²ln f(p_true; Yi)/∂p²]⁻¹ × [Σi ∂ln f(p_true; Yi)/∂p]
Multiply through by √n:

    √n(p̂_MLE − p_true)
      ≈ −[(1/n) Σi ∂²ln f(p_true; Yi)/∂p²]⁻¹
         × [(1/√n) Σi ∂ln f(p_true; Yi)/∂p]

Because Yi is i.i.d., the i-th terms in the summands are also
i.i.d. Thus, if these terms have enough (2) moments, then
under general conditions (not just the Bernoulli likelihood):

    (1/n) Σi ∂²ln f(p_true; Yi)/∂p² →p a (a constant)  (WLLN)

    (1/√n) Σi ∂ln f(p_true; Yi)/∂p →d N(0, σ²_lnf′)  (CLT)  (Why?)

Putting this together,
    √n(p̂_MLE − p_true)
      ≈ −[(1/n) Σi ∂²ln f(p_true; Yi)/∂p²]⁻¹
         × [(1/√n) Σi ∂ln f(p_true; Yi)/∂p]

where

    (1/n) Σi ∂²ln f(p_true; Yi)/∂p² →p a (a constant)  (WLLN)

    (1/√n) Σi ∂ln f(p_true; Yi)/∂p →d N(0, σ²_lnf′)  (CLT)  (Why?)

so

    √n(p̂_MLE − p_true) →d N(0, σ²_lnf′/a²)    (large-n normal)

Work out the details for the probit/no-X (Bernoulli) case:
Recall:

    f(p; Yi) = p^Yi × (1 − p)^(1−Yi)

so

    ln f(p; Yi) = Yi ln(p) + (1 − Yi) ln(1 − p)

and

    ∂ln f(p; Yi)/∂p = Yi/p − (1 − Yi)/(1 − p)
                    = (Yi − p)/[p(1 − p)]

and

    ∂²ln f(p; Yi)/∂p² = −Yi/p² − (1 − Yi)/(1 − p)²
                      = −[Yi/p² + (1 − Yi)/(1 − p)²]
Denominator term first:

    ∂²ln f(p; Yi)/∂p² = −[Yi/p² + (1 − Yi)/(1 − p)²]

so

    (1/n) Σi ∂²ln f(p_true; Yi)/∂p²
        = −(1/n) Σi [Yi/p² + (1 − Yi)/(1 − p)²]
        = −[Ȳ/p² + (1 − Ȳ)/(1 − p)²]
        →p −[p/p² + (1 − p)/(1 − p)²]    (LLN)
        = −[1/p + 1/(1 − p)] = −1/[p(1 − p)]
Next the numerator:

    ∂ln f(p; Yi)/∂p = (Yi − p)/[p(1 − p)]

so

    (1/√n) Σi ∂ln f(p_true; Yi)/∂p
        = (1/√n) Σi (Yi − p)/[p(1 − p)]
        = {1/[p(1 − p)]} × (1/√n) Σi (Yi − p)
        →d N(0, σ²_Y/[p(1 − p)]²)
Put these pieces together:

    √n(p̂_MLE − p_true)
      ≈ −[(1/n) Σi ∂²ln f(p_true; Yi)/∂p²]⁻¹
         × [(1/√n) Σi ∂ln f(p_true; Yi)/∂p]

where

    (1/n) Σi ∂²ln f(p_true; Yi)/∂p² →p −1/[p(1 − p)]

    (1/√n) Σi ∂ln f(p_true; Yi)/∂p →d N(0, σ²_Y/[p(1 − p)]²)

Thus

    √n(p̂_MLE − p_true) →d N(0, σ²_Y)

(the variance is [p(1 − p)]² × σ²_Y/[p(1 − p)]² = σ²_Y)
Summary: probit MLE, no-X case

The MLE: p̂_MLE = Ȳ

Working through the full MLE distribution theory gave:

    √n(p̂_MLE − p_true) →d N(0, σ²_Y)

But because p_true = Pr(Y = 1) = E(Y) = μ_Y, this is:

    √n(Ȳ − μ_Y) →d N(0, σ²_Y)

A familiar result from the first week of class!
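This limit can be illustrated with a small Monte Carlo (a
sketch; program and simulate are standard Stata, and n = 400,
p = .3, and 2000 replications are arbitrary choices):

    clear all
    set seed 2
    program define onedraw, rclass
        drop _all
        set obs 400
        generate y = (runiform() < .3)    // Bernoulli(p = .3) sample
        summarize y, meanonly
        return scalar z = sqrt(400)*(r(mean) - .3)   // sqrt(n)*(Ybar - p)
    end
    simulate z = r(z), reps(2000) nodots: onedraw
    summarize z    // sd should be close to sigma_Y = sqrt(.3*.7) ≈ .458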
The MLE derivation applies generally

    √n(p̂_MLE − p_true) →d N(0, σ²_lnf′/a²)

  Standard errors are obtained from working out
  expressions for σ²_lnf′/a²
  Extends to >1 parameter (β0, β1) via matrix calculus
  Because the distribution is normal for large n, inference
  is conducted as usual, for example, the 95% confidence
  interval is MLE ± 1.96SE.
  The expression above uses "robust" standard errors;
  further simplifications yield non-robust standard errors,
  which apply if ∂ln f(p; Yi)/∂p is homoskedastic.
Summary: distribution of the MLE
(Why did I do this to you?)

  The MLE is normally distributed for large n
  We worked through this result in detail for the probit
  model with no X's (the Bernoulli distribution)
  For large n, confidence intervals and hypothesis testing
  proceed as usual
  If the model is correctly specified, the MLE is efficient,
  that is, it has a smaller large-n variance than all other
  estimators (we didn't show this).
  These methods extend to other models with discrete
  dependent variables, for example count data (# crimes/day)
  – see SW App. 9.2.
&ppli"ation to the Boston %!D& Data
(SW Se"tion 9.7)
(ortgages (home loans) are an essential part of
buying a home1
"s there differential access to home loans by race+
"f two otherwise identical individuals, one white and
one black, applied for a home loan, is there a
difference in the probability of denial+
-8;
The HMDA Data Set

  Data on individual characteristics, property
  characteristics, and loan denial/acceptance
  The mortgage application process circa 1990-1991:
    o Go to a bank or mortgage company
    o Fill out an application (personal+financial info)
    o Meet with the loan officer
  Then the loan officer decides – by law, in a race-blind
  way. Presumably, the bank wants to make profitable
  loans, and the loan officer doesn't want to originate
  defaults.
#he loan o,,i"er8s de"ision
Loan officer uses key financial variables:
oP/I ratio
ohousing e0pense-to-income ratio
oloan-to-value ratio
opersonal credit history
9he decision rule is nonlinear:
oloan-to-value ratio D &#A
oloan-to-value ratio D 8A (what happens in default+)
ocredit score
-8
Regression spe"i,i"ations
7r(deny=!6black, other XPs) = H
linear probability model
probit
(ain problem with the regressions so far: potential
omitted variable bias1 *ll these (i) enter the loan officer
decision function, all (ii) are or could be correlated with
race:
wealth, type of employment
credit history
family status
Wariables in the '()* data setH
-2#
-2!
-2$
-2%
-23
-28
S/mmary o, Empiri"al Res/lts
=oefficients on the financial variables make sense1
4lack is statistically significant in all specifications
4ace-financial variable interactions arenPt significant1
"ncluding the covariates sharply reduces the effect of
race on denial probability1
L7(, probit, logit: similar estimates of effect of race
on the probability of denial1
:stimated effects are large in a >real world? sense1
-22
Remaining threats to internal, external validity

Internal validity
1. omitted variable bias
   o what else is learned in the in-person interviews?
2. functional form misspecification (no…)
3. measurement error (originally, yes; now, no…)
4. selection
   o random sample of loan applications
   o define population to be loan applicants
5. simultaneous causality (no)

External validity
This is for Boston in 1990-91. What about today?
Summary
(SW Section 9.5)

  If Yi is binary, then E(Y|X) = Pr(Y = 1|X)
  Three models:
    o linear probability model (linear multiple regression)
    o probit (cumulative standard normal distribution)
    o logit (cumulative standard logistic distribution)
  LPM, probit, logit all produce predicted probabilities
  Effect of ΔX is the change in the conditional probability
  that Y = 1. For logit and probit, this depends on the
  initial X
  Probit and logit are estimated via maximum likelihood
    o Coefficients are normally distributed for large n
    o Large-n hypothesis testing, conf. intervals as usual