Regression with a Binary Dependent Variable
(SW Ch. 9)
So far the dependent variable (Y) has been continuous:
district-wide average test score
traffic fatality rate
But we might want to understand the effect of X on a
binary variable:
Y = get into college, or not
Y = person smokes, or not
Y = mortgage application is accepted, or not
Example: Mortgage denial and race
The Boston Fed HMDA data set
 Individual applications for single-family mortgages
  made in 1990 in the greater Boston area
 2380 observations, collected under the Home Mortgage
  Disclosure Act (HMDA)
Variables
 Dependent variable:
  o Is the mortgage denied or accepted?
 Independent variables:
  o income, wealth, employment status
  o other loan, property characteristics
  o race of applicant
The Linear Probability Model
(SW Section 9.1)
A natural starting point is the linear regression model
with a single regressor:

    Yᵢ = β₀ + β₁Xᵢ + uᵢ

But:
 What does β₁ mean when Y is binary? Is β₁ = ΔY/ΔX?
 What does the line β₀ + β₁X mean when Y is binary?
 What does the predicted value Ŷ mean when Y is
  binary? For example, what does Ŷ = 0.26 mean?
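These questions are easiest to see with a small simulation. The sketch below (not from SW; the data-generating process and variable names are invented for illustration) fits a linear probability model by OLS on simulated binary data. The fitted values estimate Pr(Y=1|X), but note that nothing forces them into [0, 1].

```python
import numpy as np

# Illustrative only: simulate binary Y with Pr(Y=1|X) = 0.2 + 0.5*X,
# then fit the linear probability model Y_i = b0 + b1*X_i + u_i by OLS.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(0, 1, n)
p_true = 0.2 + 0.5 * X                    # true Pr(Y=1|X), inside [0,1] here
Y = (rng.uniform(size=n) < p_true).astype(float)

# OLS with regressor matrix [1, X]
A = np.column_stack([np.ones(n), X])
b0_hat, b1_hat = np.linalg.lstsq(A, Y, rcond=None)[0]

# Fitted values = estimated probabilities Pr(Y=1|X=x)
Y_hat = b0_hat + b1_hat * X
print(b0_hat, b1_hat)        # close to 0.2 and 0.5
```

A fitted value of, say, 0.26 is read as "an estimated 26% probability that Y = 1 given this x"; b1_hat estimates the change in that probability per unit change in X.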
The linear probability model, ctd.

    Yᵢ = β₀ + β₁Xᵢ + uᵢ

Recall assumption #1: E(uᵢ|Xᵢ) = 0, so

    E(Yᵢ|Xᵢ) = E(β₀ + β₁Xᵢ + uᵢ|Xᵢ) = β₀ + β₁Xᵢ

When Y is binary,

    E(Y) = 1×Pr(Y=1) + 0×Pr(Y=0) = Pr(Y=1)

so

    E(Y|X) = Pr(Y=1|X)
The linear probability model, ctd.
When Y is binary, the linear regression model

    Yᵢ = β₀ + β₁Xᵢ + uᵢ

is called the linear probability model.
 The predicted value is a probability:
  o E(Y|X=x) = Pr(Y=1|X=x) = prob. that Y = 1 given x
  o Ŷ = the predicted probability that Yᵢ = 1, given X
 β₁ = change in probability that Y = 1 for a given Δx:

    β₁ = [Pr(Y=1|X = x+Δx) − Pr(Y=1|X = x)]/Δx
The MLE in the "no-X" case (Bernoulli distribution)
Data: Y₁,…,Yₙ, i.i.d.
Derivation of the likelihood starts with the density of Y₁:

    Pr(Y₁ = 1) = p and Pr(Y₁ = 0) = 1−p

so

    Pr(Y₁ = y₁) = p^y₁ (1−p)^(1−y₁)

(Verify this for y₁ = 0 and y₁ = 1.)
Joint density of (Y₁,Y₂):
Because Y₁ and Y₂ are independent,

    Pr(Y₁ = y₁, Y₂ = y₂) = Pr(Y₁ = y₁)×Pr(Y₂ = y₂)
      = [p^y₁ (1−p)^(1−y₁)]×[p^y₂ (1−p)^(1−y₂)]

Joint density of (Y₁,…,Yₙ):

    Pr(Y₁ = y₁, Y₂ = y₂, …, Yₙ = yₙ)
      = [p^y₁ (1−p)^(1−y₁)]×[p^y₂ (1−p)^(1−y₂)]×…×[p^yₙ (1−p)^(1−yₙ)]
      = p^(Σᵢyᵢ) (1−p)^(n − Σᵢyᵢ)
Taking logs, the log likelihood is (Σᵢ Yᵢ) ln p + (n − Σᵢ Yᵢ) ln(1−p).
Solving for p yields the MLE; that is, p̂_MLE satisfies the
first-order condition

    (1/p̂_MLE) Σᵢ Yᵢ − [1/(1−p̂_MLE)] (n − Σᵢ Yᵢ) = 0

or

    (1/p̂_MLE) Σᵢ Yᵢ = [1/(1−p̂_MLE)] (n − Σᵢ Yᵢ)

or

    Ȳ(1 − p̂_MLE) = (1 − Ȳ) p̂_MLE

or

    p̂_MLE = Ȳ = fraction of 1's
The MLE in the "no-X" case (Bernoulli distribution):

    p̂_MLE = Ȳ = fraction of 1's

 For Yᵢ i.i.d. Bernoulli, the MLE is the "natural"
  estimator of p, the fraction of 1's, which is Ȳ
 We already know the essentials of inference:
  o In large n, the sampling distribution of p̂_MLE = Ȳ is
    normally distributed
  o Thus inference is "as usual": hypothesis testing via
    t-statistic, confidence interval as ±1.96SE
 STATA note: to emphasize the requirement of large n, the
  printout calls the t-statistic the z-statistic; instead of the
  F-statistic, the chi-squared statistic (= qF).
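The claim that the Bernoulli likelihood is maximized at Ȳ can be checked numerically. A minimal sketch (illustrative; the sample size and true p are arbitrary choices): evaluate the log likelihood on a fine grid of p and locate its maximum.

```python
import numpy as np

# For i.i.d. Bernoulli data, the p maximizing
#   L(p) = sum_i [Y_i ln p + (1-Y_i) ln(1-p)]
# should be Ybar, the fraction of 1's.
rng = np.random.default_rng(1)
Y = (rng.uniform(size=500) < 0.3).astype(float)

p_grid = np.linspace(0.001, 0.999, 9999)
loglik = Y.sum() * np.log(p_grid) + (len(Y) - Y.sum()) * np.log(1 - p_grid)
p_mle = p_grid[np.argmax(loglik)]

print(p_mle, Y.mean())       # agree up to the grid spacing
```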
The probit likelihood with one X
The derivation starts with the density of Y₁, given X₁:

    Pr(Y₁ = 1|X₁) = Φ(β₀ + β₁X₁)
    Pr(Y₁ = 0|X₁) = 1 − Φ(β₀ + β₁X₁)

so

    Pr(Y₁ = y₁|X₁) = Φ(β₀ + β₁X₁)^y₁ × [1 − Φ(β₀ + β₁X₁)]^(1−y₁)

The probit likelihood function is the joint density of
Y₁,…,Yₙ given X₁,…,Xₙ, treated as a function of β₀, β₁:

    f(β₀,β₁; Y₁,…,Yₙ|X₁,…,Xₙ)
      = {Φ(β₀ + β₁X₁)^Y₁ [1 − Φ(β₀ + β₁X₁)]^(1−Y₁)}
        × … × {Φ(β₀ + β₁Xₙ)^Yₙ [1 − Φ(β₀ + β₁Xₙ)]^(1−Yₙ)}
The probit likelihood function:

    f(β₀,β₁; Y₁,…,Yₙ|X₁,…,Xₙ)
      = {Φ(β₀ + β₁X₁)^Y₁ [1 − Φ(β₀ + β₁X₁)]^(1−Y₁)}
        × … × {Φ(β₀ + β₁Xₙ)^Yₙ [1 − Φ(β₀ + β₁Xₙ)]^(1−Yₙ)}

 Can't solve for the maximum explicitly!
 Must maximize using numerical methods
 As in the case of no X, in large samples:
  o β̂₀,MLE, β̂₁,MLE are consistent
  o β̂₀,MLE, β̂₁,MLE are normally distributed (more later…)
  o Their standard errors can be computed
  o Testing, confidence intervals proceed as usual
 For multiple X's, see SW App. 9.2
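The numerical maximization can be sketched in a few lines. This is an illustrative implementation, not SW's: Fisher scoring is one standard choice of numerical method, the data are simulated, and Φ is built from the stdlib error function.

```python
import math
import numpy as np

def Phi(z):   # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):   # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Simulated probit data (invented parameters for illustration)
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=n)
Y = np.array([1.0 if rng.uniform() < Phi(-0.5 + 1.0 * x) else 0.0 for x in X])

# Maximize the probit log likelihood by Fisher scoring
Z = np.column_stack([np.ones(n), X])
b = np.zeros(2)
for _ in range(25):
    eta = Z @ b
    P = np.clip(np.array([Phi(e) for e in eta]), 1e-10, 1 - 1e-10)
    d = np.array([phi(e) for e in eta])
    score = Z.T @ ((Y - P) * d / (P * (1 - P)))   # gradient of log likelihood
    info = Z.T @ (Z * (d * d / (P * (1 - P)))[:, None])  # Fisher information
    b = b + np.linalg.solve(info, score)

print(b)     # roughly (-0.5, 1.0), the values used to simulate the data
```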
The logit likelihood with one X
 The only difference between probit and logit is the
  functional form used for the probability: Φ is
  replaced by the cumulative logistic distribution function.
 Otherwise, the likelihood is similar; for details see
  SW App. 9.2
 As with probit,
  o β̂₀,MLE, β̂₁,MLE are consistent
  o β̂₀,MLE, β̂₁,MLE are normally distributed
  o Their standard errors can be computed
  o Testing, confidence intervals proceed as usual
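For comparison, here is the same exercise with the logistic functional form F(z) = 1/(1 + e^(−z)). A sketch with invented data; for logit, Newton's method has an especially simple gradient and Hessian.

```python
import numpy as np

# Simulated logit data (illustrative parameters)
rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * X)))
Y = (rng.uniform(size=n) < p).astype(float)

# Maximize the logit log likelihood by Newton's method
Z = np.column_stack([np.ones(n), X])
b = np.zeros(2)
for _ in range(25):
    P = 1.0 / (1.0 + np.exp(-(Z @ b)))
    score = Z.T @ (Y - P)                         # gradient of log likelihood
    info = Z.T @ (Z * (P * (1 - P))[:, None])     # negative Hessian
    b = b + np.linalg.solve(info, score)

print(b)     # roughly (-0.5, 1.0)
```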
Measures of fit
The R² and R̄² don't make sense here (why?). So, two
other specialized measures are used:
1. The fraction correctly predicted = fraction of Y's for
   which the predicted probability is >50% (if Yᵢ=1) or is
   <50% (if Yᵢ=0).
2. The pseudo-R² measures the fit using the likelihood
   function: it measures the improvement in the value of
   the log likelihood, relative to having no X's (see SW
   App. 9.2). This simplifies to the R² in the linear
   model with normally distributed errors.
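Both measures can be computed directly from predicted probabilities. In this sketch (simulated data; the "fitted" probabilities are taken from the model that generated the data, as a stand-in for ML estimates) the no-X benchmark model predicts the constant probability Ȳ.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=n)
p_fit = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * X)))   # stand-in for fitted probs
Y = (rng.uniform(size=n) < p_fit).astype(float)

# 1. Fraction correctly predicted: predict Y=1 when prob > 50%
frac_correct = np.mean((p_fit > 0.5) == (Y == 1))

# 2. Pseudo-R^2: improvement in log likelihood over the no-X model (p = Ybar)
def loglik(prob):
    prob = np.clip(prob, 1e-10, 1 - 1e-10)
    return np.sum(Y * np.log(prob) + (1 - Y) * np.log(1 - prob))

pseudo_R2 = 1.0 - loglik(p_fit) / loglik(np.full(n, Y.mean()))

print(frac_correct, pseudo_R2)
```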
Large-n distribution of the MLE (not in SW)
 This is the foundation of mathematical statistics.
 We'll do this for the "no-X" special case, for which p is
  the only unknown parameter. Here are the steps:
1. Derive the log likelihood ("L(p)") (done).
2. The MLE is found by setting its derivative to zero;
   that requires solving a nonlinear equation.
3. For large n, p̂_MLE will be near the true p (p_true), so this
   nonlinear equation can be approximated (locally) by
   a linear equation (Taylor series around p_true).
4. This can be solved for p̂_MLE − p_true.
5. By the Law of Large Numbers and the CLT, for n
   large, √n(p̂_MLE − p_true) is normally distributed.
1. Derive the log likelihood
Recall: the density for observation #1 is:

    Pr(Y₁ = y₁) = p^y₁ (1−p)^(1−y₁)    (density)

so

    f(p; Y₁) = p^Y₁ (1−p)^(1−Y₁)    (likelihood)

The likelihood for Y₁,…,Yₙ is,

    f(p; Y₁,…,Yₙ) = f(p; Y₁)×…×f(p; Yₙ)

so the log likelihood is,

    L(p) = ln f(p; Y₁,…,Yₙ) = ln[f(p; Y₁)×…×f(p; Yₙ)]
         = Σᵢ₌₁ⁿ ln f(p; Yᵢ)

2. The MLE maximizes L(p), so p̂_MLE solves the
first-order condition

    ∂L(p̂_MLE)/∂p = Σᵢ₌₁ⁿ ∂ln f(p̂_MLE; Yᵢ)/∂p = 0
3. Use a Taylor series expansion around p_true to
approximate this as a linear function of p̂_MLE:

    0 = ∂L(p̂_MLE)/∂p ≅ ∂L(p_true)/∂p + [∂²L(p_true)/∂p²] × (p̂_MLE − p_true)
4. Solve this linear approximation for (p̂_MLE − p_true):

    ∂L(p_true)/∂p + [∂²L(p_true)/∂p²] × (p̂_MLE − p_true) ≅ 0

so

    [∂²L(p_true)/∂p²] × (p̂_MLE − p_true) ≅ −∂L(p_true)/∂p

or

    (p̂_MLE − p_true) ≅ −[∂²L(p_true)/∂p²]⁻¹ × ∂L(p_true)/∂p
5. Substitute things in and apply the LLN and CLT.

    L(p) = Σᵢ₌₁ⁿ ln f(p; Yᵢ)

    ∂L(p_true)/∂p = Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p

    ∂²L(p_true)/∂p² = Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²

so

    (p̂_MLE − p_true) ≅ −[∂²L(p_true)/∂p²]⁻¹ × ∂L(p_true)/∂p
      = −[Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²]⁻¹ × Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p
Multiply through by √n:

    √n(p̂_MLE − p_true) ≅ −[(1/n)Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²]⁻¹
                          × [(1/√n)Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p]

Because Yᵢ is i.i.d., the i-th terms in the summands are also
i.i.d. Thus, if these terms have enough (2) moments, then
under general conditions (not just the Bernoulli likelihood):

    (1/n)Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²  →p  a (a constant)    (WLLN)

    (1/√n)Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p  →d  N(0, σ²_{∂lnf/∂p})    (CLT) (Why?)

Putting this together,
    √n(p̂_MLE − p_true) ≅ −[(1/n)Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²]⁻¹
                          × [(1/√n)Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p]

where

    (1/n)Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²  →p  a (a constant)    (WLLN)

    (1/√n)Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p  →d  N(0, σ²_{∂lnf/∂p})    (CLT) (Why?)

so

    √n(p̂_MLE − p_true)  →d  N(0, σ²_{∂lnf/∂p}/a²)    (large-n normal)

Work out the details for the probit/no-X (Bernoulli) case:
Recall:

    f(p; Yᵢ) = p^Yᵢ (1−p)^(1−Yᵢ)

so

    ln f(p; Yᵢ) = Yᵢ ln p + (1−Yᵢ) ln(1−p)

and

    ∂ln f(p; Yᵢ)/∂p = Yᵢ/p − (1−Yᵢ)/(1−p) = (Yᵢ − p)/[p(1−p)]

and

    ∂²ln f(p; Yᵢ)/∂p² = −Yᵢ/p² − (1−Yᵢ)/(1−p)²
                      = −[Yᵢ/p² + (1−Yᵢ)/(1−p)²]
Denominator term first:

    ∂²ln f(p; Yᵢ)/∂p² = −[Yᵢ/p² + (1−Yᵢ)/(1−p)²]

so

    (1/n)Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p² = −(1/n)Σᵢ₌₁ⁿ [Yᵢ/p² + (1−Yᵢ)/(1−p)²]
      = −[Ȳ/p² + (1−Ȳ)/(1−p)²]
      →p −[p/p² + (1−p)/(1−p)²]    (LLN)
      = −[1/p + 1/(1−p)] = −1/[p(1−p)]
Next the numerator:

    ∂ln f(p; Yᵢ)/∂p = (Yᵢ − p)/[p(1−p)]

so

    (1/√n)Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p = (1/√n)Σᵢ₌₁ⁿ (Yᵢ − p)/[p(1−p)]
      = [1/(p(1−p))] × (1/√n)Σᵢ₌₁ⁿ (Yᵢ − p)
      →d  N(0, σ²_Y/[p(1−p)]²)
Put these pieces together:

    √n(p̂_MLE − p_true) ≅ −[(1/n)Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²]⁻¹
                          × [(1/√n)Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p]

where

    (1/n)Σᵢ₌₁ⁿ ∂²ln f(p_true; Yᵢ)/∂p²  →p  −1/[p(1−p)]

    (1/√n)Σᵢ₌₁ⁿ ∂ln f(p_true; Yᵢ)/∂p  →d  N(0, σ²_Y/[p(1−p)]²)

so the limiting variance is [p(1−p)]² × σ²_Y/[p(1−p)]² = σ²_Y. Thus

    √n(p̂_MLE − p_true)  →d  N(0, σ²_Y)
Summary: probit MLE, no-X case

The MLE: p̂_MLE = Ȳ

Working through the full MLE distribution theory gave:

    √n(p̂_MLE − p_true)  →d  N(0, σ²_Y)

But because p_true = Pr(Y = 1) = E(Y) = μ_Y, this is:

    √n(Ȳ − μ_Y)  →d  N(0, σ²_Y)

A familiar result from the first week of class!
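This "familiar result" is easy to verify by Monte Carlo. A sketch (the choices of p_true, n, and the replication count are arbitrary): for Bernoulli data, √n(Ȳ − p_true) should have mean near 0 and variance near σ²_Y = p(1−p).

```python
import numpy as np

rng = np.random.default_rng(5)
p_true, n, reps = 0.3, 400, 5000

# Each row is one sample of size n; compute sqrt(n)*(Ybar - p_true) per sample
Y = rng.uniform(size=(reps, n)) < p_true
stats = np.sqrt(n) * (Y.mean(axis=1) - p_true)

print(stats.mean())    # close to 0
print(stats.var())     # close to p(1-p) = 0.21
```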
The MLE derivation applies generally

    √n(p̂_MLE − p_true)  →d  N(0, σ²_{∂lnf/∂p}/a²)

 Standard errors are obtained from working out
  expressions for σ²_{∂lnf/∂p}/a²
 Extends to >1 parameter (β₀, β₁) via matrix calculus
 Because the distribution is normal for large n, inference
  is conducted as usual; for example, the 95% confidence
  interval is MLE ± 1.96SE.
 The expression above uses "robust" standard errors;
  further simplifications yield non-robust standard errors,
  which apply if ∂ln f(p; Yᵢ)/∂p is homoskedastic.
Summary: distribution of the MLE
(Why did I do this to you?)
 The MLE is normally distributed for large n
 We worked through this result in detail for the probit
  model with no X's (the Bernoulli distribution)
 For large n, confidence intervals and hypothesis testing
  proceed as usual
 If the model is correctly specified, the MLE is efficient,
  that is, it has a smaller large-n variance than all other
  estimators (we didn't show this).
 These methods extend to other models with discrete
  dependent variables, for example count data (# of
  crimes/day) – see SW App. 9.2.
Application to the Boston HMDA Data
(SW Section 9.4)
 Mortgages (home loans) are an essential part of
  buying a home.
 Is there differential access to home loans by race?
 If two otherwise identical individuals, one white and
  one black, applied for a home loan, is there a
  difference in the probability of denial?
The HMDA Data Set
 Data on individual characteristics, property
  characteristics, and loan denial/acceptance
 The mortgage application process circa 1990-1991:
  o Go to a bank or mortgage company
  o Fill out an application (personal + financial info)
  o Meet with the loan officer
 Then the loan officer decides – by law, in a race-blind
  way. Presumably, the bank wants to make profitable
  loans, and the loan officer doesn't want to originate
  defaults.
The loan officer's decision
 The loan officer uses key financial variables:
  o P/I ratio
  o housing expense-to-income ratio
  o loan-to-value ratio
  o personal credit history
 The decision rule is nonlinear:
  o loan-to-value ratio > 80%
  o loan-to-value ratio > 95% (what happens in default?)
  o credit score
Regression specifications

    Pr(deny = 1|black, other X's) = …

 linear probability model
 probit
Main problem with the regressions so far: potential
omitted variable bias. All of these variables (i) enter the
loan officer's decision function and (ii) are, or could be,
correlated with race:
 wealth, type of employment
 credit history
 family status
Variables in the HMDA data set…
Summary of Empirical Results
 Coefficients on the financial variables make sense.
 Black is statistically significant in all specifications.
 Race-financial variable interactions aren't significant.
 Including the covariates sharply reduces the effect of
  race on denial probability.
 LPM, probit, logit: similar estimates of the effect of race
  on the probability of denial.
 Estimated effects are large in a "real world" sense.
Remaining threats to internal, external validity
 Internal validity
  1. omitted variable bias
      what else is learned in the in-person interviews?
  2. functional form misspecification (no…)
  3. measurement error (originally, yes; now, no…)
  4. selection
      random sample of loan applications
      define population to be loan applicants
  5. simultaneous causality (no)
 External validity
  This is for Boston in 1990-91. What about today?
Summary
(SW Section 9.5)
 If Yᵢ is binary, then E(Y|X) = Pr(Y=1|X)
 Three models:
  o linear probability model (linear multiple regression)
  o probit (cumulative standard normal distribution)
  o logit (cumulative standard logistic distribution)
 LPM, probit, logit all produce predicted probabilities
 Effect of ΔX is a change in the conditional probability
  that Y = 1. For logit and probit, this depends on the
  initial X
 Probit and logit are estimated via maximum likelihood
  o Coefficients are normally distributed for large n
  o Large-n hypothesis testing, conf. intervals is as usual