
Event Count and Zero-Inflated Models

Erin Simpson & Michael Horowitz


29 April 2005
Overview
 Event count vs. zero-inflated models
 Varying levels of flexibility
 Diagnostics
What is Count Data?
 Non-negative integers
 Represent the number of occurrences
within a fixed period
 But can parameterize duration or “exposure”
 King (1989) notes that “one of the most
fundamental features of event count data is
that the variance of the count increases
with the expected number of events.”
 Presidential vetoes, US uses of force, war
casualties, number of coups, etc.
Increasing Model Flexibility
 Poisson → Negative Binomial → Generalized Negative Binomial → Generalized Event Count
 Hurdle → Zero-Inflated Poisson → Zero-Inflated Negative Binomial
Poisson
 Most basic count model
 Has several restrictive assumptions:
 Constant arrival rate
 All events are independent
 One implication of the model
specification is that the mean and
variance are equal ($\mu = V = \lambda_i$)
 $\Pr(Y_i = y) = e^{-\lambda_i} \lambda_i^{y} / y!$, where $\lambda_i = \exp(x_i\beta)$
Poisson—Math
 $y_i \sim \text{Poisson}(\lambda_i)$
 $\lambda_i = \exp(x_i\beta)$
 $\mu_i = V_i = \lambda_i$ (mean-variance equality)

 $f(y_i \mid x_i) = e^{-\lambda_i} \lambda_i^{y_i} / y_i!$
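The mean-variance equality can be checked numerically. The sketch below is illustrative only: the covariate values, coefficients, and function names are assumptions, not part of the original slides. It evaluates the Poisson density with a log link and confirms that the mean and variance both equal the rate.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability e^{-lam} * lam^y / y!, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def poisson_rate(x, beta):
    """Log-link rate lam_i = exp(x_i . beta) for one observation."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

# Hypothetical covariates (constant plus one regressor) and coefficients.
x_i = [1.0, 0.5]
beta = [0.2, 0.8]
lam_i = poisson_rate(x_i, beta)

# Truncate the infinite support where the remaining mass is negligible.
support = range(200)
mean = sum(y * poisson_pmf(y, lam_i) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam_i) for y in support)
print(lam_i, mean, var)  # all three values should agree
```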
Negative Binomial
 Allows for correction of overdispersion
 Result of contagion (non-independent
observations)
 Random variation over time
(heterogeneity)
 Loosens Poisson restrictions by
allowing arrival rate ($\lambda$) to vary
systematically
Negative Binomial--Math
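 $y_i \sim \text{Poisson}(\lambda_i)$
 $\lambda_i = \exp(x_i\beta + u_i)$
 $\exp(u_i) \sim \text{Gamma}(1/\alpha, \alpha)$
 where $\alpha$ is the overdispersion parameter (shape $1/\alpha$ and scale $\alpha$ give mean 1 and variance $\alpha$)
 $\mu_i = E[y_i \mid x_i] = \exp(x_i\beta)$ (as in Poisson, because $E[\exp(u_i)] = 1$)
 $\omega_i = \mu_i + \alpha\mu_i^{p}$ (variance function)
 $\alpha$ is a constant (Poisson: $\alpha = 0$)
 $p$ is a specified constant (usually 1 or 2)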
Negative Binomial—Math (2)
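 $f(y \mid \mu, \alpha) = \dfrac{\Gamma(y + \alpha^{-1})}{\Gamma(y + 1)\,\Gamma(\alpha^{-1})} \left[\dfrac{\alpha^{-1}}{\alpha^{-1} + \mu}\right]^{\alpha^{-1}} \left[\dfrac{\mu}{\alpha^{-1} + \mu}\right]^{y}$

Reduces to Poisson in the limit as $\alpha \to 0$.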
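As a rough numerical check of the density above, the sketch below evaluates it on the log scale, confirms that the variance equals μ + αμ² (the p = 2 case), and shows that it approaches the Poisson density as α shrinks. The parameter values and function names are illustrative assumptions.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and overdispersion alpha > 0."""
    r = 1.0 / alpha  # the alpha^{-1} term in the density
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu, alpha = 2.0, 0.5  # illustrative values
support = range(400)  # truncation point; the tail mass beyond it is negligible
total = sum(nb_pmf(y, mu, alpha) for y in support)
mean = sum(y * nb_pmf(y, mu, alpha) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, alpha) for y in support)
print(total, mean, var, mu + alpha * mu ** 2)

# Largest pointwise gap to the Poisson density when alpha is close to zero.
gap = max(abs(nb_pmf(y, mu, 1e-6) - poisson_pmf(y, mu)) for y in range(30))
print(gap)
```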
Generalized Negative Binomial
 Can further specify the negative binomial
by allowing the alpha term to vary
systematically
 $\alpha_i = \exp(z_i\gamma)$
 Essentially modeling the variance function
 This allows the dispersion to vary
observation by observation
 Similar to parameterization for truncated
NB models in King (1989: 218-222)
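A minimal sketch of what observation-specific dispersion means in practice, assuming the p = 2 variance function and made-up covariates and coefficients (none of these values come from the slides): two observations share the same mean but differ in their dispersion covariates, so their variances differ.

```python
import math

def linear_index(row, coefs):
    """Inner product of one covariate row with a coefficient vector."""
    return sum(v * c for v, c in zip(row, coefs))

def gen_nb_moments(x, z, beta, gamma):
    """Return (mu_i, alpha_i, variance_i) for one observation."""
    mu = math.exp(linear_index(x, beta))      # mean equation
    alpha = math.exp(linear_index(z, gamma))  # dispersion equation, always > 0
    return mu, alpha, mu + alpha * mu ** 2    # variance function with p = 2

# Hypothetical coefficients and two observations as (x_i, z_i) pairs.
beta = [0.5, 0.3]
gamma = [-1.0, 0.7]
data = [([1.0, 0.0], [1.0, 0.0]),
        ([1.0, 0.0], [1.0, 2.0])]

moments = [gen_nb_moments(x, z, beta, gamma) for x, z in data]
for mu_i, alpha_i, var_i in moments:
    print(mu_i, alpha_i, var_i)
```

Exponentiating the index keeps every α_i positive without constraining γ.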
Generalized Event Count
 See King and Signorino (1996)
 Allows for modeling of over- and
under-dispersion, by modeling $\sigma^2$ and
$\beta$ simultaneously
 “$\sigma^2$ is not a variance term…but rather
a positive factor that indicates how
much the variance increases with the
mean.”
 See paper for parameterization
Zero-Inflated Models
 Alternate response to modeling
overdispersion
 Believe that the excessive number of zeros may
be the result of different DGPs.
 Classic example: number of fish caught in
a given park
 Some zeros result from fishing and not catching
any fish
 Some zeros result from not fishing at all
 Zero-inflated models allow one to model
each process separately
 Usually pairs a logit model for the zero state with a count model
Zero-inflated Poisson (2)
 $y_i = 0$ with probability $q_i$
 $y_i \sim \text{Poisson}(\lambda_i)$ with probability $1 - q_i$
 $\lambda_i = \exp(x_i\beta)$
 $q_i = \exp(z_i\gamma) / (1 + \exp(z_i\gamma))$
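The two-part structure can be sketched directly: a logit equation gives the probability of the always-zero state, and a Poisson with a log-link rate (assumed here, as on the Poisson slides) governs the counts otherwise. The covariates, coefficients, and function names below are hypothetical.

```python
import math

def dot(row, coefs):
    return sum(v * c for v, c in zip(row, coefs))

def logistic(t):
    """exp(t) / (1 + exp(t)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-t))

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zip_pmf(y, lam, q):
    """Zero-inflated Poisson: point mass q at zero plus (1 - q) * Poisson(lam)."""
    base = (1.0 - q) * poisson_pmf(y, lam)
    return q + base if y == 0 else base

# Hypothetical covariates and coefficients for the two equations.
z_i, gamma = [1.0, -0.5], [0.4, 1.2]   # zero-state (logit) equation
x_i, beta = [1.0, 0.5], [0.2, 0.8]     # count (Poisson) equation

q_i = logistic(dot(z_i, gamma))
lam_i = math.exp(dot(x_i, beta))
probs = [zip_pmf(y, lam_i, q_i) for y in range(100)]
print(q_i, lam_i, probs[0], poisson_pmf(0, lam_i))
```

The probability of a zero under the zero-inflated model exceeds the plain Poisson probability of a zero whenever q_i > 0.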
Zero-inflated Models (3)
 Models of partial observability: only the
product of two latent variables is observed.
 $y_i = z_i y_i^*$
 where $z_i$ is a binary (0/1) variable and $y_i^*$ is
distributed as Poisson($\lambda_i$) or negative binomial
($\lambda_i$, $\theta$).
 $\text{Prob}[y_i = 0] = \text{Prob}[z_i = 0] + \text{Prob}[z_i = 1, y_i^* = 0]$
 $= q_i + (1 - q_i) f(0)$
 $\text{Prob}[y_i = k]$
 $= (1 - q_i) f(k)$, $k = 1, 2, \ldots$
 where $f(\cdot)$ is either Poisson or NB
Zero-inflated Models (3)
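 Generates the following PDF
 $p(y_i) = p_i = (1 - q_i) f(y_i) + 1(y_i = 0)\, q_i$
 For zero-inflated Poisson:
 $E[y_i] = E_{z_i}[E[y_i \mid z_i]]$
 $= q_i \cdot 0 + (1 - q_i)\lambda_i$
 $= (1 - q_i)\lambda_i$
 $\text{Var}[y_i] = E_{z_i}[\text{Var}[y_i \mid z_i]] + \text{Var}_{z_i}[E[y_i \mid z_i]]$
 $= \lambda_i(1 - q_i)[1 + \lambda_i q_i]$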
Zero-inflated Models (4)
 For ZIP (cont)
 $\text{Var}[y_i] / E[y_i]$
 $= 1 + \lambda_i q_i$
 $= 1 + [q_i/(1 - q_i)]\,E[y_i]$
 This is the counterpart to α in the NB model
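The ZIP moment formulas can be verified by summing over the density. This is a sketch with arbitrary illustrative values (rate 3, zero-state probability 0.25) and hypothetical function names; it compares the numerical mean, variance, and variance-to-mean ratio against the closed forms.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zip_pmf(y, lam, q):
    """(1 - q) * f(y) + 1(y = 0) * q, with f the Poisson density."""
    base = (1.0 - q) * poisson_pmf(y, lam)
    return q + base if y == 0 else base

lam, q = 3.0, 0.25    # illustrative values
support = range(200)  # truncation point; the tail mass beyond it is negligible

mean = sum(y * zip_pmf(y, lam, q) for y in support)
var = sum((y - mean) ** 2 * zip_pmf(y, lam, q) for y in support)

mean_formula = (1 - q) * lam
var_formula = lam * (1 - q) * (1 + lam * q)
ratio_formula = 1 + (q / (1 - q)) * mean_formula

print(mean, mean_formula)
print(var, var_formula)
print(var / mean, ratio_formula)
```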

 For ZINB
 Conditional means are same as NB
Zero-inflated Models (5)
 For ZINB (cont)
 (un)conditional means are same as NB
 $\text{Var}[y_i] = (1 - q_i)\lambda_i[1 + (q_i + \alpha)\lambda_i]$
 $\text{Var}[y_i] / E[y_i] = 1 + [(q_i + \alpha)/(1 - q_i)]\,E[y_i]$
 The $\text{Var}[y_i] / E[y_i]$ terms for both models
reveal the degree of overdispersion
 Overdispersion increases in $q_i$: the more likely the zero state, the
greater the overdispersion
 In ZINB it is the result of both heterogeneity ($\alpha$) and splitting ($q_i$)
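A numerical check of the ZINB variance, again as a sketch with arbitrary parameter values and hypothetical function names: the variance computed from the density matches the closed form, and the variance-to-mean ratio rises as the zero-state probability rises.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and overdispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, lam, alpha, q):
    """Point mass q at zero plus (1 - q) times the NB density."""
    base = (1.0 - q) * nb_pmf(y, lam, alpha)
    return q + base if y == 0 else base

def dispersion_ratio(lam, alpha, q):
    """Var[y] / E[y] = 1 + (q + alpha) * lam; alpha = 0 gives the ZIP ratio."""
    return 1 + (q + alpha) * lam

lam, alpha, q = 2.0, 0.5, 0.3  # illustrative values
support = range(600)           # truncation point; the tail mass beyond it is negligible

mean = sum(y * zinb_pmf(y, lam, alpha, q) for y in support)
var = sum((y - mean) ** 2 * zinb_pmf(y, lam, alpha, q) for y in support)
var_formula = (1 - q) * lam * (1 + (q + alpha) * lam)
print(mean, var, var_formula)

# The ratio grows with q for fixed lam and alpha.
ratios = [dispersion_ratio(lam, alpha, qq) for qq in (0.0, 0.2, 0.4)]
print(ratios)
```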
Vuong Test
 May want to compare Poisson with ZIP or NB with
ZINB (comparing ZIP with ZINB is
straightforward, as the models are nested)
 Greene suggests using the Vuong Test.
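 $V = \sqrt{N}\,\bar{m} / s_m$
 where $m_i = \log[f_1(y_i)/f_2(y_i)]$, and $\bar{m}$ and $s_m$ are the mean and standard deviation of the $m_i$
 and $f_1$ and $f_2$ are the competing models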
 Large positive values favor model 1, large
negative values favor model 2
 Useful for evaluating the effects of heterogeneity
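The statistic itself is simple to compute once each model supplies a log-density for every observation. The sketch below uses a small made-up sample and fixed, hand-picked parameters for a ZIP (model 1) and a Poisson (model 2). In an actual application both sets of parameters would be maximum likelihood estimates, so the numbers here are purely illustrative.

```python
import math

def poisson_logpmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

def zip_logpmf(y, lam, q):
    p = (1.0 - q) * math.exp(poisson_logpmf(y, lam))
    return math.log(q + p if y == 0 else p)

def vuong_statistic(loglik1, loglik2):
    """V = sqrt(N) * mean(m) / sd(m), with m_i = log f1(y_i) - log f2(y_i)."""
    m = [a - b for a, b in zip(loglik1, loglik2)]
    n = len(m)
    m_bar = sum(m) / n
    s_m = math.sqrt(sum((v - m_bar) ** 2 for v in m) / n)  # divisor N, not N - 1
    return math.sqrt(n) * m_bar / s_m

# Hypothetical counts with many zeros; parameters are assumed, not estimated.
y = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 5, 2, 0, 3]
ll_zip = [zip_logpmf(v, 2.5, 0.5) for v in y]    # model 1
ll_pois = [poisson_logpmf(v, 1.25) for v in y]   # model 2

v_stat = vuong_statistic(ll_zip, ll_pois)
print(v_stat)  # positive values favor model 1 (here the ZIP)
```

Swapping the two arguments flips the sign of the statistic, which matches the reading of large negative values as favoring model 2.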
References—Statistics
 Greene, William. 1994. “Accounting for Excess Zeros
and Sample Selection in Poisson and Negative
Binomial Regression Models.” Stern School of
Business Working Paper, EC-94-10.
 King, Gary. 1998. Unifying Political Methodology: The
Likelihood Theory of Statistical Inference. Ann Arbor:
University of Michigan Press.
 Lambert, Diane. 1992. “Zero-inflated Poisson
Regression, With an Application to Defects in
Manufacturing.” Technometrics 34: 1-14.
 Vuong, Quang H. 1989. “Likelihood Ratio Tests for
Model Selection and Non-Nested Hypotheses.”
Econometrica 57: 307-333.
References—Political Science
 Clarke, Kevin A. 2003. “Nonparametric Model
Discrimination in International Relations.” Journal of
Conflict Resolution 47 (1): 72-93.
 Pevehouse, Jon C. “Interdependence Theory and the
Measurement of International Conflict.” The Journal of
Politics 66 (1): 247-266.
 Zorn, J. Christopher. 1996. “Evaluating Zero-Inflated
and Hurdle Poisson Specifications.” Midwest Political
Science Association. April 18-20.
http://web.polmeth.ufl.edu/papers/96/zorn96.pdf
