This article was co-authored with Duncan Gilchrist. Sample code, along
with basic simulation results, is available on GitHub.
The first and most intuitive place we look is the raw correlation. Are
users engaging with X more likely to have outcome Y? Unfortunately,
raw correlations alone are rarely actionable. The complicating factor
here is a set of other features that might affect both X and Y. Economists
call these confounding variables.
Here's the good news: just because we can't always AB test a major
experience doesn't mean we have to fly blind when it matters most. A
range of econometric methods can illuminate the causal relationships
at play, providing actionable insights for the path forward.
Econometric Methods
1. Controlled Regression
2. Regression Discontinuity Design (RDD)
3. Difference-in-Differences (DD)
4. Instrumental Variables (IV)
This post will be applied and succinct. For each method, we'll open
with a high-level overview, run through one or two applications in tech,
and highlight major underlying assumptions or common pitfalls.
Some of these tools will work better in certain situations than others.
Our goal is to give you the baseline knowledge you need to identify
which method to use for the questions that matter to you, and to
implement it effectively.
Let's say we want to know the impact of some existing product feature,
e.g., live chat support, on an outcome, product sales. The "why we
care" is hopefully self-evident: if the impact of live chat is large enough
to cover the costs, we want to expand live chat support to boost profits;
if it's small, we're unlikely to expand the feature, and may even
deprecate it altogether to save costs.
Since youth is positively correlated with chat usage and sales, the raw
correlation between chat usage and sales would overstate the causal
relationship. But we can make progress by estimating a regression of
sales on chat usage controlling for age. In R:
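# A minimal sketch: 'sales', 'chat', and 'age' are illustrative column
# names in a hypothetical data frame df.
fit <- lm(sales ~ chat + age, data = df)
# The coefficient on chat estimates the effect of live chat usage on
# sales, holding age fixed.
summary(fit)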
But before moving on, let's briefly cover the concept of bad controls, or
why-we-shouldn't-just-throw-in-the-kitchen-sink-and-see-what-sticks
(statistically). Suppose we were concerned about general interest in the
product as a confounder: the more interested the user is, the more she
engages with our features, including live chat, and also the more she
buys from us. We might think that controlling for attributes like the
proportion of emails from us she opens could serve as a proxy for
interest. But insofar as the treatment (engaging in live chat) could itself
impact this feature (e.g., because she wants to see follow-up
responses from the agent), we would actually be inducing included
variable bias. The takeaway? Be wary of controlling for variables that
are not themselves fixed at the time the treatment was determined.
library(rdd)
# Estimate the jump in outcome Y at the cutpoint of the running
# variable D, using local regressions on either side of the cutoff.
RDestimate(Y ~ D, data = ..., subset = ..., cutpoint = ...)
Results are externally valid if they are unbiased for the full
population.
How can we validate the parallel trends assumption? There are a few
ways to make progress, both before and after rolling out the test.
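One simple pre-rollout check is to test for differential trends in the
pre-period. A minimal sketch in R, assuming a hypothetical data frame
df with illustrative columns Y, time, and treated, and a known
rollout_time:
pre <- subset(df, time < rollout_time)
# If the treated group's outcome was already trending differently
# before launch, the time:treated interaction will be significant and
# the parallel trends assumption is suspect.
summary(lm(Y ~ time * treated, data = pre))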
OK, so we've designed and rolled out a good experiment, but with
everyone moving fast, stuff inevitably happens. Common problems
with DD often come in two forms.
In R:
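# A minimal difference-in-differences sketch; 'treated' flags the
# treatment group, 'post' flags the post-rollout period, and all names
# are illustrative.
fit <- lm(Y ~ treated * post, data = df)
# The coefficient on treated:post is the DD estimate of the treatment
# effect.
summary(fit)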
The two assumptions required for internal validity in RDD apply here as
well. First, after conditioning on the fixed effects, users are as good as
randomly assigned to their X values, in this case their portfolio
returns. Second, there can be no confounding discontinuities, i.e.,
conditional on the fixed effects, users cannot otherwise be treated
differently based on their X.
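As a sketch of what conditioning on fixed effects can look like in
practice, here is one common approach using the lfe package (all
names are illustrative; base R's lm with factor() terms works too):
library(lfe)
# Regress the outcome on X (here, portfolio returns), absorbing user
# and week fixed effects.
fit <- felm(Y ~ X | user + week, data = df)
summary(fit)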
Instrumental variable (IV) methods are perhaps our favorite approach to
causal inference. Recall our earlier notation: we are trying to estimate
the causal effect of variable X on outcome Y, but cannot take the raw
correlation as causal because there exists some omitted variable(s), C.
An instrumental variable, or instrument for short, is a feature or set of
features, Z, such that both of the following are true:
1. Strong first stage: Z meaningfully impacts X.
2. Exclusion restriction: Z impacts Y only through its effect on X.
In R:
library(AER)
# Two-stage least squares: instrument the endogenous X with Z.
fit <- ivreg(Y ~ X | Z, data = df)
# Robust (sandwich) standard errors, plus diagnostics including the
# first-stage F-statistic.
summary(fit, vcov = sandwich, df = Inf, diagnostics = TRUE)
Looking at the correlation of churn with referrals will of course not give
us the causal effect. Users who refer their friends are de facto more
committed to our product.
But if our company has a strong referral program, it's likely been
running lots of AB tests pushing users to refer more: email tests, onsite
banner ad tests, incentives tests, you name it. The IV strategy is to focus
on a successful AB test, one that increased referrals, and use that
experiment's bucketing as an instrument for referring. (If IV sounds a
little like RDD, that's because it is! In fact, IV is sometimes referred to as
"Fuzzy RDD.")
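Concretely, the estimation might look like the following sketch, where
churned, referred, and bucket (the user's random assignment in the
chosen referral experiment) are illustrative names:
library(AER)
# Instrument observed referral behavior with the randomized
# experiment assignment.
fit <- ivreg(churned ~ referred | bucket, data = df)
summary(fit, vcov = sandwich, df = Inf, diagnostics = TRUE)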
IV results are internally valid provided the strong first stage and
exclusion restriction assumptions (above) are satisfied:
We'll likely have a strong first stage as long as the experiment we
chose was successful at driving referrals. (This is important
because if Z is not a strong predictor of X, the resulting second-
stage estimate will be biased.) The R code above reports the F-
statistic, so we can check the strength of our first stage directly, as
in the sketch below. A good rule of thumb is that the F-statistic
from the first stage should be at least 11.
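To inspect the first stage directly, we can also regress the endogenous
variable on the instrument; a sketch with the same illustrative names as
above:
# The first stage: how strongly does experiment assignment move
# referrals? Its F-statistic is the one the rule of thumb refers to.
summary(lm(referred ~ bucket, data = df))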
A quick note on statistical significance in IV. You may have noticed that
the R code to print the IV results isn't just a simple call to summary(fit).
That's because we have to be careful about how we compute standard
errors in IV models. In particular, standard errors have to be corrected
to account for the two-stage design, which generally makes them
slightly larger.
Want to estimate the causal effects of X on Y but don't have any good
historical AB tests on hand to leverage? AB tests can also be
implemented specifically to facilitate IV estimation! These AB tests even
come with a sexy name: Randomized Encouragement Trials.
. . .
But it's even more fun to improve that future. Where, when, how, and
with whom can we intervene for better outcomes? By shedding light on
the mechanisms driving the outcomes we care about, causal inference
gives us the insights to focus our efforts on investments that better
serve our users and our business.