
Causal Inference

Marginal Structural Models

Rhoderick Machekano

Center for Health Care Research and Policy


Department of Medicine
Case Western Reserve University
Cleveland, OH 44109
rhoderick.machekano@case.edu

March, 2010
Association

Interest in understanding why values of an outcome variable Y vary over the units in a population
Y is the response variable - the variable to be explained
In associational inference, we are satisfied with discovering how the values of Y are associated with the values of the variables defined on the units (attributes A) in our population
The conditional distribution characterizes how the values of Y change as A varies, e.g. E(Y | A = a) = β0 + β1 a
The parameter β1 is the associational parameter
Associational inference consists of estimating and testing the parameters β using observed data on Y and A
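
As a minimal sketch of associational inference, the R snippet below simulates data and estimates β0 and β1 by least squares. The sample size, the coefficient values (2 and 3), and the variable names are illustrative assumptions, not taken from the slides.

```r
# Illustrative simulation: every number here is an assumption for the sketch.
set.seed(1)
n <- 500
A <- rnorm(n)                  # attribute defined on each unit
Y <- 2 + 3 * A + rnorm(n)      # outcome with E(Y | A = a) = 2 + 3a
fit <- lm(Y ~ A)               # associational model E(Y | A = a) = b0 + b1 * a
beta <- coef(fit)              # beta[["A"]] estimates the associational parameter
print(round(beta, 2))
```

The slope describes how Y varies with A across units; nothing in this fit says what would happen to a unit's Y if its A were changed.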

R. Machekano (Center for Health Care Research and Policy, Department of Medicine, Case Western Reserve University, Cleveland) | Causal Inference | March 2010
Causal Inference

Causal inference compares the outcomes of different treatments as if they were applied to the same units
Rubin's causal model posits the existence of potential outcomes for each unit
Causal inference (CI) is a prediction problem - what would have happened under different treatment options?

The Potential Outcomes Framework

Describes the responses that would have been observed had an individual/unit been subjected to each possible treatment
Causal effect with a binary treatment: y1i − y0i
It is critical that the unit be potentially exposable to any of the exposures
Example: the method of instruction a student receives can be a cause of the student's performance on a test, but the student's race or gender cannot be a cause
We observe only the one potential outcome corresponding to the treatment received - the other potential outcome(s) are missing: the Fundamental Problem of Causal Inference

Fundamental Problem of Causal Inference

Hypothetical complete data

Unit   Pretreatment   Treatment   Potential     Treatment
i      inputs         indicator   outcomes      effect
       Xi             Ti          y0i    y1i    y1i − y0i
1      1    50        0           69     75     6
2      1    98        0           111    108    -3
3      2    80        1           92     102    10
4      1    98        1           112    111    -1
...
100    1    104       1           111    114    3

Observed data

Unit   Pretreatment   Treatment   Potential     Treatment
i      inputs         indicator   outcomes      effect
       Xi             Ti          y0i    y1i    y1i − y0i
1      1    50        0           69     ?      ?
2      1    98        0           111    ?      ?
3      2    80        1           ?      102    ?
4      1    98        1           ?      111    ?
...
100    1    104       1           ?      114    ?

Effect of mathematics program of study on fourth graders

We wish to compare a novel study program to a standard program of study for fourth graders. The outcome is a score on a test at the end of the year.

Source: Gelman & Hill
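
The contrast between the two tables can be reproduced in a few lines of R. This is only a sketch using the five rows printed on the slide; the data frame and column names are hypothetical.

```r
# Rows shown on the slide (units 1-4 and 100); values copied from the table.
complete <- data.frame(
  unit  = c(1, 2, 3, 4, 100),
  treat = c(0, 0, 1, 1, 1),
  y0    = c(69, 111, 92, 112, 111),
  y1    = c(75, 108, 102, 111, 114)
)
complete$effect <- complete$y1 - complete$y0   # computable only with complete data

# Observed data: the potential outcome for the treatment not received is missing.
observed <- complete
observed$y0[observed$treat == 1] <- NA
observed$y1[observed$treat == 0] <- NA
observed$effect <- observed$y1 - observed$y0   # NA for every unit
print(complete$effect)
print(observed)
```

Every unit-level effect in `observed` is `NA`, which is the fundamental problem stated in code.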
Approaches to the fundamental problem of causal inference

Is causal inference impossible?


Two solutions to the fundamental problem:
1 Scientific solution
2 Statistical solution

Scientific solution

Exploits homogeneity and invariance assumptions to find close substitutes
Assuming all else remains the same, a measurement before treatment can be substituted for the other potential outcome
Examples include studies in animals (rats) and in the physical sciences

Statistical Solution

The average causal effect τ is the expected value of the difference Y1i − Y0i over the units in the population
τ = E(Y1) − E(Y0), so information on different units that can be observed can be used to gain knowledge about τ
The observed data give us only E(Y | T = 1) = E(Y1 | T = 1) and E(Y | T = 0) = E(Y0 | T = 0)
The way the units are assigned to the different exposures is very important
We want to compare outcomes from similar units - randomization
The goal is to have treatment assignment T independent of the potential outcomes and all other variables
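
A small simulation can illustrate why randomization helps. In the sketch below, the distributions, the effect size of 5, and all variable names are assumptions chosen for illustration. Because treatment is assigned independently of the potential outcomes, the difference in observed means tracks the average causal effect E(Y1) − E(Y0).

```r
set.seed(2010)
n  <- 10000
y0 <- rnorm(n, mean = 100, sd = 10)        # potential outcome under control
y1 <- y0 + 5 + rnorm(n, sd = 2)            # potential outcome under treatment
true_ace <- mean(y1) - mean(y0)            # needs both outcomes; never available in practice

treat <- rbinom(n, size = 1, prob = 0.5)   # randomization: independent of (y0, y1)
y_obs <- ifelse(treat == 1, y1, y0)        # only one potential outcome is observed
est_ace <- mean(y_obs[treat == 1]) - mean(y_obs[treat == 0])
print(c(true = true_ace, estimate = est_ace))
```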

Observational Studies

Non-randomized T
Not all studies can be randomized
Units often end up treated or not based on characteristics that are predictive of the outcome - systematic differences
Solution is statistical adjustments

[Figure: diagram with nodes T, Y, and X]

Causal Inference in Observational Studies

Assume conditional independence: conditional on the confounding covariates, the distribution of treatments across units is random with respect to the potential outcomes
The distribution of the potential outcomes (y0, y1) is the same across treatment levels, conditioning on the confounding covariates X, i.e. y0, y1 ⊥ T | X - ignorability
If the choice of treatment is based on other covariates that are predictive of the outcome but are not measured, then we have a non-ignorable treatment mechanism
With ignorability satisfied, we can use regression modeling adjusting for confounders
Imbalance and lack of overlap create problems
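
The step from ignorability to regression adjustment can be written out explicitly. The following is a sketch of the standard argument, assuming ignorability, overlap (0 < pr(T = 1 | X) < 1), and that the observed outcome equals the potential outcome for the treatment received; it is not taken from the slides.

```latex
\begin{aligned}
E(Y_1) &= E_X\!\left[ E(Y_1 \mid X) \right] \\
       &= E_X\!\left[ E(Y_1 \mid T = 1, X) \right]
          && \text{(ignorability: } Y_1 \perp T \mid X\text{)} \\
       &= E_X\!\left[ E(Y \mid T = 1, X) \right]
          && \text{(observed outcome is } Y_1 \text{ when } T = 1\text{)}
\end{aligned}
```

The same argument gives E(Y0) = E_X[E(Y | T = 0, X)], so the average causal effect is the average over X of the difference between the two conditional means, which a correctly specified regression of Y on T and X estimates.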

Propensity scores as a solution to observational studies

Propensity scores as a tool to achieve balance and overlap through matching
Inverse of propensity score as weight for a data point

Marginal Structural Models for point treatment studies

With a binary treatment (exposed vs. unexposed), the causal effect is given by E(Y1) − E(Y0)
When treatment assignment is associated with prognostic factors X, subjects who are exposed are a selective subgroup, and the sample average of their outcomes may systematically over- or underestimate the population mean counterfactual
The selection bias can be corrected, when there are no unmeasured confounders, by weighting each unit's data by the inverse of the propensity score πti = pr(Ti = t | Xi)

\[
E(Y_t) = \frac{\sum_{i=1}^{n} I(T_i = t)\, y_i / \pi_{ti}}{\sum_{i=1}^{n} I(T_i = t) / \pi_{ti}}
\]

We can model the counterfactual outcome: E(Yt) = β0 + β1 t
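
The reason inverse weighting recovers the counterfactual mean can be sketched in one line of algebra. The derivation below is a standard argument, not one shown on the slides; it writes π_t(X) for pr(T = t | X) and assumes ignorability, π_t(X) > 0, and that Y = Y_t whenever T = t.

```latex
\begin{aligned}
E\!\left[ \frac{I(T = t)\, Y}{\pi_t(X)} \right]
  &= E\!\left[ \frac{I(T = t)\, Y_t}{\pi_t(X)} \right] \\
  &= E_X\!\left[ \frac{E\{ I(T = t) \mid X \}\; E(Y_t \mid X)}{\pi_t(X)} \right]
     && \text{(ignorability)} \\
  &= E_X\!\left[ E(Y_t \mid X) \right] = E(Y_t)
\end{aligned}
```

Taking Y ≡ 1 in the same identity shows that the weighted count of units with T = t has expectation 1 per unit, which is why dividing by the sum of the weights gives a ratio estimator of the same quantity.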

Inverse Probability Weighting

1 Calculate weights wi = 1 / pr(Ti = t | Xi)
2 Perform a weighted regression of the outcome Y on exposure variable T
3 Why does this work?
  Creates a pseudo-population consisting of wi copies of unit i
  In the pseudo-population, treatment T is unconfounded
  The distribution of the counterfactuals in the pseudo-population is the same as in the study population
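
The two steps can be sketched in R on simulated data. Everything in this example is an assumption for illustration: a single binary confounder `x`, known treatment probabilities of 0.8 and 0.2, and a true causal effect of 2.

```r
set.seed(42)
n <- 5000
x <- rbinom(n, 1, 0.5)                       # binary confounder (assumed)
p <- ifelse(x == 1, 0.8, 0.2)                # known pr(T = 1 | X)
treat <- rbinom(n, 1, p)
y <- 1 + 2 * treat + 3 * x + rnorm(n)        # true causal effect of treat is 2

# Step 1: weight = 1 / probability of the treatment actually received
w <- ifelse(treat == 1, 1 / p, 1 / (1 - p))

# Step 2: weighted regression of the outcome on the exposure
naive <- coef(lm(y ~ treat))[["treat"]]
ipw   <- coef(lm(y ~ treat, weights = w))[["treat"]]

# The weighted-regression slope equals the difference of weighted means
wm <- function(t) weighted.mean(y[treat == t], w[treat == t])
print(c(naive = naive, ipw = ipw, weighted_means = wm(1) - wm(0)))
```

In this setup the unweighted slope is confounded by `x`, while the weighted slope should land near the simulated effect of 2.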

Toy Example of Effect of Weighting

Xi       A     A     A     B     B     B     C     C     C
Y1i      1     1     1     2     2     2     3     3     3
Ti       1     0     0     1     1     1     1     1     0
πi       1/3   1/3   1/3   1     1     1     2/3   2/3   2/3
Ti/πi    3     0     0     1     1     1     1.5   1.5   0

True exposure counterfactual mean E(Y1) = 2
Mean among exposed E(Y | T = 1) = 13/6
IPW mean = (3x1 + 0x1 + 0x1 + 1x2 + 1x2 + 1x2 + 1.5x3 + 1.5x3 + 0x3) / 9 = 2
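
The arithmetic in the toy example can be checked directly in R. The sketch below takes the propensity score to be the proportion treated within each level of X, which reproduces the values in the table; the variable names are hypothetical.

```r
x  <- rep(c("A", "B", "C"), each = 3)
y1 <- rep(c(1, 2, 3), each = 3)
tr <- c(1, 0, 0, 1, 1, 1, 1, 1, 0)

# Propensity score: proportion treated within each level of X
ps <- ave(tr, x)                         # 1/3 for A, 1 for B, 2/3 for C
w  <- tr / ps                            # 3 0 0 1 1 1 1.5 1.5 0

true_mean    <- mean(y1)                 # mean of Y1 over all nine units
exposed_mean <- mean(y1[tr == 1])        # 13/6, biased upward
ipw_mean     <- sum(w * y1) / sum(w)     # 18 / 9
print(c(true = true_mean, exposed = exposed_mean, ipw = ipw_mean))
```

The single treated unit in group A stands in for all three group A units, which is what pulls the weighted mean back down to 2.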

Example: Complete Data

> head(simdat)
       covar       y1        y0 treat  outcome
1 0.08317643 1.650394 0.8060006     1 1.650394
2 0.82471339 3.546720 2.5126394     0 2.512639
3 0.98284952 4.650695 2.2191206     1 4.650695
4 0.91146710 3.433277 2.1412135     0 2.141214
5 0.81274144 3.584094 2.4440184     0 2.444018
6 0.71152091 4.019010 2.6391156     1 4.019010
> tail(simdat)
        covar       y1       y0 treat  outcome
45 0.16681624 2.558554 1.155048     1 2.558554
46 0.30581451 2.131925 1.416699     1 2.131925
47 0.49911482 2.943167 2.886796     1 2.943167
48 0.24429569 2.524773 1.514149     1 2.524773
49 0.08419395 1.004389 1.284473     1 1.004389
50 0.65493092 3.387608 2.819581     0 2.819581

Treatment Mechanism

\[
\mathrm{pr}(T = 1 \mid w) = \frac{1}{1 + \exp\{-(1 - 2w)\}}
\]
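
A treatment indicator can be drawn from this mechanism as sketched below. The slides do not show how the covariate was generated, so drawing `w` uniformly on (0, 1) is an assumption; `pa` and `A` follow the names used in the estimation code, and n = 50 matches the reported sample size.

```r
set.seed(3)
n  <- 50
w  <- runif(n)                            # assumed: covariate uniform on (0, 1)
pa <- 1 / (1 + exp(-(1 - 2 * w)))         # pr(T = 1 | w) from the treatment mechanism
A  <- rbinom(n, size = 1, prob = pa)      # treatment drawn from that mechanism

# plogis() is the logistic function, so it gives the same probabilities
same <- isTRUE(all.equal(pa, plogis(1 - 2 * w)))
print(same)
print(range(pa))
```

With `w` between 0 and 1, the treatment probability stays roughly between 0.27 and 0.73, and units with larger `w` are less likely to be treated.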

Estimation

Causal effects from the potential outcomes
> mean(y1)-mean(y0)
[1] 1.103484
Unadjusted estimate
> mean(Y[A==1])-mean(Y[A==0])
[1] 0.6900297
Adjusted regression
> display(lm(Y~A+w))
lm(formula = Y ~ A + w)
coef.est coef.se
(Intercept) 0.81 0.25
A 1.24 0.19
w 2.11 0.32
---
n = 50, k = 3
residual sd = 0.59, R-Squared = 0.56

Inverse Probability of Treatment Weighted Estimator

We know the true propensity of treatment, i.e. the probability of receiving treatment or control given the covariates
Calculate the weights
> wt=ifelse(A==1,1/pa,1/(1-pa))

Perform a weighted regression of the outcome on exposure

> display(lm(Y~A, weights=wt))
lm(formula = Y ~ A, weights = wt)
coef.est coef.se
(Intercept) 2.11 0.19
A 1.01 0.25
---
n = 50, k = 2
residual sd = 1.18, R-Squared = 0.26

Application: Effect of Job Training Program on earnings

> head(nws)
age educ black married nodegree re75 re78 hisp treat educ_cat4
1 42 16 0 1 0 0.000 100.4854 0 0 4
2 20 13 0 0 0 3317.468 4793.7451 0 0 3
3 37 12 0 1 0 22781.855 25564.6699 0 0 2
4 48 12 0 1 0 20839.355 20550.7441 0 0 2
5 51 12 0 1 0 21575.178 22783.5879 0 0 2
6 18 11 0 0 1 1455.532 2157.4807 0 0 1

Examine Exposure-Covariate Association

Covariate      Exposed   Unexposed   p-value
Age            25.8      33.4        < 0.001
Black          84%       9.7%        < 0.001
Hispanic       5.6%      6.7%        0.694
Married        19%       73%         < 0.001
Non-degreed    71%       30%         < 0.001
Education      10        12          < 0.001
1975 Salary    1532      14380       < 0.001

There are systematic differences between subjects exposed to the job training program and those not exposed

Naive Estimates - using regression methods

> nws.unadjusted=glm(re78~treat)
> display(nws.unadjusted)
glm(formula = re78 ~ treat)
coef.est coef.se
(Intercept) 15750.30 79.84
treat -9401.16 801.98
---
n = 18667, k = 2
residual deviance = 2.198893e+12, null deviance = 2.215081e+12 (differen
overdispersion parameter = 117808349.3
residual sd is sqrt(overdispersion) = 10853.96

> mean(re78[treat==1])-mean(re78[treat==0])
[1] -9401.156

Adjusting for pre-treatment covariates

> nws.adjusted=glm(re78~treat+age+educ+married+black+hisp+nodegree+re75+re
> display(nws.adjusted)
glm(formula = re78 ~ treat + age + educ + married + black + hisp +
nodegree + re75 + re74)
coef.est coef.se
(Intercept) 4427.01 449.18
treat 640.50 583.24
age -110.15 5.88
educ 247.02 28.80
married 180.24 144.33
black -387.15 190.58
hisp -50.18 228.22
nodegree 645.35 179.45
re75 0.51 0.01
re74 0.30 0.01
---
n = 18667, k = 10
residual deviance = 1.074142e+12, null deviance = 2.215081e+12 (differen
overdispersion parameter = 57573147.6
residual sd is sqrt(overdispersion) = 7587.70

Inverse Weighted Estimator

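# Propensity model: probability of treatment given pre-treatment covariates
propensity.model = glm(treat~age+educ+married+black+hisp+nodegree+re75+re74, family="binomial")

# Fitted propensity score for each subject
phat = predict(propensity.model, type="response")

# Inverse probability of treatment weights
wt = ifelse(treat==1,1/phat,1/(1-phat))

# Stabilized weights: numerator is the marginal probability of the treatment received
st.wt = ifelse(treat==1,mean(treat)/phat,mean(1-treat)/(1-phat))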
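
The relationship between `wt` and `st.wt` can be explored on simulated data, since the job training data set is not reproduced here. In the sketch below the data-generating model, the single covariate `x`, and the true effect of 2 are assumptions for illustration. Stabilization rescales the weights by a constant within each treatment arm, so a regression of the outcome on treatment alone gives the same point estimate with either set of weights, while the stabilized weights average about 1 instead of about 2.

```r
set.seed(7)
n <- 2000
x <- rnorm(n)                                   # single confounder (assumed)
treat <- rbinom(n, 1, plogis(0.5 * x))
y <- 1 + 2 * treat + x + rnorm(n)               # true causal effect of treat is 2

phat  <- predict(glm(treat ~ x, family = "binomial"), type = "response")
wt    <- ifelse(treat == 1, 1 / phat, 1 / (1 - phat))
st.wt <- ifelse(treat == 1, mean(treat) / phat, mean(1 - treat) / (1 - phat))

b_wt <- coef(lm(y ~ treat, weights = wt))[["treat"]]
b_st <- coef(lm(y ~ treat, weights = st.wt))[["treat"]]
print(c(unstabilized = b_wt, stabilized = b_st))
print(c(mean_wt = mean(wt), mean_st.wt = mean(st.wt)))
```

The two sets of weights differ in scale and in the standard errors a naive weighted fit reports, not in the point estimate from this two-parameter model.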

Causal Effect estimation

> nws.weighted=glm(re78~treat,weights=wt)
> display(nws.weighted)
glm(formula = re78 ~ treat, weights = wt)
coef.est coef.se
(Intercept) 15647.00 91.82
treat -7116.53 151.84
---
n = 18667, k = 2
residual deviance = 2.937325e+12, null deviance = 3.282995e+12 (differen
overdispersion parameter = 157370755.5
residual sd is sqrt(overdispersion) = 12544.75

Distribution of propensity scores

Improved causal estimate

> nws.weighted=glm(re78~treat,weights=wt2)
> display(nws.weighted)
glm(formula = re78 ~ treat, weights = wt2)
coef.est coef.se
(Intercept) 8037.06 159.77
treat 493.40 204.14
---
n = 6809, k = 2
residual deviance = 1.182667e+12, null deviance = 1.183682e+12 (differen
overdispersion parameter = 173742743.2
residual sd is sqrt(overdispersion) = 13181.15

Pros and Cons of IPW estimators

Accounts for selection bias (and more) in observational studies
Is more efficient than propensity matching - uses all available data
Can be extended to time-varying confounders and exposures
Behaves badly when a few individuals have very large weights, which increases variance; this happens when background characteristics are strongly predictive of treatment
Sensitive to model specification
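
One common way to see the cost of a few very large weights is Kish's approximate effective sample size, (Σw)² / Σw². It is not mentioned in the slides; the sketch below uses made-up weights and a hypothetical helper `ess` purely for illustration.

```r
# Kish's approximate effective sample size for a set of weights
ess <- function(w) sum(w)^2 / sum(w^2)

equal_w  <- rep(1, 100)              # no weighting: ESS equals n
skewed_w <- c(rep(1, 99), 50)        # one unit with a very large weight
print(c(equal = ess(equal_w), skewed = ess(skewed_w)))
```

With equal weights the effective sample size is 100. A single weight of 50 drops it to about 8.5, which is why estimates dominated by a few units have high variance.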
