Causal Inference MSM

Causal Inference
Marginal Structural Models
Rhoderick Machekano
Center for Health Care Research and Policy

Department of Medicine
Case Western Reserve University
Cleveland, OH 44109
rhoderick.machekano@case.edu
March, 2010
Association
Interest lies in understanding why the values of an outcome variable Y vary over the units in a population
Y is the response variable: the variable to be explained
In associational inference, we are satisfied with discovering how the values of Y are associated with the values of the variables defined on the units (attributes A) in our population.
The conditional distribution characterizes how the Y values change as A varies, e.g. E(Y | A = a) = β0 + β1 a
The parameter β1 is the associational parameter
Associational inference consists of estimating and testing the parameters β using observed data on Y and A
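As a minimal sketch of associational inference, the R code below simulates an attribute A and an outcome Y and estimates β0 and β1 by least squares. The data, the sample size, and the true coefficients are illustrative assumptions, not values from the slides.

```r
# Illustrative simulated data (assumed values, not from the slides)
set.seed(1)
n <- 1000
A <- rnorm(n)                  # attribute defined on each unit
Y <- 2 + 3 * A + rnorm(n)      # assumed truth: beta0 = 2, beta1 = 3

fit <- lm(Y ~ A)               # models E(Y | A = a) = beta0 + beta1 * a
beta_hat <- coef(fit)          # estimates of (beta0, beta1)
print(beta_hat)
print(confint(fit))            # interval estimates for the parameters
```

Here beta_hat[["A"]] describes how the mean of Y differs between units with different values of A. By itself it says nothing about what would happen to a unit if its A were changed.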
R. Machekano (Center for Health Care Research and Policy, Department of Medicine, Case Western Reserve University, Cleveland), Causal Inference, March 2010
Causal Inference
Causal inference addresses comparisons of different treatments if applied to the same units.
Rubin’s causal model posits the existence of potential outcomes for each unit
Causal inference is a prediction problem: what would have happened under different treatment options?

The Potential Outcomes Framework
Describes the responses that would have been observed had an individual/unit been subjected to each possible treatment
Causal effect with a binary treatment: y1i − y0i
It is critical that each unit be potentially exposable to any of the exposures
Example: The method of instruction a student receives can be a cause of the student’s performance in a test, but the student’s race or gender cannot be a cause.
We observe only the one potential outcome corresponding to the treatment received; the other potential outcome(s) are missing. This is the Fundamental Problem of Causal Inference.

Fundamental Problem of Causal Inference [1]

Hypothetical complete data
Unit i   Pretreatment inputs Xi   Treatment indicator Ti   y0i   y1i   Treatment effect y1i − y0i
1        1   50                   0                        69    75     6
2        1   98                   0                        111   108   -3
3        2   80                   1                        92    102   10
4        1   98                   1                        112   111   -1
...
100      1   104                  1                        111   114    3

Observed data
Unit i   Pretreatment inputs Xi   Treatment indicator Ti   y0i   y1i   Treatment effect y1i − y0i
1        1   50                   0                        69    ?      ?
2        1   98                   0                        111   ?      ?
3        2   80                   1                        ?     102    ?
4        1   98                   1                        ?     111    ?
...
100      1   104                  1                        ?     114    ?

Effect of a mathematics program of study on fourth graders
We wish to compare a novel study program to a standard program of study on fourth graders. The outcome is a score on a test at the end of the year.
[1] Source: Gelman & Hill
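The first four rows of the table above can be entered directly to see what is lost when only one potential outcome per unit is observed. This is a small sketch; the object and column names are illustrative.

```r
# Units 1-4 from the table above (Gelman & Hill example)
complete <- data.frame(
  unit  = 1:4,
  treat = c(0, 0, 1, 1),
  y0    = c(69, 111, 92, 112),
  y1    = c(75, 108, 102, 111)
)
complete$effect <- complete$y1 - complete$y0      # 6, -3, 10, -1

# What we actually observe: only the outcome under the treatment received
observed <- with(complete, data.frame(
  unit    = unit,
  treat   = treat,
  y0      = ifelse(treat == 0, y0, NA),
  y1      = ifelse(treat == 1, y1, NA),
  outcome = ifelse(treat == 1, y1, y0)
))

true_mean_effect <- mean(complete$effect)         # needs both potential outcomes
naive_diff <- with(observed,
                   mean(outcome[treat == 1]) - mean(outcome[treat == 0]))
print(observed)
print(c(true = true_mean_effect, naive = naive_diff))
```

For these four units the mean of the unit-level effects is 3, while the difference in observed group means is 16.5: the two groups differ in more than the treatment they received.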
Approaches to the fundamental problem of causal inference
Is causal inference impossible?
Two solutions to the fundamental problem:
1. Scientific solution
2. Statistical solution

Scientific solution
Exploits homogeneity and invariance assumptions to find close substitutes
Assuming all else remains the same, a measurement taken before treatment can be substituted for the other potential outcome
Examples include studies in animals (rats), the physical sciences, etc.

Statistical Solution
The average causal effect τ is the expected value of the difference Y1i − Y0i over the units in the population.
τ = E(Y1) − E(Y0) implies that information on different units, which can be observed, can be used to gain knowledge about τ
The observed data only give us E(YT | T = 1) = E(Y1 | T = 1) and E(YT | T = 0) = E(Y0 | T = 0), where T is the treatment received
The way units come to be selected into the different exposures is very important
We want to compare outcomes from similar units: randomization
The goal is to have treatment assignment T independent of the potential outcomes and all other variables
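A small simulation can illustrate why randomization helps. The sketch below assumes a made-up population in which a covariate drives the outcome and the true average causal effect is 1; none of these numbers come from the slides. Because treatment is assigned by a coin flip, E(Y1 | T = 1) = E(Y1) and E(Y0 | T = 0) = E(Y0), so the simple difference in observed means should land close to the true effect.

```r
set.seed(2010)
n  <- 10000
x  <- rnorm(n)                       # covariate that predicts the outcome
y0 <- 1 + 2 * x + rnorm(n)           # potential outcome without treatment
y1 <- y0 + 1                         # assumed constant unit-level effect of 1

treat <- rbinom(n, size = 1, prob = 0.5)   # randomized: independent of (y0, y1, x)
y_obs <- ifelse(treat == 1, y1, y0)        # only one potential outcome is seen

ace_true <- mean(y1) - mean(y0)            # needs both potential outcomes
ace_rand <- mean(y_obs[treat == 1]) - mean(y_obs[treat == 0])
print(c(true = ace_true, randomized = ace_rand))
```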

Observational Studies
Non-randomized T
Not all studies can be randomized
Units often end up treated or not based on characteristics that are predictive of the outcome - systematic differences
Solution is statistical adjustments
[Figure: diagram of treatment T, outcome Y, and covariates X]

Causal Inference in Observational Studies
Assume conditional independence: conditional on the confounding covariates, the distribution of treatments across units is random with respect to the potential outcomes
The distribution of the potential outcomes (y0, y1) is the same across treatment levels, conditioning on the confounding covariates X, i.e. y0, y1 ⊥ T | X - ignorability
If the choice of treatment is based on covariates that are predictive of the outcome but are not measured, then we have a non-ignorable treatment mechanism
With ignorability satisfied, we can use regression modeling, adjusting for confounders
Imbalance and lack of overlap create problems

Propensity scores as a solution to observational studies
Propensity scores as a tool to achieve balance and overlap through matching
Inverse of the propensity score as a weight for a data point

Marginal Structural Models for point treatment studies
If we have a binary treatment (exposed, unexposed), the causal effect of interest is given by E(Y1) − E(Y0)
When treatment assignment is associated with prognostic factors X, subjects who are exposed are a selective subgroup, and the sample average of their outcomes may systematically over- or underestimate the population mean counterfactual
The selection bias can be corrected, when there are no unmeasured confounders, by weighting each unit’s data by the inverse of the propensity score πi:
$$E(Y_t) = \frac{\sum_{i=1}^{n} I(T_i = t)\, y_i / \pi_{ti}}{\sum_{i=1}^{n} I(T_i = t) / \pi_{ti}}, \qquad \pi_{ti} = \mathrm{pr}(T_i = t \mid X_i)$$
We can model the counterfactual outcome: E(Yt) = β0 + β1 t

Inverse Probability Weighting
1. Calculate weights wi = 1 / pr(Ti = t | Xi)
2. Perform a weighted regression of the outcome Y on the exposure variable T
3. Why does this work?
Creates a pseudo-population consisting of wi copies of unit i
In the pseudo-population, treatment T is unconfounded
The distribution of the counterfactuals in the pseudo-population is the same as in the study population
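One way to see why the weighting works is the short derivation below. It is a sketch that assumes, beyond the no-unmeasured-confounders condition Y_t ⊥ T | X, that the observed outcome equals the potential outcome under the treatment received (consistency) and that every unit has a positive probability of each treatment (positivity). The notation π_t(X) = pr(T = t | X) is introduced here for convenience.

```latex
\begin{align*}
E\!\left[\frac{I(T = t)\,Y}{\pi_t(X)}\right]
  &= E\!\left[\frac{I(T = t)\,Y_t}{\pi_t(X)}\right]
     && \text{(consistency)} \\
  &= E\!\left[\frac{E\{I(T = t) \mid X\}\; E(Y_t \mid X)}{\pi_t(X)}\right]
     && \text{(iterated expectations and } Y_t \perp T \mid X\text{)} \\
  &= E\!\left[\frac{\pi_t(X)\, E(Y_t \mid X)}{\pi_t(X)}\right]
   = E\{E(Y_t \mid X)\} = E(Y_t).
\end{align*}
```

The same argument with Y replaced by 1 gives E[I(T = t) / π_t(X)] = 1, which is why dividing by the sum of the weights, as a weighted regression does, also targets E(Y_t).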

Toy Example of Effect of Weighting
Xi       A     A     A     B     B     B     C     C     C
Y1i      1     1     1     2     2     2     3     3     3
Ti       1     0     0     1     1     1     1     1     0
πi       1/3   1/3   1/3   1     1     1     2/3   2/3   2/3
Ti/πi    3     0     0     1     1     1     1.5   1.5   0
Here πi = pr(T = 1 | Xi), the proportion exposed at each level of X
True exposure counterfactual mean E(Y1) = 2
Mean among exposed E(Y | T = 1) = 13/6
IPW mean = (3x1 + 0x1 + 0x1 + 1x2 + 1x2 + 1x2 + 1.5x3 + 1.5x3 + 0x3) / 9 = 2
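The toy example can be checked in a few lines of R. This sketch takes πi to be the proportion exposed within each level of X, which reproduces the values in the table; the variable names are illustrative.

```r
x     <- rep(c("A", "B", "C"), each = 3)    # covariate level of each unit
y1    <- rep(c(1, 2, 3), each = 3)          # potential outcome under exposure
treat <- c(1, 0, 0, 1, 1, 1, 1, 1, 0)       # exposure indicator

pi_hat <- ave(treat, x)                     # proportion exposed within each level of x
w      <- treat / pi_hat                    # 3 0 0 1 1 1 1.5 1.5 0

truth    <- mean(y1)                        # E(Y1) over all nine units
mean_exp <- mean(y1[treat == 1])            # unweighted mean among the exposed
mean_ipw <- sum(w * y1) / sum(w)            # inverse-probability-weighted mean
print(c(truth = truth, exposed = mean_exp, ipw = mean_ipw))
```

The exposed-only mean is 13/6 because units at level A, which have the lowest Y1, are under-represented among the exposed. Weighting the single exposed A unit by 3 restores the balance, and the weighted mean recovers 2.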

Example: Complete Data
> head(simdat)
   covar      y1       y0        treat outcome
1  0.08317643 1.650394 0.8060006 1     1.650394
2  0.82471339 3.546720 2.5126394 0     2.512639
3  0.98284952 4.650695 2.2191206 1     4.650695
4  0.91146710 3.433277 2.1412135 0     2.141214
5  0.81274144 3.584094 2.4440184 0     2.444018
6  0.71152091 4.019010 2.6391156 1     4.019010
> tail(simdat)
   covar      y1       y0       treat outcome
45 0.16681624 2.558554 1.155048 1     2.558554
46 0.30581451 2.131925 1.416699 1     2.131925
47 0.49911482 2.943167 2.886796 1     2.943167
48 0.24429569 2.524773 1.514149 1     2.524773
49 0.08419395 1.004389 1.284473 1     1.004389
50 0.65493092 3.387608 2.819581 0     2.819581

Treatment Mechanism
$$\mathrm{pr}(T = 1 \mid w) = \frac{1}{1 + \exp\{-(1 - 2w)\}}$$

Estimation
Causal effect from the potential outcomes
> mean(y1)-mean(y0)
[1] 1.103484
Unadjusted estimate
> mean(Y[A==1])-mean(Y[A==0])
[1] 0.6900297
Adjusted regression
> library(arm)  # provides display()
> display(lm(Y ~ A + w))
lm(formula = Y ~ A + w)
coef.est coef.se
(Intercept) 0.81 0.25
A 1.24 0.19
w 2.11 0.32
---
n = 50, k = 3
residual sd = 0.59, R-Squared = 0.56

Inverse Probability of Treatment Weighted Estimator
We know the true propensity of treatment, i.e. the probability of receiving treatment or control given the covariates
Calculate the weights
> wt=ifelse(A==1,1/pa,1/(1-pa))  # pa is the known probability of A = 1 given w
Perform a weighted regression of the outcome on exposure
> display(lm(Y ~ A, weights=wt))
lm(formula = Y ~ A, weights = wt)
coef.est coef.se
(Intercept) 2.11 0.19
A 1.01 0.25
---
n = 50, k = 2
residual sd = 1.18, R-Squared = 0.26
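The slides do not show how simdat was generated, so the following is only a sketch of a comparable simulation. The treatment probability follows the treatment mechanism stated earlier, pr(T = 1 | w) = 1 / (1 + exp(-(1 - 2w))). The uniform covariate, the outcome model, the effect size of 1, and the sample size are assumptions chosen for illustration; a larger n than the 50 units in the slides is used so that the estimates are stable.

```r
set.seed(27)
n  <- 5000
w  <- runif(n)                              # assumed covariate distribution
y0 <- 1 + 2 * w + rnorm(n, sd = 0.5)        # assumed outcome model without treatment
y1 <- y0 + 1 + rnorm(n, sd = 0.5)           # assumed average effect of 1

pa <- plogis(1 - 2 * w)                     # pr(A = 1 | w) = 1 / (1 + exp(-(1 - 2w)))
A  <- rbinom(n, size = 1, prob = pa)        # treatment depends on w: confounded
Y  <- ifelse(A == 1, y1, y0)                # observed outcome

truth        <- mean(y1) - mean(y0)
est_naive    <- mean(Y[A == 1]) - mean(Y[A == 0])
est_adjusted <- coef(lm(Y ~ A + w))[["A"]]

wt      <- ifelse(A == 1, 1 / pa, 1 / (1 - pa))   # inverse probability of treatment received
est_ipw <- coef(lm(Y ~ A, weights = wt))[["A"]]

print(round(c(truth = truth, naive = est_naive,
              adjusted = est_adjusted, ipw = est_ipw), 3))
```

In this setup treated units tend to have smaller w and therefore smaller outcomes, so the unadjusted difference understates the effect, while the regression-adjusted and weighted estimates should both be close to the truth. That is the same qualitative pattern as in the output shown in the slides.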

Application: Effect of Job Training Program on earnings
> head(nws)
age educ black married nodegree re75 re78 hisp treat educ_cat4
1 42 16 0 1 0 0.000 100.4854 0 0 4
2 20 13 0 0 0 3317.468 4793.7451 0 0 3
3 37 12 0 1 0 22781.855 25564.6699 0 0 2
4 48 12 0 1 0 20839.355 20550.7441 0 0 2
5 51 12 0 1 0 21575.178 22783.5879 0 0 2
6 18 11 0 0 1 1455.532 2157.4807 0 0 1

Examine Exposure-Covariate Association
Covariate      Exposed   Unexposed   p-value
Age            25.8      33.4        < 0.001
Black          84%       9.7%        < 0.001
Hispanic       5.6%      6.7%        0.694
Married        19%       73%         < 0.001
Non-degreed    71%       30%         < 0.001
Education      10        12          < 0.001
1975 Salary    1532      14380       < 0.001
There are systematic differences between subjects exposed to the job training program and those not exposed

Naive Estimates - using regression methods
> nws.unadjusted=glm(re78~treat)
> display(nws.unadjusted)
glm(formula = re78 ~ treat)
coef.est coef.se
(Intercept) 15750.30 79.84
treat -9401.16 801.98
---
n = 18667, k = 2
residual deviance = 2.198893e+12, null deviance = 2.215081e+12 (differen
overdispersion parameter = 117808349.3
residual sd is sqrt(overdispersion) = 10853.96
> mean(re78[treat==1])-mean(re78[treat==0])
[1] -9401.156

Adjusting for pre-treatment covariates
> nws.adjusted=glm(re78~treat+age+educ+married+black+hisp+nodegree+re75+re
> display(nws.adjusted)
glm(formula = re78 ~ treat + age + educ + married + black + hisp +
nodegree + re75 + re74)
coef.est coef.se
(Intercept) 4427.01 449.18
treat 640.50 583.24
age -110.15 5.88
educ 247.02 28.80
married 180.24 144.33
black -387.15 190.58
hisp -50.18 228.22
nodegree 645.35 179.45
re75 0.51 0.01
re74 0.30 0.01
---
n = 18667, k = 10

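Inverse Weighted Estimator
# Propensity model: probability of treatment given the pre-treatment covariates
propensity.model = glm(treat~age+educ+married+black+hisp+nodegree+re75+re74, family="binomial")
phat = predict(propensity.model, type="response")
# Unstabilized weights: inverse probability of the treatment actually received
wt = ifelse(treat==1,1/phat,1/(1-phat))
# Stabilized weights: marginal probability of the treatment received in the numerator
st.wt = ifelse(treat==1,mean(treat)/phat,mean(1-treat)/(1-phat))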

Causal Effect estimation
> nws.weighted=glm(re78~treat,weights=wt)
> display(nws.weighted)
glm(formula = re78 ~ treat, weights = wt)
coef.est coef.se
(Intercept) 15647.00 91.82
treat -7116.53 151.84
---
n = 18667, k = 2

Distribution of propensity scores

Improved causal estimate
> nws.weighted=glm(re78~treat,weights=wt2)
> display(nws.weighted)
glm(formula = re78 ~ treat, weights = wt2)
coef.est coef.se
(Intercept) 8037.06 159.77
treat 493.40 204.14
---
n = 6809, k = 2

Pros and Cons of IPW estimators
Accounts for selection bias (and more) in observational studies
Is more efficient than propensity matching - uses all available data
Can be extended to time-varying confounders and exposures
Behaves badly when a few individuals have very large weights, which increases variance. This happens when background characteristics are strongly predictive of treatment.
Sensitive to model specification
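One common way to gauge the large-weights problem is Kish’s approximate effective sample size, (Σ wi)² / Σ wi². This diagnostic is not part of the slides; the helper name and the example weights below are hypothetical.

```r
# Kish's approximate effective sample size for a vector of weights
effective_n <- function(w) sum(w)^2 / sum(w^2)

w_equal  <- rep(1, 100)            # every unit counts equally
w_skewed <- c(rep(1, 99), 50)      # one unit carries a very large weight

ess_equal  <- effective_n(w_equal)    # equals the number of units
ess_skewed <- effective_n(w_skewed)   # far fewer "effective" units
print(c(equal = ess_equal, skewed = ess_skewed))
```

With equal weights the effective sample size is the full 100, but a single weight of 50 drops it to about 8.5. In an analysis it would be applied separately within each treatment group, for example effective_n(wt[treat == 1]).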
