Oxford University Press
Oxford University Press, Inc., publishes works that further
Oxford University’s objective of excellence
in research, scholarship, and education.
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright 2011 by Oxford University Press, Inc.
Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
www.oup.com
Oxford is a registered trademark of Oxford University Press
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of Oxford University Press.
____________________________________________
Library of Congress Cataloging-in-Publication Data
ISBN-13: 978-0-19-975464-9
____________________________________________
Printed in USA
on acid-free paper
Contents
Contributors xi
Index 353
Part I
4 Causality and Psychopathology
Over 35 years ago Rubin (1974) began to talk about strong causal infer-
ences that could be made from experimental and nonexperimental studies
using the so-called potential outcomes approach. This approach clarified the
nature of the effects of causes A vs. B by asking us to consider what would
happen to a given subject under these two conditions. A subject cannot
experience both conditions at a single instant, but Rubin provided a formal
way to think about how we could compare potential rather than actual
outcomes. The contrast of the potential outcomes was argued to provide a
measure of an individual causal effect, and Rubin and his colleagues showed
that the average of these causal effects across many individuals could be
estimated under certain conditions. Although approaches to causal analysis
have also been developed by philosophers and engineers (see Pearl, 2009), the
formal approaches of Rubin and his colleagues (e.g., Holland, 1986; Frangakis
& Rubin, 2002) and statistical epidemiologists (Greenland, Pearl, & Robins,
1999a, 1999b; Robins, 1986; Robins, Hernán, & Brumback, 2000) have given
researchers a new appreciation of the strengths and limitations of both
experimental and nonexperimental designs.
This volume is designed to promote conversations among those concerned
with causal inference in the abstract and those interested in causal explana-
tion of psychopathology more specifically. Authors include prominent contri-
butors from both types of literature. Some of the chapters from experts in
causal analysis are rather technical, but all engage important and cutting-
edge issues in the field. The psychopathology experts raise challenging issues
that are likely to be the subject of discussion for years to come.
In this introductory chapter, I give an overview of some of the themes that
will be discussed in the subsequent chapters. These themes have to do with
the assessment of causal effects, the sources of bias in clinical trials and
nonexperimental designs, and the potential of innovative designs and per-
spectives. In addition to the themes that are developed later in the volume,
I discuss two topics that are not fully discussed elsewhere in the volume. One
topic focuses on the role of time when considering the effects of causes in
psychopathology research. The other topic is mediation analysis, which is a
statistical method that was developed in psychology to describe the interven-
ing processes between an intervention and an outcome of that intervention.
A vs. what would have been observed if that person received treatment B. The
outcome under treatment A is called $Y_A(U)$ and the outcome under treatment
B is called $Y_B(U)$. Because only one treatment can be administered for a
given measurement of $Y(U)$, the definition of the causal effect depends on
a counterfactual consideration, namely, what the outcome of someone with
treatment A would have been had he or she received treatment B or what the
outcome of someone assigned to treatment B would have been had he or she
received treatment A. Our inability to observe both outcomes is what Holland
(1986) called “the fundamental problem of causal inference.”
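The potential outcomes idea can be made concrete with a small simulation. The sketch below is illustrative only: the outcome distributions, the average effect of 2.0, and the function names are assumptions, not part of Rubin's formulation. Because the simulation generates both Y_A(U) and Y_B(U) for every subject, the individual causal effects can be averaged directly; the randomized comparison, which sees only one outcome per subject, should land close to that average.

```python
import random
import statistics

def simulate_potential_outcomes(n, seed=1):
    """Hypothetical population in which every subject has BOTH potential outcomes."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        y_b = rng.gauss(10.0, 2.0)            # Y_B(U): outcome under treatment B
        effect = rng.gauss(2.0, 1.0)          # individual causal effect, varies by person
        subjects.append((y_b + effect, y_b))  # (Y_A(U), Y_B(U))
    return subjects

def randomized_estimate(subjects, seed=2):
    """Randomize each subject to A or B and keep only the outcome actually observed."""
    rng = random.Random(seed)
    observed_a, observed_b = [], []
    for y_a, y_b in subjects:
        if rng.random() < 0.5:
            observed_a.append(y_a)   # this subject's Y_B(U) stays unobserved
        else:
            observed_b.append(y_b)   # this subject's Y_A(U) stays unobserved
    return statistics.mean(observed_a) - statistics.mean(observed_b)

subjects = simulate_potential_outcomes(20_000)
true_ace = statistics.mean(y_a - y_b for y_a, y_b in subjects)
print(f"average causal effect (knowable only in simulation): {true_ace:.2f}")
print(f"difference in means after randomization: {randomized_estimate(subjects):.2f}")
```

Outside a simulation only the second number is ever available; random assignment is what justifies reading it as an estimate of the first.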
Figure 1.1 Schematic representation of treatment condition (T) on outcome (Y). Boxes
represent observed values and circles represent latent variables. In the panel on the left
(Panel A) the treatment is the only systematic influence on Y, but in the panel on the
right (Panel B) there is a confounding variable (C) that influences both the treatment
and the outcome.
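The bias depicted in Panel B can be illustrated with a toy example in which a binary confounder C raises both the chance of receiving T and the level of Y. All numbers and names are hypothetical. The true effect of T is 1.0, but the naive treated-versus-untreated comparison should be near 1.0 + 3.0 × (0.8 − 0.2) = 2.8, because treated subjects are much more likely to have C = 1. Comparing groups within levels of C and averaging over C removes the bias here only because C is measured and is the sole confounder in this sketch.

```python
import random
import statistics

def simulate_confounded(n, seed=3):
    """Hypothetical data: binary confounder C affects both treatment T and outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if c == 1 else 0.2             # C makes treatment more likely
        t = 1 if rng.random() < p_treat else 0
        y = 1.0 * t + 3.0 * c + rng.gauss(0.0, 1.0)  # true effect of T is 1.0
        rows.append((c, t, y))
    return rows

def naive_difference(rows):
    """Treated minus untreated mean, ignoring C."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return statistics.mean(treated) - statistics.mean(control)

def stratified_difference(rows):
    """Within-stratum differences, averaged over the distribution of C."""
    total = 0.0
    for level in (0, 1):
        stratum = [(t, y) for c, t, y in rows if c == level]
        diff = (statistics.mean([y for t, y in stratum if t == 1])
                - statistics.mean([y for t, y in stratum if t == 0]))
        total += diff * len(stratum) / len(rows)
    return total

rows = simulate_confounded(20_000)
print(f"naive: {naive_difference(rows):.2f}, stratified on C: {stratified_difference(rows):.2f}")
```

If C were unmeasured, as a latent confounder would be, stratification would not be available; random assignment of T removes the C → T path instead.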
Clinical Trials
One might suppose that this formal representation works for experiments
involving humans as subjects. However, things get complicated quickly in
this situation, as is well documented in the literature on clinical trials (e.g.,
Fleiss, 1986; Everitt & Pickles, 2004; Piantadosi, 2005). It is easy enough to
assign people randomly to either group A or group B and to verify that the
two groups are statistically equivalent in various characteristics, but human
subjects are agents who can undo the careful experimental design.
Individuals in one group might not like the treatment to which they are
assigned and may take various actions, such as failing to adhere to the
treatment, switching treatments, selectively adding additional treatments, or
withdrawing from the study entirely.
This issue of nonadherence introduces bias into the estimate of the average
causal effect (see Efron, 1998; Peduzzi, Wittes, Detre, & Holford, 1993, for
detailed discussion). For example, if a drug assigned to group A has good
long-term efficacy but temporary negative side effects, such as dry mouth or
drowsiness, then persons who are most distressed by the side effects might
refuse to take the medication or cut back on the dose. Persons in group B
may not feel the need to change their assigned treatment, and thus, the two
groups become nonequivalent in adherence. One would expect that the com-
parison of the outcomes in the two groups would underestimate the efficacy
of the treatment.
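A toy simulation shows the direction of this bias. The numbers are hypothetical: the drug lowers a symptom score by 4 points in anyone who takes it, 40% of group A stop taking it, group B receives a placebo, and stopping is assumed to be unrelated to prognosis. Under those assumptions the comparison of groups as randomized estimates roughly 0.6 × 4 = 2.4 points rather than the 4-point efficacy.

```python
import random
import statistics

def simulate_trial(n_per_arm, efficacy=4.0, p_stop=0.4, seed=4):
    """Hypothetical trial: arm A is offered the drug, arm B receives placebo.

    Higher scores mean more symptoms. Stopping the drug is assumed to be
    unrelated to prognosis, which keeps the example simple.
    """
    rng = random.Random(seed)
    arm_a, arm_b = [], []
    for _ in range(n_per_arm):
        took_drug = rng.random() >= p_stop     # about 60% adhere
        score_a = rng.gauss(20.0, 3.0)
        if took_drug:
            score_a -= efficacy                 # benefit only if the drug is taken
        arm_a.append(score_a)
        arm_b.append(rng.gauss(20.0, 3.0))      # placebo: no benefit
    return arm_a, arm_b

arm_a, arm_b = simulate_trial(20_000)
reduction = statistics.mean(arm_b) - statistics.mean(arm_a)
print(f"as-randomized symptom reduction: {reduction:.2f} (efficacy when taken: 4.00)")
```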
1 Integrating Causal Analysis into Psychopathology Research 7
A different source of bias will be introduced if persons in one group
are more likely to withhold data or to be lost to follow-up compared to the
other group. This issue of missing data is another threat to clear causal
inference in clinical trials. Mortality and morbidity are common reasons for
follow-up data being missing, but sometimes data are missing because
subjects have become so high-functioning that they do not have time to give to
follow-up measurement. If the missing observations come from a different
distribution than the completed observations, and if this discrepancy
differs between groups A and B, then the estimate of the causal effect
can become biased (see Little & Rubin, 2002).
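The following sketch, with invented numbers, illustrates the high-functioning case just described. The treatment has no true effect, but in group A subjects who are doing very well often skip follow-up, while in group B missingness is unrelated to the outcome. A complete-case comparison then makes A look worse by roughly 2 points. Because missingness in group A depends on the unobserved outcome itself, these data are missing not at random in the terminology of Little and Rubin (2002).

```python
import random
import statistics

def complete_case_difference(n_per_arm, seed=5):
    """Hypothetical trial with NO true effect; higher scores mean better functioning."""
    rng = random.Random(seed)
    observed_a, observed_b = [], []
    for _ in range(n_per_arm):
        y_a = rng.gauss(50.0, 10.0)
        y_b = rng.gauss(50.0, 10.0)
        # Arm A: subjects doing very well (score > 60) skip follow-up 70% of the time.
        if not (y_a > 60.0 and rng.random() < 0.7):
            observed_a.append(y_a)
        # Arm B: 20% of follow-ups are missing completely at random.
        if rng.random() >= 0.2:
            observed_b.append(y_b)
    return statistics.mean(observed_a) - statistics.mean(observed_b)

diff = complete_case_difference(20_000)
print(f"complete-case A minus B: {diff:.2f} (true effect is 0)")
```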
For many clinical studies, the bias in the causal effect created by differential
nonadherence and missing data patterns is set aside rather than confronted
directly. Instead, the analysis of the trials typically emphasizes intent to
treat (ITT). This requires that subjects be analyzed within the groups originally
randomized, regardless of whether they were known to have switched treat-
ment or failed to provide follow-up data. Missing data in this case must be
imputed, using either formal imputation methods (Little & Rubin, 2002) or
informal methods such as carrying the last observed measurement forward.
ITT shifts the emphasis of the trial toward effectiveness of the treatment proto-
col, rather than efficacy of the treatment itself (see Piantadosi, 2005, p. 324). For
example, if treatment A is a new pharmacologic agent, then the effectiveness
question is how a prescription of this drug is likely to change outcome
compared to no prescription. The answer to this question is often quite
different from the answer to whether the new agent is efficacious when
administered in tightly controlled settings, since effectiveness is affected by
side effects, cost of treatment, and social factors such as stigma associated
with taking the treatment.
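The informal approach mentioned above, last observation carried forward (LOCF), can be sketched together with an intent-to-treat contrast: each missed visit is filled with the subject's most recent observed score, and every subject stays in the arm to which he or she was randomized. The data and function names are hypothetical, and the sketch is not a recommendation, since LOCF assumes that a subject's outcome would have stayed frozen after dropout.

```python
import statistics
from typing import List, Optional

def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missed visit (None) with the last observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)   # stays None if nothing has been observed yet
    return filled

def itt_difference(arm_a, arm_b):
    """Intent-to-treat contrast of final-visit scores, analyzed by randomized arm."""
    final_a = [locf(s)[-1] for s in arm_a]
    final_b = [locf(s)[-1] for s in arm_b]
    return statistics.mean(final_a) - statistics.mean(final_b)

# Hypothetical symptom scores at three visits; None marks a missed visit.
arm_a = [[20.0, 15.0, 10.0], [22.0, 18.0, None], [19.0, None, None]]
arm_b = [[21.0, 19.0, 18.0], [20.0, 20.0, None], [23.0, 21.0, 20.0]]
print(f"ITT difference (A - B): {itt_difference(arm_a, arm_b):.2f}")
```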
Indeed, as clinical researchers reach out to afflicted persons who are not
selected on the basis of treatment-seeking or volunteer motives, nonadherence
and incomplete data are likely to become increasingly common and
challenging in effectiveness evaluation. Although these challenges are real, there are
important reasons to examine the effectiveness of treatments in representative
samples of persons outside of academic medical centers.
Whereas ITT and ad hoc methods of filling in missing data can provide
rigorous answers to effectiveness questions, causal theorists are drawn to
questions of efficacy. If we find that a treatment plan has no clear
effectiveness, do we then conclude that the treatment would never be
efficacious? Or suppose that overall effectiveness is demonstrated: Can we look
more carefully at the data to determine if the treatment caused preventable
side effects? Learning more about the specific causal paths in the develop-
ment and/or treatment of psychopathology is what stimulates new ideas
about future interventions. It also helps to clarify how definitive results are
from clinical trials or social experiments (e.g., Barnard, Frangakis, Hill, &
Rubin, 2003). Toh and Hernán (2008) contrast findings based on an ITT
approach to findings based on causally informative analyses.
themselves are created using methods such as logistic regression and non-
linear classification algorithms with predictor variables that are conceptually
prior to the causal action of T on Y. One important advantage of this
approach is that the analyst is forced to study the distributions of the pro-
pensity scores in the two groups to be compared. Often, one discovers that
there are some persons in one group who have no match in the other group
and vice versa. These unique individuals are not simply included as
extrapolations, as they are in traditional linear model adjustments, but are
instead set aside when the causal effect is estimated. The computation of the
adjusted group difference is based on either matching of propensity scores
or forming propensity score strata. This approach makes the groups
comparable in a way that approximates random assignment, provided that
the propensity score is correctly estimated (see Gelman & Hill, 2007).
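A minimal sketch of propensity score stratification, under simplifying assumptions that are ours: the single measured confounder X takes four discrete values, so the propensity score can be estimated as the proportion treated at each value of X instead of with logistic regression. At X = 3 everyone is treated, so that stratum has no match in the untreated group and is set aside, mirroring the overlap check described above. The data are simulated with a true effect of 1.5, and all names are hypothetical.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical P(T = 1 | X = x); at x = 3 everyone is treated, so there is no overlap.
P_TREAT = {0: 0.1, 1: 0.4, 2: 0.7, 3: 1.0}

def simulate(n, seed=6):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(4)                          # measured confounder
        t = 1 if rng.random() < P_TREAT[x] else 0
        y = 1.5 * t + 2.0 * x + rng.gauss(0.0, 1.0)   # true effect of T is 1.5
        rows.append((x, t, y))
    return rows

def propensity_stratified_effect(rows):
    # 1. Estimate e(x) = P(T = 1 | X = x) as the proportion treated at each x.
    counts = defaultdict(lambda: [0, 0])              # x -> [n, n_treated]
    for x, t, _ in rows:
        counts[x][0] += 1
        counts[x][1] += t
    score = {x: n_t / n for x, (n, n_t) in counts.items()}

    # 2. Form strata of subjects who share the same estimated score.
    strata = defaultdict(list)
    for x, t, y in rows:
        strata[round(score[x], 2)].append((t, y))

    # 3. Set aside strata lacking either treated or untreated members,
    #    then average the within-stratum differences by stratum size.
    weighted_sum, n_used = 0.0, 0
    for members in strata.values():
        treated = [y for t, y in members if t == 1]
        control = [y for t, y in members if t == 0]
        if not treated or not control:
            continue
        weighted_sum += (statistics.mean(treated) - statistics.mean(control)) * len(members)
        n_used += len(members)
    return weighted_sum / n_used, n_used

rows = simulate(20_000)
estimate, n_used = propensity_stratified_effect(rows)
print(f"stratified estimate: {estimate:.2f} using {n_used} of {len(rows)} subjects")
```

The estimate refers only to subjects in the overlapping strata (about three quarters of the sample here), which is the price of declining to extrapolate.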
Propensity score adjustment neither assumes a simple linear relation
between the confounder variables and the treatment nor leads to a unique
result. Different methods for computing the propensity score can yield dif-
ferent estimates of the average causal effect. The ways that propensity scores
might be used to improve causal inference continue to be developed. For
example, based on work by Robins (1993), Toh and Hernán (2008) describe a
method called inverse probability weighting for adjustment of adherence and
retention in clinical trials. This method uses propensity score information to
give high weight to individuals who are comparable across groups and low
weight to individuals who are unique to one group.
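Inverse probability weighting can be sketched in a few lines. Each treated subject receives weight 1/e(x) and each untreated subject 1/(1 − e(x)), where e(x) is the propensity score; the normalized weighted difference in means then estimates the average causal effect when e(x) is correct and strictly between 0 and 1. This is the generic estimator, not the specific adherence and retention procedure of Toh and Hernán (2008), and to keep it short the true propensities of the simulated data are treated as known rather than estimated. All values and names are hypothetical.

```python
import random

# Hypothetical propensity scores e(x) = P(T = 1 | X = x), all strictly between 0 and 1.
P_TREAT = {0: 0.1, 1: 0.4, 2: 0.7, 3: 0.9}

def simulate(n, seed=7):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(4)                          # measured confounder
        t = 1 if rng.random() < P_TREAT[x] else 0
        y = 1.5 * t + 2.0 * x + rng.gauss(0.0, 1.0)   # true effect of T is 1.5
        rows.append((x, t, y))
    return rows

def naive_effect(rows):
    y1 = [y for _, t, y in rows if t == 1]
    y0 = [y for _, t, y in rows if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def ipw_effect(rows, e=P_TREAT):
    """Normalized (Hajek-type) inverse probability weighted difference in means."""
    w1 = wy1 = w0 = wy0 = 0.0
    for x, t, y in rows:
        if t == 1:
            w = 1.0 / e[x]            # rarely treated at this x -> larger weight
            w1 += w
            wy1 += w * y
        else:
            w = 1.0 / (1.0 - e[x])    # rarely untreated at this x -> larger weight
            w0 += w
            wy0 += w * y
    return wy1 / w1 - wy0 / w0

rows = simulate(40_000)
print(f"naive: {naive_effect(rows):.2f}, IPW: {ipw_effect(rows):.2f} (true effect 1.5)")
```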
Whereas direct adjustment and calculation of propensity scores make use
of measured variables that describe the possible imbalance of the groups
indexed by T, the method of instrumental variables attempts to adjust for
confounding by using knowledge about the relation of a set of variables I to
the outcome Y. If I can affect Y only through the variable T, then it is possible
to separate the causal effect of T on Y from spurious correlation between the
treatment (T) and the outcome (Y). Figure 1.2 shows a representation of this
statement. The instrumental
variable I is said to cause a change in T and, through this variable, to affect Y.
Figure 1.2 [Path diagram: I → T → Y] Schematic representation of how an instrumental
variable (I) can isolate the causal effect from the correlation between the treatment
variable (T) and the error term (E).
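For a single binary instrument, the simplest instrumental variable estimator is the Wald ratio: the difference in mean Y between levels of I divided by the difference in mean T between levels of I. In the hypothetical simulation below, an unmeasured variable U plays the role of the error term E and affects both T and Y. The ordinary regression slope of Y on T is pulled upward by U (to about 0.96 under these parameters), whereas the ratio should land near the true effect of 0.5 because I reaches Y only through T. The sketch assumes a constant effect of T; with effects that vary across persons, the ratio estimates a narrower, local average effect.

```python
import random

def simulate(n, seed=8):
    """Hypothetical data: U confounds T and Y; I affects Y only through T."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        i = 1 if rng.random() < 0.5 else 0              # binary instrument
        u = rng.gauss(0.0, 1.0)                         # unmeasured confounder
        t = 0.8 * i + 1.0 * u + rng.gauss(0.0, 1.0)
        y = 0.5 * t + 1.0 * u + rng.gauss(0.0, 1.0)     # true effect of T is 0.5
        rows.append((i, t, y))
    return rows

def ols_slope(rows):
    """Slope from regressing Y on T, ignoring U."""
    ts = [t for _, t, _ in rows]
    ys = [y for _, _, y in rows]
    mt, my = sum(ts) / len(ts), sum(ys) / len(ys)
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    stt = sum((t - mt) ** 2 for t in ts)
    return sty / stt

def wald_iv(rows):
    """(E[Y | I=1] - E[Y | I=0]) / (E[T | I=1] - E[T | I=0])."""
    def mean_of(column, level):
        values = [row[column] for row in rows if row[0] == level]
        return sum(values) / len(values)
    reduced_form = mean_of(2, 1) - mean_of(2, 0)     # effect of I on Y
    first_stage = mean_of(1, 1) - mean_of(1, 0)      # effect of I on T
    return reduced_form / first_stage

rows = simulate(40_000)
print(f"OLS slope: {ols_slope(rows):.2f}, Wald IV estimate: {wald_iv(rows):.2f}")
```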
As helpful as the DAG representations of cause can be, they tend to empha-
size causal relations as if they occur all at once. These models may be
perfectly appropriate in engineering applications where a state change in T
is quickly followed by a response change in Y. In psychopathology research,
on the other hand, both processes that are hypothesized to be causes and
processes that are hypothesized to be effects tend to unfold over time. For
example, in clinical trials of fluoxetine, the treatment is administered for 4–6
weeks before it is expected to show effectiveness (Quitkin et al., 2003). When
the treatment is ended, the risk of relapse is typically expected to increase
with time off the medication. There are lags to both the initial effect of
the treatment and the risk of relapse. Figure 1.3A shows one representation
of this effect over time, where the vertical arrows represent a pattern of
treatments.
Another pattern is expected in preventive programs aimed at reducing
externalizing problems in high-risk children through the improvement of
parenting skills of single mothers. The Incredible Years intervention of
Webster-Stratton and her colleagues (e.g., Gross et al., 2003) takes 12
weeks to unfold and involves both parent and teacher sessions, but the
impact of the program is expected to continue well beyond the treatment
period. The emphasis on positive parenting, warm but structured interac-
tions, and reduction of harsh interactions is expected to affect the mother–
child relationships in ways that promote health, growth, and reduction of
conduct problems. Figure 1.3B shows how this pattern might look over time,
with an initial lag of treatment and a subsequent shift.
For some environmental shocks or chemical agents with pharmacokinetics
of rapid absorption, metabolism, and excretion, the temporal patterns might
be similar to those found in engineering applications. These are character-
ized by rapid change following the treatment and fairly rapid return to base-
line after the treatment is ended. Figure 1.3C illustrates this pattern, which
might be typical for heart rate change following a mild threat such as a fall or
for headache relief following the ingestion of a dose of analgesic.
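The three temporal patterns can be mimicked with a simple first-order response model, in which the outcome moves a fixed fraction k of the way toward a target level at each time step. This functional form, the time scale, and the values of k are illustrative assumptions, not estimates from the studies cited: a small k produces the lagged onset and gradual relapse of Figure 1.3A, a small k with a target that stays raised after treatment produces the durable shift of Figure 1.3B, and a large k produces the rapid rise and return of Figure 1.3C.

```python
def trajectory(k, persistent=False, on=2, off=8, effect=10.0, n_steps=16):
    """Hypothetical first-order response to a treatment given from time `on` to `off`.

    k          -- fraction of the gap to the target closed each step (small k = long lag)
    persistent -- if True, the raised target level remains after treatment ends
    """
    level, path = 0.0, []
    for t in range(n_steps):
        active = on <= t < off or (persistent and t >= off)
        target = effect if active else 0.0
        level += k * (target - level)
        path.append(round(level, 2))
    return path

pattern_a = trajectory(k=0.3)                   # lagged onset, gradual relapse
pattern_b = trajectory(k=0.3, persistent=True)  # lagged onset, lasting shift
pattern_c = trajectory(k=0.9)                   # rapid onset and rapid return
for name, path in (("A", pattern_a), ("B", pattern_b), ("C", pattern_c)):
    print(name, path)
```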
As Costello and Angold discuss (Chapter 11), the consideration of these
patterns of change is complicated by the fact that the outcome being studied
might not be stable. Psychological/biological processes related to symptoms
might be developing due to maturation or oscillating due to circadian
rhythms, or they might be affected by other processes related to the treatment
itself. In randomized studies, the control group can give a picture of the
trajectory of the naturally occurring process, so long as adequate numbers
of assessments are taken over time. However, the comparison of the treat-
ment and control group may no longer give a single outcome but, rather, a
series of estimated causal effects at different end points, both because of the
hypothesized time patterns illustrated in Figure 1.3 and because of the
natural course of the processes under study. Although one might expect
that effects that are observed at adjacent times are due to the same causal
mechanism, there is no guarantee that the responses are from the same
people. One group of persons might have a short-lived response at one
time and another group might have a response at the next measured time
point.
Muthén and colleagues’ parametric growth mixture models (Chapter 7)
shift attention to the individual over time rather than to specific (and
perhaps arbitrarily chosen) end points. These models allow the expected
Figure 1.3 [Panels A, B, and C: hypothesized response (vertical axis, 0.0 to 10.0) plotted against Time (0 to 15).]
Figure 1.4 [Path diagrams. Panel A: T → Y (path c), with residual eY. Panel B: T → M (path a),
M → Y (path b), and T → Y (path c′), with residuals eM and eY.] Traditional formulation of
the Baron and Kenny (1986) mediation model, with Panel A showing the total effect (c) and
Panel B showing the indirect (a*b) and direct (c′) effect decomposition.
Although Kenny and his colleagues have explicitly warned that the analysis
is appropriate only when the ordering of variables is unambiguous, many
published studies have not established this order rigorously. Even if an experi-
mental design guarantees that the mediating and outcome processes (M, Y)
follow the intervention (T), M and Y themselves are often measured at the same
point in time and the association between M and Y is estimated as a correlation
rather than a manipulated causal relation. This leaves open the possibility of
important bias in the estimated indirect effect of T on Y through M.
Figure 1.5A is an elaboration of Figure 1.4B that represents the possibility
of other influences besides T on the association between M and Y. This is
shown as correlated residual terms, eM and eY. For example, if we were trying
to explain the effect of CBT (T) on depression (Y) through changes in control
of dysfunctional attitudes (M), we could surmise that there is a correlation
between the degree of dysfunctional attitudes and depression symptoms that would be
observed even in the control group. Baseline intuitions, insight, or self-help
guides in the lay media might have led to covariation in the degree of
dysfunctional attitudes and depression. In fact, part of this covariation
could be reverse pathways such that less depressed persons more actively
read self-help strategies and then change their attitudes as a function of
the reading. If these sources of covariation are ignored, then the estimate of
the b effect will be biased, as will be the product, a * b. In most cases, the
bias will overestimate the amount of effect of T that goes through M.
Hafeman (2008) has provided an analysis of this source of bias from an
epidemiologic and causal analysis framework.
Figure 1.5 [Path diagrams. Panel A: mediation model with correlated residuals eM and eY.
Panel B: paths a, b, and c′, plus baseline measures M0 → M (g1) and Y0 → Y (g2) and the
baseline correlation rMY.] Formulation of the mediation model to show correlated errors
(Panel A) and an extended model that includes baseline measures of the mediating variable
(M0) and the outcome measure (Y0).
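A short simulation can show how large this bias can be. In the hypothetical setup below, T is randomized, M = aT + eM, and Y = bM + c′T + eY, with a = 0.7, b = 0.4, c′ = 0.28, unit-variance residuals, and residual correlation ρ between eM and eY. Regressing Y on T and M, as in the usual analysis, yields an estimate of b that converges to b + ρ in this setup, so with ρ = 0.3 the product a * b is about 0.49 instead of 0.28. The function names are illustrative.

```python
import math
import random

def simulate(n, rho, a=0.7, b=0.4, c_prime=0.28, seed=9):
    """Hypothetical randomized study with residual correlation rho between e_M and e_Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = 1.0 if rng.random() < 0.5 else 0.0
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e_m = z1
        e_y = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2   # Corr(e_M, e_Y) = rho
        m = a * t + e_m
        y = b * m + c_prime * t + e_y
        rows.append((t, m, y))
    return rows

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    return sxy / sxx

def two_predictor_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), from centered cross-products."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def naive_mediation(rows):
    """Estimates a from M ~ T and b from Y ~ T + M, ignoring the residual correlation."""
    t, m, y = (list(col) for col in zip(*rows))
    a_hat = slope(t, m)
    _, b_hat = two_predictor_slopes(t, m, y)
    return a_hat, b_hat

for rho in (0.0, 0.3):
    a_hat, b_hat = naive_mediation(simulate(20_000, rho))
    print(f"rho = {rho}: a_hat = {a_hat:.2f}, b_hat = {b_hat:.2f}, a*b = {a_hat * b_hat:.2f}")
```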
Although Figure 1.5A represents a situation where b will be biased when
the usual Baron and Kenny (1986) analysis is carried out, the model shown in
Figure 1.5A cannot itself be used to respecify the analysis to eliminate the
bias. This is because the model is not empirically identified. This means that
we cannot estimate the size of the correlation between eM and eY while also
estimating a, b, and c′. However, investigators often have information that is
ignored that can be used to resolve this problem. Figure 1.5B shows a model
with adjustments for baseline (prerandomization) measures of the outcome
(Y0) and mediating process (M0). When these baseline measures are included,
it is possible both to account for baseline association between Y and M and to
estimate a residual correlation between Y and M. The residual correlation can
be estimated if it is reasonable to consider the baseline M0 as an instrumental
variable that has an impact on the outcome Y only through its connection with
the postrandomization measure of the mediating process, M.¹
How important can this adjustment be? Consider a hypothetical numerical
example in which a = 0.7, b = 0.4, and c′ = 0.28. Assuming that the effects
are population values, these values indicate a partial mediation model. The
total effect of T on Y (c in Figure 1.4A) is the sum of the direct and indirect
effects, 0.56 = 0.28 + (0.70)(0.40), and exactly half the effect goes through M.
The stability of the mediation process from baseline to postintervention is
represented by g1 and the comparable stability of the outcome variable is g2.
Finally, the degree of correlation between M0 and Y0 is rMY.
Figure 1.6 shows results from an analysis of the bias using the Figure
1.4B model to represent mediation for different levels of correlation between
M0 and Y0. The results differ depending on how stable the mediating and
outcome processes are in the control group. (For simplicity, the figure assumes
that they are the same, i.e., g1 = g2.) Focusing on the estimate of the indirect
effect, a * b, one can see that there is no bias if M and Y have no stability:
The estimate is the expected 0.28 for all values of rMY when g1 = g2 = 0.
However, when M and Y are stable over time, the bias grows as the correlation
between M0 and Y0 increases, and it can be substantial. Given that symptoms,
such as depression, and coping strategies, such as cognitive skills, tend to be
quite stable in longitudinal
1. There can be further refinements to the model shown in Figure 1.5B. One might consider a
model where Y0 is related to the mediating process M. For example, if less depressed persons
in the study were inclined to seek self-help information and M represented new cognitive skills
that are available in the media, then the path between Y0 and M could be non-zero and
negative.
Figure 1.6 [Line chart: Product a*b (0.00 to 1.00) plotted against Baseline Corr of M and Y
(0 to 0.9), with one line for each stability value: .0, .2, .4, .6, and .8.] Chart showing the
expected values of the indirect effect estimated from the model in Panel B of Figure 1.4
when the actual model was Panel B of Figure 1.5, with values a = .7, b = .4, and c′ = .28.
Different lines show values associated with different stabilities of the M and Y processes
(g1, g2 in Figure 1.5B) as a function of the baseline correlation between M and Y.
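Curves of the kind shown in Figure 1.6 can be approximated analytically. The function below is a reconstruction under assumptions that are ours, not the author's computation: M0 and Y0 have unit variance and correlation rMY, g1 = g2 = g, the residual variances are 1 − g² (so M and Y have unit variance within each arm), eM and eY are uncorrelated, and T is randomized. Under these assumptions the naive Figure 1.4B regressions estimate a without bias, but the estimate of b converges to b + g²·rMY, so the product becomes a(b + g²·rMY). The exact values in the figure may differ if other variances were used.

```python
def naive_indirect(a, b, g, r_my):
    """Large-sample a_hat * b_hat from the Figure 1.4B regressions when data follow
    the Figure 1.5B structure (assumed: unit-variance M0 and Y0, residual variances
    1 - g**2, uncorrelated eM and eY, randomized T)."""
    var_m_within = g ** 2 + (1.0 - g ** 2)           # Var(M | T) = 1 under the assumptions
    cov_my_within = g * g * r_my + b * var_m_within  # Cov(M, Y | T)
    b_hat = cov_my_within / var_m_within             # = b + g**2 * r_my
    a_hat = a                                        # randomized T: a is unbiased
    return a_hat * b_hat

A, B = 0.7, 0.4
correlations = [k / 10 for k in range(10)]           # 0.0, 0.1, ..., 0.9
for g in (0.8, 0.6, 0.4, 0.2, 0.0):
    values = " ".join(f"{naive_indirect(A, B, g, r):.2f}" for r in correlations)
    print(f"stability {g:.1f}: {values}")
```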
aspects of the booster session from the original intervention. This difference
could affect the strength of the relation between M and T in the direct
manipulation condition. Nonetheless, the new information provided by
direct manipulation of M is likely to increase the confidence one has in
the estimate of the indirect causal path.
Given that it is the correlational nature of the link between M and Y that
makes it challenging to obtain unbiased estimates of indirect (mediated)
effects in randomized studies, it should not be surprising that the challenges
are much greater in nonexperimental research. A number of studies
published in peer-reviewed journals attempt to partition assumed
causal effects into direct and indirect components. For example, Mohr
et al. (2003) reported that an association between traumatic stress and ele-
vated health problems in 713 active police officers was fully mediated by
subjective sleep problems in the past month. All the variables were measured
in a cross-sectional survey. The path of stress to sleep to health problems is
certainly plausible, but it is also possible that health problems raise the risk
of both stress and sleep problems.
Even if there is no dispute about the causal order, there can be dispute
about the meaning of the mediation analysis in cases such as this.
Presumably, the underlying model unfolds on a daily basis: Stress today
disrupts sleep tonight, and this increases the risk of health problems tomor-
row. One might hope that cross-sectional summaries of stress and sleep
patterns obtained for the past month would be informative about the mediat-
ing process. However, Maxwell and Cole (Cole & Maxwell, 2003; Maxwell &
Cole, 2007) provided convincing evidence that there is no certain connection
between a time-dependent causal model and a result based on cross-sectional
aggregation of data. They studied the implications of a stationary model
where causal effects were observed on a daily basis for a number of days
or parts of days. In addition to the mediation effects represented in Figure
1.4B (a, b, c′), they represented the stability of the T, M, and Y processes from
one time point to the next. They studied the inferences that would be made
from a cross-sectional analysis of the variables under different assumptions
about the mediation effects and the stability of the processes. The bias of the
cross-sectional analysis was greatly influenced by the process stability, and the
direction of the bias was not consistent. Sometimes the bias of the indirect
effect estimate was positive and sometimes it was negative.
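A simplified simulation in the spirit of this argument is sketched below; the model, parameter values, and names are illustrative assumptions, not Maxwell and Cole's specification. T, M, and Y are stationary autoregressive processes sharing one stability parameter φ, and effects act with a one-occasion lag (a = 0.5 from T to M, b = 0.5 from M to Y, no direct effect), so the true longitudinal indirect effect is 0.25. Applying the usual regressions to a single cross-section gives an indirect effect near zero when φ = 0 and one several times larger than 0.25 when φ = 0.8 in this setup, so the sign of the bias depends on process stability.

```python
import random

def simulate_final_wave(phi, a=0.5, b=0.5, n_units=2000, n_steps=40, seed=10):
    """Hypothetical stationary model: T -> M -> Y with one-occasion lags, stability phi.

    Only the last occasion is returned, i.e. what a cross-sectional survey would see.
    """
    rng = random.Random(seed)
    final = []
    for _ in range(n_units):
        t = m = y = 0.0
        for _ in range(n_steps):                 # burn-in long enough to reach stationarity
            # The right-hand side uses the previous occasion's t, m, y (one-step lag).
            t, m, y = (phi * t + rng.gauss(0.0, 1.0),
                       phi * m + a * t + rng.gauss(0.0, 1.0),
                       phi * y + b * m + rng.gauss(0.0, 1.0))
        final.append((t, m, y))
    return final

def cross_product(x, y):
    """Sum of products of deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y))

def cross_sectional_indirect(rows):
    """a from M ~ T and b from Y ~ T + M, both fitted to one cross-section."""
    t, m, y = (list(col) for col in zip(*rows))
    stt, smm, stm = cross_product(t, t), cross_product(m, m), cross_product(t, m)
    sty, smy = cross_product(t, y), cross_product(m, y)
    a_cs = stm / stt
    b_cs = (stt * smy - stm * sty) / (stt * smm - stm ** 2)   # coefficient on M
    return a_cs * b_cs

print("true lagged indirect effect: 0.25")
for phi in (0.0, 0.4, 0.8):
    estimate = cross_sectional_indirect(simulate_final_wave(phi))
    print(f"stability {phi}: cross-sectional a*b = {estimate:.2f}")
```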
The Maxwell and Cole work prompts psychopathology researchers to think
carefully about the temporal patterns in mediation and to take seriously the
assumptions that were articulated by Judd and Kenny (1981). Others have
called for modifications of the original positions taken by Kenny and his
colleagues. An important alternate perspective has been advanced by
MacArthur Network researchers (Kraemer, Kiernan, Essex, & Kupfer, 2008),
who call into question the Baron and Kenny (1986) distinction between
mediation and moderation. As we have already reviewed in Figure 1.4, a
third variable is said by Baron and Kenny (1986) to be a mediator if it
both has a direct association with Y adjusting for T and can be represented
as being related linearly with T. A moderator, according to Baron and Kenny
(1986), is a third variable (W) that is involved in a statistical interaction with
T when T and W are used to predict Y. The MacArthur researchers note that
the Baron and Kenny distinction is problematic if various nonlinear transfor-
mations of Y are considered. Such transformations can produce interaction
models, even if there is no evidence that the causal effect is moderated. They
propose to limit the concept of moderation to effect modifiers. If the third
variable represents a status before the treatment is applied and if the size of
the TY effect varies with the level of the status, then moderation is demon-
strated from the MacArthur perspective. For randomized studies, the moder-
ating variable would be expected to be uncorrelated with T. If
psychopathology researchers embrace the MacArthur definition of modera-
tion, considerable confusion in the literature will be avoided in the future.
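The MacArthur point about transformations can be checked with a deterministic, hypothetical example. Suppose the expected outcome is multiplicative, E[Y] = exp(1 + 0.5T + 0.7W), for binary T and W. On the log scale the effects of T and W are additive, so the T × W interaction contrast is zero; on the raw scale the same model yields a clearly nonzero contrast. Whether W would be called a moderator in the Baron and Kenny sense therefore depends on the scale chosen for Y. The function names are illustrative.

```python
import math

def cell_mean(t, w):
    """Hypothetical expected outcome for binary treatment t and baseline status w."""
    return math.exp(1.0 + 0.5 * t + 0.7 * w)

def interaction_contrast(scale):
    """(Effect of T when W = 1) minus (effect of T when W = 0), on the given scale."""
    effect_w1 = scale(cell_mean(1, 1)) - scale(cell_mean(0, 1))
    effect_w0 = scale(cell_mean(1, 0)) - scale(cell_mean(0, 0))
    return effect_w1 - effect_w0

raw = interaction_contrast(lambda y: y)
logged = interaction_contrast(math.log)
print(f"raw-scale interaction: {raw:.3f}; log-scale interaction: {logged:.3f}")
```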
Conclusion
It will often take more effort to use the modern tools of causal analysis,
but the benefit of the effort is that researchers will be able to talk more
explicitly about interesting causal theories and patterns rather than about
associations that have been edited to remove any reference to “cause” or
“effect.” In the long run the more sophisticated analyses will lead to more
nuanced prevention and treatment interventions and a deeper understanding
of the determinants of psychiatric problems and disorders. Many examples of
these insights are provided in the chapters that follow.
References
Barnard, J., Frangakis, C. E., Hill, J. L., & Rubin, D. B. (2003). Principal stratification
approach to broken randomized experiments: A case study of school choice vou-
chers in New York City. Journal of the American Statistical Association, 98, 299–311.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in
social psychological research: Conceptual, strategic and statistical considerations.
Journal of Personality and Social Psychology, 51, 1173–1182.
Becker, M. H., & Maiman, L. A. (1975). Sociobehavioral determinants of compliance
with health and medical care. Medical Care, 13(1), 10–24.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Brotman, L. M., Gouley, K. K., Huang, K.-Y., Rosenfelt, A., O’Neal, C., Klein, R. G.,
et al. (2008). Preventive intervention for preschoolers at high risk for antisocial
behavior: Long-term effects on child physical aggression and parenting practices.
Journal of Clinical Child and Adolescent Psychology, 37, 386–396.
Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal
data: Questions and tips in the use of structural equation modeling. Journal of
Abnormal Psychology, 112, 558–577.
Connor, D. F., Glatt, S. J., Lopez, I. D., Jackson, D., & Melloni, R. H., Jr. (2002).
Psychopharmacology and aggression. I: A meta-analysis of stimulant effects on
overt/covert aggression-related behaviors in ADHD. Journal of the American
Academy of Child & Adolescent Psychiatry, 41(3), 253–261.
Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between
poverty and psychopathology: A natural experiment. Journal of the American Medical
Association, 290, 2023–2029.
Davey Smith, G., & Ebrahim, S. (2003). “Mendelian randomization”: Can genetic
epidemiology contribute to understanding environmental determinants of disease?
International Journal of Epidemiology, 32, 1–22.
Dobson, K. S. (1989). A meta-analysis of the efficacy of cognitive therapy for depres-
sion. Journal of Consulting and Clinical Psychology, 57(3), 414–419.
Efron, B. (1998). Forward to special issue on analyzing non-compliance in clinical
trials. Statistics in Medicine, 17, 249–250.
Everitt, B. S., & Pickles, A. (2004). Statistical aspects of the design and analysis of clinical
trials. London: Imperial College Press.
Fleiss, J. L. (1986). The design and analysis of clinical experiments. New York: Wiley.
Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference.
Biometrics, 58, 21–29.
Freedland, K. E., Skala, J. A., Carney, R. M., Rubin, E. H., Lustman, P. J., Davila-
Roman, V. G., et al. (2009). Treatment of depression after coronary artery bypass
surgery: A randomized controlled trial. Archives of General Psychiatry, 66(4), 387–396.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical
models. New York: Cambridge University Press.
Greenland, S., Pearl, J., & Robins, J. M. (1999a). Causal diagrams for epidemiologic
research. Epidemiology, 10(1), 37–48.
Greenland, S., Pearl, J., & Robins, J. M. (1999b). Confounding and collapsibility in
causal inference. Statistical Science, 14(1), 29–46.
Gross, D., Fogg, L., Webster-Stratton, C., Garvey, C., Julion, W., & Grady, J. (2003).
Parent training of toddlers in day care in low-income urban communities. Journal of
Consulting and Clinical Psychology, 71, 261–278.
Hafeman, D. M. (2008). A sufficient cause based approach to the assessment of
mediation. European Journal of Epidemiology, 23, 711–721.
Hansen, R. A., Gartlehner, G., Lohr, K. N., Gaynes, B. N., & Carey, T. S. (2005).
Efficacy and safety of second-generation antidepressants in the treatment of major
depressive disorder. Annals of Internal Medicine, 143, 415–426.
Hareli, S., & Hess, U. (2008). The role of causal attribution in hurt feelings and related
social emotions elicited in reaction to other’s feedback about failure. Cognition &
Emotion, 22(5), 862–880.
Hegarty, J. D., Baldessarini, R. J., Tohen, M., & Waternaux, C. (1994). One hundred
years of schizophrenia: A meta-analysis of the outcome literature. American Journal
of Psychiatry, 151(10), 1409–1416.
Hernán, M. A., & Robins, J. M. (2006). Instruments for causal inference: an epide-
miologist’s dream? Epidemiology, 17(4), 360–372.
Holland, P. (1986). Statistics and causal inference (with discussion). Journal of the
American Statistical Association, 81, 945–970.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treat-
ment evaluations. Evaluation Review, 5, 602–619.
Kraemer, H., Kiernan, M., Essex, M., & Kupfer, D. J. (2008). How and why criteria
defining moderators and mediators differ between the Baron & Kenny and
MacArthur approaches. Health Psychology, 27(Suppl. 2), S101–S108.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.).
New York: Wiley.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York:
Lawrence Erlbaum.
Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal
mediation. Psychological Methods, 12(1), 23–44.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data:
A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Mohr, D., Vedantham, K., Neylan, T., Metzler, T. J., Best, S., & Marmar, C. R. (2003).
The mediating effects of sleep in the relationship between traumatic stress and
health symptoms in urban police officers. Psychosomatic Medicine, 65, 485–489.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference. New York:
Cambridge University Press.
O’Mahony, S. M., Marchesi, J. R., Scully, P., Codling, C., Ceolho, A., Quigley, E. M. M.,
et al. (2009). Early life stress alters behavior, immunity, and microbiota in rats:
Implications for irritable bowel syndrome and psychiatric illnesses. Biological
Psychiatry, 65(3), 263–267.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference
on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan
Kaufmann.
Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). New York:
Cambridge University Press.
Peduzzi, P., Wittes, J., Detre, K., & Holford, T. (1993). Analysis as-randomized and the
problem of non-adherence: An example from the Veterans Affairs Randomized Trial
of Coronary Artery Bypass Surgery. Statistics in Medicine, 12, 1185–1195.
Piantadosi, S. (2005). Clinical trials: A methodologic perspective (2nd ed.). New York:
Wiley.
Quitkin, F. M., Petkova, E., McGrath, P. J., Taylor, B., Beasley, C., Stewart, J., et al.
(2003). When should a trial of fluoxetine for major depression be declared failed?
American Journal of Psychiatry, 160(4), 734–740.
Rehder, B., & Kim, S. (2006). How causal knowledge affects classification: A generative
theory of categorization. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 32(4), 659–683.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with
sustained exposure periods—applications to control of the healthy worker survivor
effect. Mathematical Modelling, 7, 1393–1512.
Robins, J. M. (1993). Analytic methods for estimating HIV treatment and cofactor
effects. In D. G. Ostrow & R. C. Kessler (Eds.), Methodological issues of AIDS
mental health research (pp. 213–288). New York: Springer.
Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and
causal inference in epidemiology. Epidemiology, 11(5), 550–560.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of propensity scores in
observational studies for causal effects. Biometrika, 70, 41–55.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-
randomized studies. Journal of Educational Psychology, 66(5), 688–701.
Rubin, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1980). Discussion of ‘‘Randomization analysis of experimental data: The
Fisher randomization test,’’ by D. Basu. Journal of the American Statistical
Association, 75, 591–593.
Rubin, D. B. (1990). Formal modes of statistical inference for causal effects. Journal of
Statistical Planning and Inference, 25, 279–292.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and
quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why
experiments are often more effective than mediational analyses in examining psy-
chological processes. Journal of Personality and Social Psychology, 89(6), 845–851.
Toh, S., & Hernán, M. (2008). Causal inference from longitudinal studies with baseline
randomization. International Journal of Biostatistics, 4(1), article 22. Retrieved from
http://www.bepress.com/ijb/vol4/iss1/22
2
What Would Have Been is Not What Would Be
Introduction
The promises of understanding and progress have not been kept, and the
application of science to human affairs has often done great harm.
Public health institutions were caught by surprise by the resurgence
of old diseases and the appearance of new ones. . . . Pesticides increase
pests, create new pest problems and contribute to the load of poison in our
habitat. Antibiotics create new pathogens resistant to our drugs. (p. 1)
relationship between the causal effects uncovered in our studies and results
of interventions based on their removal.
More specifically, we will argue that the unrealistic expectations of the
success of interventions arise in the potential outcomes frame because of a
premature emphasis on the effects of causal manipulation (understanding
what would happen if the exposure were altered) at the expense of two
other tasks that must come first in epidemiologic research: (1) causal
identification (identifying whether an exposure did cause an outcome) and (2) causal
explanation (understanding how the exposure caused the outcome). We will
describe an alternative approach that specifies all three of these steps—causal
identification, followed by causal explanation, and then the effects of causal
manipulation. While this alternative approach will not solve the discrepancy
between the results of our studies and the results of our interventions, it
makes the sources of the discrepancy explicit.
The roles of causal identification and causal explanation in causal infer-
ence, which we build upon here, have been most fully elaborated by Shadish,
Cook, and Campbell (2002), heirs to a prominent counterfactual tradition in
psychology (Cook & Campbell, 1979). We think that a dialogue between these
two counterfactual traditions (i.e., the potential outcomes tradition and the
Cook and Campbell tradition as most recently articulated in Shadish et al.)
can provide a more realistic assessment of what our studies can accomplish
and, perhaps, a platform for a more successful translation of basic research
findings into sound public-health interventions.
To make these arguments, we will (1) review the history and principles of
the potential outcomes model, (2) describe the limitations of this model as
the basis for interventions in the real world, and (3) propose an alternative
based on an integration of the potential outcomes model with other counter-
factual traditions.
We wish to make clear at the outset that virtually all of the ideas in this
chapter already appear in the causal inference literature (Morgan & Winship,
2007). This chapter simply presents the picture we see as we stand on the
shoulders of the giants in causal inference.
outcomes throughout. Thus, individuals will be either exposed or not and will
either develop the disease or not.
The concept at the heart of the potential outcomes model is the causal
effect of an exposure. A causal effect is defined as the difference between the
potential outcomes that would arise for an individual under two different
exposure conditions. In considering a disease outcome, each individual has
a potential outcome for the disease under each exposure condition.
Therefore, when comparing two exposure conditions (exposed and not
exposed), there are four possible pairs of potential outcomes for each indivi-
dual. An individual can develop the disease under both conditions, only
under exposure, only under nonexposure, or under neither condition.
Greenland and Robins (1986) used response types as a shorthand to
describe these different pairs of potential outcomes. Individuals who would
develop the disease under either condition (i.e., whether or not they were
exposed) are called ‘‘doomed’’; those who would develop the disease only if
they were exposed are called ‘‘causal types’’; those who would develop the
disease only if they were not exposed are called ‘‘preventive types’’; and those
who would not develop the disease under either exposure condition are called
‘‘immune.’’
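The four response types amount to a lookup on the pair of potential outcomes. A minimal sketch in Python, assuming binary outcomes coded 1 (disease) and 0 (no disease); `response_type` and `individual_effect` are hypothetical helper names introduced for illustration only:

```python
# Greenland and Robins's response types, keyed by the pair of binary potential
# outcomes (Y1 = outcome if exposed, Y0 = outcome if unexposed).
RESPONSE_TYPES = {
    (1, 1): "doomed",      # disease whether or not exposed
    (1, 0): "causal",      # disease only if exposed
    (0, 1): "preventive",  # disease only if unexposed
    (0, 0): "immune",      # no disease under either condition
}

def response_type(y1: int, y0: int) -> str:
    """Return the response type for one individual's potential outcomes."""
    return RESPONSE_TYPES[(y1, y0)]

def individual_effect(y1: int, y0: int) -> int:
    """Individual causal effect Y1 - Y0: +1 causal, -1 preventive, 0 otherwise."""
    return y1 - y0

for y1, y0 in RESPONSE_TYPES:
    print((y1, y0), response_type(y1, y0), individual_effect(y1, y0))
```

Only causal and preventive types have a nonzero individual effect; for doomed and immune individuals the exposure makes no difference.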
Every individual is conceptualized as having a potential outcome under
each exposure that is independent of the actual exposure. Potential outcomes
are determined by the myriad of largely unknown genetic, in utero, child-
hood, adult, social, psychological, and biological causes to which the indivi-
duals have been exposed, other than the exposure under study.
The effect of the exposure for each individual is the difference between the
potential outcome under the two exposure conditions, exposed and not. For
example, if an individual’s potential outcomes were to develop the disease if
exposed but not if unexposed, then the exposure is causal for that individual
(i.e., he or she is a causal type).
Rubin uses the term treatment to refer to these types of exposures and
describes a causal effect in language that implies an imaginary clinical trial.
In Rubin’s (1978) terms, ‘‘The causal effect of one treatment relative to
another for a particular experimental unit is the difference between the
result if the unit had been exposed to the first treatment and the result if,
instead, the unit had been exposed to the second treatment’’ (p. 34). One of
Rubin’s contributions was the popularization of this definition of a causal
effect in an experiment and the extension of the definition to observational
studies (Hernan, 2004).
For example, the causal effect of smoking one pack of cigarettes a day for
a year (i.e., the first treatment) relative to not smoking at all (the second
treatment) is the difference between the disease outcome for an individual if
he or she smokes a pack a day for a year compared with the disease outcome
in that same individual if he or she does not smoke at all during this same
time interval.
One can think about the average causal effect in a population simply as
the average of the causal effects for all of the individuals in the population. It
is the difference between the disease experience of the individuals in a parti-
cular population if we were to expose them all to smoking a pack a day and
the disease experience if we were to prevent them from smoking at all during
this same time period.
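To make the averaging concrete, here is a minimal sketch that assumes an invented population of ten people whose potential outcomes under both conditions are, unrealistically, fully known. It shows that the mean of the individual effects equals the risk if everyone smoked a pack a day minus the risk if no one smoked:

```python
# Invented population: each tuple is (Y1, Y0) for one person, where Y1 is the
# outcome under a pack a day for a year and Y0 the outcome under no smoking.
# 3 causal types, 2 doomed, 1 preventive, 4 immune.
population = [(1, 0)] * 3 + [(1, 1)] * 2 + [(0, 1)] * 1 + [(0, 0)] * 4

def average_causal_effect(pop):
    """Mean of the individual effects Y1 - Y0."""
    return sum(y1 - y0 for y1, y0 in pop) / len(pop)

def risk_difference(pop):
    """Risk if everyone were exposed minus risk if no one were exposed."""
    risk_if_all_exposed = sum(y1 for y1, _ in pop) / len(pop)
    risk_if_none_exposed = sum(y0 for _, y0 in pop) / len(pop)
    return risk_if_all_exposed - risk_if_none_exposed

print(average_causal_effect(population))  # 0.2
print(risk_difference(population))        # same value: 0.5 - 0.3
```

The two functions agree by linearity of the mean; in real data neither can be computed directly, because each person reveals only one of the two outcomes.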
A useful metaphor for this tradition is that of ‘‘magic powder,’’ where the
magic powder can remove an exposure. Imagine we sprinkle an exposure on
a population and observe the disease outcome. Imagine then that we use
magic powder to remove that exposure and can go back in time to see the
outcome in the same population. The problem of causal inference is two-
fold—we do not have magic powder and we cannot go back in time. We can
never see the same people at the same time exposed and unexposed. That is,
we can never see the same people both smoking a pack of cigarettes a day for
a year and, simultaneously, not smoking cigarettes at all for a year.
From a potential outcomes perspective, this is conceptualized as a miss-
ing-data problem. For each individual, at least one of the exposure experi-
ences is missing. In our studies, we provide substitutes for the missing data.
Of course, our substitutes are never exactly the same as what we want.
However, they can provide the correct answer if the potential outcomes of
the substitute are the same as the potential outcomes of the target, the
person or population you want information about.
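The substitution logic can be sketched in a few lines. The example below is a hypothetical illustration with invented potential outcomes: the exposed group reveals only Y1, so its missing Y0 values are replaced by the observed outcomes of an unexposed group. The substitute gives the right answer only when its Y0 values match, on average, the exposed group's own Y0 values:

```python
# Missing-data view with invented numbers. Each person has (Y1, Y0); exposure
# reveals Y1 and non-exposure reveals Y0, so the unexposed group's observed
# outcomes stand in for the exposed group's missing Y0 values.

def mean(xs):
    return sum(xs) / len(xs)

def observed_contrast(exposed_group, unexposed_group):
    """What a study can compute: observed risk in exposed minus unexposed."""
    return mean([y1 for y1, _ in exposed_group]) - mean([y0 for _, y0 in unexposed_group])

def effect_in_exposed(exposed_group):
    """Target quantity: requires the exposed group's unobservable Y0 values."""
    return mean([y1 - y0 for y1, y0 in exposed_group])

exposed_group = [(1, 1), (1, 0)]    # mean Y0 = 0.5
bad_substitute = [(0, 0), (1, 0)]   # mean Y0 = 0.0: not exchangeable
good_substitute = [(1, 1), (0, 0)]  # mean Y0 = 0.5: exchangeable

print(effect_in_exposed(exposed_group))                   # 0.5
print(observed_contrast(exposed_group, bad_substitute))   # 1.0 (biased)
print(observed_contrast(exposed_group, good_substitute))  # 0.5 (correct)
```

Note that the substitute needs only the same average Y0 as the exposed group, not identical individuals, which mirrors the partial exchangeability sufficient for the effect in the exposed.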
The potential outcomes model is clearly a counterfactual model in the sense
that the same person cannot simultaneously experience both exposure and
nonexposure. The outcomes of at least one of the exposure conditions must
represent a counterfactual, an outcome that would have happened but did not.
Rubin (2005), however, objects to the use of the term counterfactual when
applied to his model. Counterfactual implies there is a fact (e.g., the outcome
that did occur in a group of exposed individuals) to which the counterfactual
(e.g., the outcome that would have occurred had this group of individuals not
been exposed) is compared. However, for Rubin, there is no fact to begin
with. Rather, the comparison is between the potential outcomes of two
hypothetical exposure conditions, neither of which necessarily reflects an
occurrence. The causal effect for Rubin is between two hypotheticals. Thus,
in the potential outcomes frame, when epidemiologists use the term counter-
factual, they mean ‘‘hypothetical’’ (Morgan & Winship, 2007). This subtle
distinction has important implications, as we shall see.
This notion of a causal effect as a comparison between two hypotheticals
derives from the rootedness of the potential outcomes frame in experimental
traditions. Holland (1986), an early colleague of Rubin and explicator of his
work, makes this experimental foundation clear in his summary of the three
main tenets of the potential outcomes model.
First, the potential outcomes model studies the effects of causes and not
the causes of effects. Thus, the goal is to estimate the average causal effect of
an exposure, not to identify the causes of an outcome. For a population, the
average causal effect is defined as the average difference between two
potential outcomes for the same individuals, the potential outcome under
exposure A vs. the potential outcome under exposure B. The desired, but
unobservable, true causal effect is the difference in outcome in one population
under two hypothetical exposure conditions: if we were to expose the entire
population to exposure A vs. if we were to expose them to exposure B. As in
an experiment, the exposure is treated as if it were in the control of the
experimenter; the goal is to estimate the effect that this manipulation
would have on the outcome.
Second, the effects of causes are always relative to particular comparisons.
One cannot ask questions about the effect of a particular exposure without
specifying the alternative exposure that provides the basis for the comparison.
For example, smoking a pack of cigarettes a day can be preventive of lung cancer
if the comparison is with smoking four packs of cigarettes a day but is clearly causal
if the comparison is with smoking zero packs a day. As in an experiment,
the effect is the difference between two hypothetical exposure conditions.
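Because a causal effect is defined only relative to a reference condition, the same exposure can come out causal or preventive. A minimal sketch, assuming invented risks per 1,000 people at 0, 1, and 4 packs a day (illustrative numbers only, not estimates from any study):

```python
# Invented risks (cases per 1,000) under three hypothetical smoking levels.
RISK_PER_1000 = {0: 10, 1: 50, 4: 150}

def contrast(level, reference):
    """Causal contrast of `level` relative to `reference`, per 1,000."""
    return RISK_PER_1000[level] - RISK_PER_1000[reference]

def label(effect):
    if effect > 0:
        return "causal"
    if effect < 0:
        return "preventive"
    return "null"

for ref in (0, 4):
    e = contrast(1, ref)
    print(f"1 pack vs {ref} packs: {e:+d} per 1,000 ({label(e)})")
```

The exposure level is identical in both lines; only the comparison changes.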
Third, potential outcomes models limit the types of factors that can be
defined as causes. In particular, attributes of units (e.g., attributes of people
such as gender) are not considered to be causes. This requirement clearly derives
from the experimental, interventionist grounding of this model. To be a cause
(or at least a cause of interest), the factor must be manipulable. In Holland (1986,
p. 959) and Rubin’s terminology, ‘‘No causation without manipulation.’’1
The focus on the effect of causes, the precise definition of the two com-
parison groups, and the emphasis on manipulability clearly root the potential
outcomes approach in experimental traditions. Strengths of this approach
include the clarity of the definition of the causal effect being estimated and
the articulation of the assumptions necessary for this effect to be valid. These
assumptions are (1) that the two groups being compared (e.g., the exposed
and the unexposed) are exchangeable (i.e., they have the same potential out-
comes) and (2) that the stable unit treatment value assumption (SUTVA)
holds. While exchangeability is well understood in epidemiology, the require-
ments of SUTVA may be less accessible.
1. Rubin (1986), in commenting on Holland’s 1986 article, is not as strict as Holland in demand-
ing that causes be, by definition, manipulable. Nonetheless, he contends that one cannot
calculate the causal effect of a nonmanipulable cause and coauthored the ‘‘no causation with-
out manipulation’’ mantra.
SUTVA is simply the a priori assumption that the value of Y [i.e., the
outcome] for unit u [e.g., a particular person] when exposed to treat-
ment t [e.g., a particular exposure or risk factor] will be the same no
matter what mechanism is used to assign treatment t to unit u and no
matter what treatments the other units receive . . . SUTVA is violated
when, for example, there exist unrepresented versions of treatments
($Y_{tu}$ depends on which version of treatment t was received) or
interference between units ($Y_{tu}$ depends on whether unit $u'$ received
treatment t or $t'$). (p. 961)
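The interference clause of SUTVA can be illustrated with a toy outcome model. The sketch below assumes a hypothetical, contagion-like setting in which a unit's outcome depends both on its own treatment and on how many other units are treated; `outcome` and `satisfies_no_interference` are invented names, and the check looks only at the number of treated others, not at which others are treated:

```python
# Toy outcome model (invented): own treatment lowers the outcome by 3, and each
# treated peer lowers it by 1, as in a contagious-disease setting.
def outcome(own_treatment, others_treated):
    return 10 - 3 * own_treatment - 1 * others_treated

def satisfies_no_interference(outcome_fn, t, n_others):
    """True if the outcome under treatment t is the same for every number of
    treated others from 0 to n_others."""
    return len({outcome_fn(t, k) for k in range(n_others + 1)}) == 1

print(outcome(1, others_treated=0))              # 7
print(outcome(1, others_treated=4))              # 3
print(satisfies_no_interference(outcome, 1, 4))  # False: Y_tu is not well defined
```

Under this model, the same unit given the same treatment has different outcomes depending on others' assignments, so a single value of $Y_{tu}$ does not exist.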
groups of people: one with treatment A, the exposure of interest, and one
with treatment B, the substitute for the potential outcomes of the same group
under the second treatment option. In order for the substitution to yield an
accurate effect estimate (i.e., for exchangeability to hold), we must ensure
that the smokers and nonsmokers are as similar as possible on all causes of
the outcome (other than smoking). This can be accomplished by random
assignment in a randomized controlled trial. To meet SUTVA assumptions,
we have to (1) be vigilant to define our exposure precisely so there is only one
version of each treatment and be certain that how individuals entered the
smoking and nonsmoking groups did not influence their outcome and
(2) ensure the smoking habits of some individuals in our study did not
influence the outcomes of other individuals.
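The role of random assignment can be checked with a small simulation. This is a sketch under invented assumptions: a hypothetical "frail" trait causes disease regardless of smoking, smoking adds risk only among the non-frail, and the true average causal effect is therefore about 0.7 × 0.2 = 0.14. Coin-flip assignment recovers it; self-selection in which frail people are more likely to smoke does not:

```python
import random

def make_population(n, rng):
    """Invented potential outcomes (frail, Y1, Y0); 'frail' is another cause."""
    people = []
    for _ in range(n):
        frail = rng.random() < 0.3
        y0 = 1 if frail else 0
        y1 = 1 if (frail or rng.random() < 0.2) else 0
        people.append((frail, y1, y0))
    return people

def estimate(people, p_exposed, rng):
    """Assign exposure with probability p_exposed(frail); return the observed
    risk difference between exposed and unexposed."""
    exposed, unexposed = [], []
    for frail, y1, y0 in people:
        if rng.random() < p_exposed(frail):
            exposed.append(y1)    # exposure reveals Y1
        else:
            unexposed.append(y0)  # non-exposure reveals Y0
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

rng = random.Random(2011)
people = make_population(100_000, rng)
true_ace = sum(y1 - y0 for _, y1, y0 in people) / len(people)
randomized = estimate(people, lambda frail: 0.5, rng)
self_selected = estimate(people, lambda frail: 0.8 if frail else 0.2, rng)
print(f"true ACE {true_ace:.3f}, randomized {randomized:.3f}, "
      f"self-selected {self_selected:.3f}")
```

The randomized estimate should land close to the true value, while the self-selected one is inflated because the unexposed are a poor substitute for the exposed.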
Barring other methodological problems, it would be assumed that if we
did the intervention in real life, that is, if we prevented people from smoking
a pack of cigarettes a day for a year, the average causal effect estimated from
our study would approximate this intervention effect. The potential outcomes
model is an attempt to predict the average causal effect that would arise (or
be prevented) from a particular manipulation under SUTVA. It is self-
consciously interventionist.
Indeed, causal questions are framed in terms of intervention conse-
quences. To ensure the validity of the causal effects uncovered in epidemio-
logic studies, researchers are encouraged to frame the causal question in
these terms. As a prototypical example, Glymour (2007), in a cogent metho-
dologic critique of a study examining the effect of childhood socioeconomic
position on adult health, restated the goal of the study in potential outcome
terms. ‘‘The primary causal question of interest is how adult health would
differ if we intervened to change childhood socio-economic position’’ (p. 566).
It is critical to note that even when we do not explicitly begin with this
type of model, the interventionist focus of the potential outcomes frame
implicitly influences our thinking through its influence on our methods.
For example, this notion is embodied in our understanding of the attributable
risk as the proportion of disease that would be prevented if we were to
remove this exposure (Last, 2001). More generally, authors often end study
reports with a statement about the implications of their findings for
intervention or policy that reflects this way of thinking.
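The "proportion that would be prevented" reading can be made explicit with a sketch. Definitions of attributable risk vary across textbooks; the version below is the population attributable fraction, computed from invented counts and assuming that, once exposure is removed, formerly exposed people would experience the unexposed group's risk and that nothing else changes:

```python
from fractions import Fraction

def population_attributable_fraction(cases_exp, n_exp, cases_unexp, n_unexp):
    """Share of all cases that would not occur if everyone had the unexposed
    group's risk (assumes exchangeability and no side effects of removal)."""
    total_cases = cases_exp + cases_unexp
    risk_unexposed = Fraction(cases_unexp, n_unexp)
    expected_if_removed = risk_unexposed * (n_exp + n_unexp)
    return (total_cases - expected_if_removed) / total_cases

# Invented data: 400 exposed with 20 cases, 600 unexposed with 12 cases.
print(population_attributable_fraction(20, 400, 12, 600))  # 3/8
```

The arithmetic is exact, but the interpretation rests entirely on the assumption written into the docstring.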
To ensure the internal validity of our inferences, we isolate the effects of our
causes from the context in which they act. We do this by narrowly defining
While this is necessary for the estimation of precise causal effects in our
studies, it is not likely to reflect the meaning of the exposure or treatment in
the real world. The removal of causes or the provision of treatments, no
matter how well defined, is never surgical. Unlike the removal of causes
by the magic powder in our thought experiments, interventions are often
crude and messy. Public-health interventions are inherently broad. Even in
a clinical context, treatment protocols are never followed precisely in real-
world practice.
In public-health interventions, there are also different ways of getting into
‘‘treatment,’’ and these may well have different effects on the outcome. For
instance, the effect of an intervention offering a service may be very different
for those who take it up as soon as it is offered (the early adopters) than for
those who use it only after it has become popular (the late adopters). Early
adopters of a low-fat diet, for example, may increase their intake of fruits and
vegetables to compensate for the caloric change. Late adopters may substitute
low-fat cookies instead. A low-fat diet was adopted by both types of people,
but the effect on an outcome (e.g., weight loss) would likely differ. There are
always different versions of treatments, and the mechanisms through which
individuals obtain the treatments will frequently impact the effect of the
treatments on the outcome.
Natural Confounding
Recall that the estimation of the true causal effect requires exchangeability of
potential outcomes between the exposed and unexposed groups in our stu-
dies. Exchangeability is necessary to isolate the causal effect of interest. For
example, in examining the effects of alcohol abuse on vehicular fatalities, we
may control for the use of illicit drugs. We do so because those who abuse
alcohol may be more likely to also abuse other drugs that are related to
vehicular fatalities.
If the association between alcohol abuse and illicit drug use is a form of
‘‘natural confounding,’’ that is, the association between alcohol and drug use
arises in naturally occurring populations and is not an artifact of selection
into the study, then this association is likely to have important influences in a
real-world intervention. That is, the way in which individuals came to be
exposed may influence the effect of the intervention, in violation of SUTVA.
For example, when two activities derive from a similar underlying factor
(social, psychological, or biologic), the removal of one may influence the
presence of the other over time; it may activate a feedback loop. Thus, the
causal effect of alcohol abuse on car accidents may overestimate the effect of
the removal of alcohol abuse from a population if the intervention on alcohol
use inadvertently increases marijuana use.
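The gap between the study estimate and the intervention effect can be sketched with invented numbers. The model below is hypothetical: accident risk per 10,000 adds a baseline, an alcohol-abuse term, and a marijuana term; the study controls marijuana use, whereas the intervention removes alcohol abuse but leads some former abusers (assumed not to use marijuana beforehand) to take it up:

```python
# Invented additive risk model, accidents per 10,000 people.
BASE, ALCOHOL, MARIJUANA = 5, 20, 10

def risk(alcohol, marijuana):
    return BASE + ALCOHOL * alcohol + MARIJUANA * marijuana

def study_effect():
    """Effect of alcohol abuse with marijuana use held fixed (controlled)."""
    return risk(1, 0) - risk(0, 0)

def intervention_effect(switch_rate_pct):
    """Average risk reduction per former abuser when removing alcohol abuse
    leads switch_rate_pct percent of them to take up marijuana."""
    before = risk(1, 0) * 100
    after = risk(0, 1) * switch_rate_pct + risk(0, 0) * (100 - switch_rate_pct)
    return (before - after) / 100

print(study_effect())           # 20
print(intervention_effect(30))  # 17.0
```

Both numbers are "correct" for their own question; the study answer overstates the intervention only because the intervention also moves another cause.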
As this example illustrates, an intervention may influence not only the
exposure of interest but also other causes of the outcome that are linked with
the exposure in the real world. In our studies, we purposely break this link.
We overcome the problem of the violation of SUTVA by imposing narrow
limits on time and place so that SUTVA holds in the study. We control these
variables, precisely because they are also causes of the outcome under study.
In the real world, however, their influence may make the interventions less
effective than our effect estimates suggest. The control in the study was not
incorrect, as it was necessary to isolate the true effect that alcohol use did
have on car accidents among these individuals given the extant conditions of
the study. However, outside the context of the study, removal of the exposure
of interest had unintended consequences over time through its link with
other causes of the outcome.
Context Dependency
Most fundamentally, all causal effects are context-dependent, and therefore,
all effects are local. It is unlikely that a public-health intervention will be
applied only in the exact population in which the causal effects were studied.
Public-health interventions often apply to people who do not volunteer for
them, to a broader swath of the social fabric and over a different historical
time frame. Therefore, even if our effect estimates were perfectly valid, we
would expect effects to vary between our studies and our interventions. For
example, psychiatric drugs are often tested on individuals who meet strict
Diagnostic and Statistical Manual of Mental Disorders criteria, do not have
comorbidities, and are placebo nonresponders. Once the drugs are marketed,
however, they are used to treat individuals who represent a much wider
population. It is unlikely that the effects of the drugs in real-world use will
match those observed in the studies.
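One piece of this problem, a shift in who is treated, can be sketched by standardizing stratum-specific effects to a different population. The strata, effects, and percentages below are invented; the sketch also assumes that each stratum's effect carries over unchanged from study to practice, which is precisely the kind of assumption the text treats with suspicion:

```python
# Invented stratum-specific effects (points of symptom reduction vs. placebo).
EFFECT_BY_STRATUM = {"no comorbidity": 30, "comorbidity": 10}

def standardized_effect(composition_pct):
    """Average effect in a population with the given stratum percentages."""
    total = sum(composition_pct.values())
    return sum(EFFECT_BY_STRATUM[s] * pct
               for s, pct in composition_pct.items()) / total

trial = {"no comorbidity": 100, "comorbidity": 0}
practice = {"no comorbidity": 40, "comorbidity": 60}
print(standardized_effect(trial))     # 30.0
print(standardized_effect(practice))  # 18.0
```

Even this optimistic adjustment changes the expected benefit; real-world effects can differ further if the stratum effects themselves do not transport.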
For all these reasons, it seems unlikely that the causal effect of any inter-
vention will reflect the causal effect found in our studies. These problems are
well known and much discussed in the social science literature (e.g., Merton,
1936, 1968; Lieberson, 1985) and the epidemiologic literature (e.g.,
Greenland, 2005).
Nonetheless, when carrying out studies, epidemiologists often talk about
trying to identify ‘‘the true causal effect of an exposure,’’ as if it were a
quantity with some inherent meaning. An attributable risk is interpreted as if
it quantified the effect of eliminating the exposure under study. Policy
implications of etiologic work are discussed
as if they flowed directly from our results. We think that this is an overly
optimistic assessment of what our studies can show. We think that as a field
we tend to estimate the effect exposures had in the past and assume that this
will be the effect in the future. We do this by treating the counterfactual of
the past as equivalent to the potential outcome of the future.
[Figure: causal diagram; node labels include poor nutrition, neglect, prenatal viral exposure, stressful event, trauma, child virus, gene, toxin, vitamin deficiency, U1, U2, and U3.]
Table 2.1 Differences Between the Potential Outcomes Model and an Integrated
Counterfactual Approach
2. Technically, when the effect for the entire population is of interest, full exchangeability is
required. When the effect for the exposed is of interest, only partial exchangeability is required
(Greenland & Robins, 1986).
fixed as it is in this context, say, with a fairly rigid set of social expectations
depending on identified sex at birth. We can ask a question about what an
individual’s life would have been like had she been born male rather than
female, given this social context.
Fourth, this perspective brings the issue of context dependency front and
center. As Rothman’s (1976) and Mackie’s (1965) models make explicit, shifts
in the component causes and their distributions, variations in the field of
interest, and the sociohistorical context change the impact of the cause and,
indeed, determine whether or not the factor is a cause in this circumstance.
Thus, the impact of a cause is explicitly recognized as context-dependent; the
size of an effect is not universal. A factor can be a cause for some individuals
in some contexts but not in others. Thus, the goal is the ‘‘identification of
causes in the universe,’’ rather than the estimation of universal causal effects.
By ‘‘causes in the universe’’ we mean factors which at some moment in time
have caused the outcome of interest and could theoretically (if all else were
equal) happen again.
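Context dependency in a sufficient-component cause model can be sketched directly. The model below is hypothetical: disease occurs when either sufficient cause {E, A} or {B} is complete. Exposure E is a cause for a given person only if its complement A is present and B is absent, so the share of cases E causes depends entirely on how common A and B are in each (invented) context:

```python
# Hypothetical sufficient causes: disease if {E, A} or {B} is complete.
SUFFICIENT_CAUSES = [frozenset({"E", "A"}), frozenset({"B"})]

def disease(components):
    return any(sc <= components for sc in SUFFICIENT_CAUSES)

def exposure_is_cause(background):
    """E is a cause for this person iff disease occurs with E but not without."""
    return disease(background | {"E"}) and not disease(background - {"E"})

def share_caused_by_exposure(context):
    """context maps a background set of components to a percentage of people."""
    total = sum(context.values())
    caused = sum(pct for bg, pct in context.items() if exposure_is_cause(bg))
    return caused / total

context_x = {frozenset({"A"}): 50, frozenset(): 40, frozenset({"B"}): 10}
context_y = {frozenset(): 90, frozenset({"B"}): 10}  # complement A is absent
print(share_caused_by_exposure(context_x))  # 0.5
print(share_caused_by_exposure(context_y))  # 0.0
```

The same factor is a cause for half the population in one context and for no one in the other, without any change in the biology of E itself.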
Construct Validity
3. Shadish et al. (2002) call this step ‘‘causal description.’’ We think ‘‘causal identification’’
is a better fit for our purposes.
External Validity
of the past. Dynamic models allowing for violations of SUTVA are required to
understand potential outcomes of the future.
Summary
References
Beaglehole, R., & Bonita, R. (1997). Public health at the crossroads: Achievements and
prospects. New York: Cambridge University Press.
Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social
network over 32 years. New England Journal of Medicine, 357, 370–379.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues
for field settings. Chicago: Rand McNally.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization.
Annals of Statistics, 6, 34–58.
Rubin, D. B. (1986). Statistics and causal inference. Comment: Which ifs have causal
answers? Journal of the American Statistical Association, 81, 961–962.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling,
decisions. Journal of the American Statistical Association, 100, 322–331.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experi-
mental designs for generalized causal inference. Boston: Houghton Mifflin.
Shy, C. M. (1997). The failure of academic epidemiology: Witness for the prosecution.
American Journal of Epidemiology, 145, 479–484.
VanderWeele, T. J., & Hernan, M. A. (2006). From counterfactuals to sufficient com-
ponent causes and vice versa. European Journal of Epidemiology, 21, 855–858.
3
Introduction
Almost two decades have passed since Paul Holland published his highly
cited review paper on the Neyman-Rubin approach to causal inference
(Holland, 1986). Our understanding of causal inference has since increased
severalfold, due primarily to advances in three areas:
2. Graphical models
These advances are central to the empirical sciences because the research
questions that motivate most studies in the health, social, and behavioral
sciences are not statistical but causal in nature. For example, what is the
efficacy of a given drug in a given population? Can data prove an employer
guilty of hiring discrimination? What fraction of past crimes could have been
avoided by a given policy? What was the cause of death of a given individual
in a specific incident?
Remarkably, although much of the conceptual framework and many of the
algorithmic tools needed for tackling such problems are now well established,
they are hardly known to researchers in the field who could put them into
practical use. Why?
Solving causal problems mathematically requires certain extensions in the
standard mathematical language of statistics, and these extensions are not
generally emphasized in the mainstream literature and education. As a
result, large segments of the statistical research community find it hard to
appreciate and benefit from the many results that causal analysis has pro-
duced in the past two decades.
This chapter aims at making these advances more accessible to the gen-
eral research community by, first, contrasting causal analysis with standard
statistical analysis and, second, comparing and unifying various approaches
to causal analysis.
would attain a value y in the population, written $p(Y_x = y)$. Alternatively, Pearl
(1995) used expressions of the form $p[Y = y \mid \mathrm{set}(X = x)]$ or $p[Y = y \mid do(X = x)]$ to
denote the probability (or frequency) that event (Y = y) would occur if treatment
condition (X = x) were enforced uniformly over the population.2 Still a third
notation that distinguishes causal expressions is provided by graphical models,
where the arrows convey causal directionality.3
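The difference between conditioning on $X = x$ and setting $X = x$ can be computed exactly in a small discrete model. The model below is a hypothetical example, not taken from the chapter: a binary confounder Z affects both X and Y, and X affects Y. Conditioning reweights Z by $P(z \mid x)$; intervening replaces the mechanism for X, so Z keeps its marginal distribution (the standard truncated-factorization or adjustment formula for this graph):

```python
from fractions import Fraction as F

# Hypothetical model: Z -> X, Z -> Y, X -> Y, all binary.
P_Z1 = F(1, 2)

def p_z(z):
    return P_Z1 if z else 1 - P_Z1

def p_x_given_z(x, z):
    p1 = F(4, 5) if z else F(1, 5)
    return p1 if x else 1 - p1

def p_y1_given_xz(x, z):
    return F(1, 10) + F(2, 10) * x + F(5, 10) * z

def p_y1_given_x(x):
    """Observational P(Y=1 | X=x): Z weighted by P(z | x)."""
    num = sum(p_z(z) * p_x_given_z(x, z) * p_y1_given_xz(x, z) for z in (0, 1))
    den = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return num / den

def p_y1_do_x(x):
    """Interventional P(Y=1 | do(X=x)): the equation for X is replaced by X = x,
    so Z keeps its marginal distribution P(z)."""
    return sum(p_z(z) * p_y1_given_xz(x, z) for z in (0, 1))

print(p_y1_given_x(1))  # 7/10
print(p_y1_do_x(1))     # 11/20
```

A randomized experiment on X would estimate the second quantity, because randomization cuts the arrow from Z to X.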
However, few have taken seriously the textbook requirement that any
introduction of new notation must entail a systematic definition of the
syntax and semantics that govern the notation. Moreover, in the bulk of
the statistical literature before 2000, causal claims rarely appear in the mathe-
matics. They surface only in the verbal interpretation that investigators occa-
sionally attach to certain associations and in the verbal description with
which investigators justify assumptions. For example, the assumption that
a covariate is not affected by a treatment, a necessary assumption for the
control of confounding (Cox, 1958), is expressed in plain English, not in a
mathematical expression.
Remarkably, though the necessity of explicit causal notation is now recog-
nized by most leaders in the field, the use of such notation has remained
enigmatic to most rank-and-file researchers, and its potential remains grossly
underutilized in the statistics-based sciences. The reason for this, I am firmly
convinced, can be traced to the way in which causal analysis has been pre-
sented to the research community, relying primarily on outdated paradigms
of controlled randomized experiments and black-box ‘‘missing-data’’ models
(Rubin, 1974; Holland, 1986).
The next section provides a conceptualization that overcomes these mental
barriers; it offers both a friendly mathematical machinery for cause–effect
analysis and a formal foundation for counterfactual analysis.
2. Clearly, $P[Y = y \mid do(X = x)]$ is equivalent to $P(Y_x = y)$. This is what we normally assess in a
controlled experiment, with X randomized, in which the distribution of Y is estimated for each
level x of X.
3. These notational clues should be useful for detecting inadequate definitions of causal concepts;
any definition of confounding, randomization, or instrumental variables that is cast in standard
probability expressions, void of graphs, counterfactual subscripts, or do(*) operators, can safely be
discarded as inadequate.
$$y = \beta x + u \tag{1}$$
where x stands for the level (or severity) of the disease, y stands for the level
(or severity) of the symptom, and u stands for all factors, other than the
disease in question, that could possibly affect Y.4 In interpreting this equa-
tion, one should think of a physical process whereby nature examines the
values of X and U and, accordingly, assigns variable Y the value y = βx + u.
To express the directionality inherent in this process, Wright augmented
the equation with a diagram, later called a ‘‘path diagram,’’ in which arrows
are drawn from (perceived) causes to their (perceived) effects and, more
importantly, the absence of an arrow makes the empirical claim that the
value nature assigns to one variable is not determined by the value taken
by another.5
The variables V and U are called ‘‘exogenous’’; they represent observed or
unobserved background factors that the modeler decides to keep unexplained,
that is, factors that influence, but are not influenced by, the other variables
(called ‘‘endogenous’’) in the model.
If correlation is judged possible between two exogenous variables, U and
V, it is customary to connect them by a dashed double arrow, as shown in
Figure 3.1b.
To summarize, path diagrams encode causal assumptions via missing
arrows, representing claims of zero influence, and missing double arrows
(e.g., between V and U), representing the (causal) assumption Cov(U, V) = 0.
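[Figure 3.1: (a) the structural equations x = v and y = βx + u, with exogenous variables V and U and path coefficient β on the arrow from X to Y; (b) the same diagram with V and U connected.]
Figure 3.1 A simple structural equation model, and its associated diagrams.
Unobserved exogenous variables are connected by dashed arrows.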
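The claim that a missing double arrow encodes Cov(U, V) = 0 has a direct empirical consequence, sketched below in Python for the model of Figure 3.1 (x = v, y = βx + u). Every numerical choice here is our own assumption, not the chapter's: Gaussian errors, a hypothetical β = 0.5, and a hypothetical correlation ρ between U and V. When ρ ≠ 0 (Figure 3.1b), the least-squares slope of y on x drifts to roughly β + ρ, whereas replacing the equation x = v by a constant, which mimics an intervention, recovers β.

```python
import random


def simulate(beta, rho, n, x_set=None, seed=0):
    """Draw n units from x = v, y = beta*x + u with (assumed) Gaussian errors.

    rho is a hypothetical correlation between the exogenous V and U
    (rho = 0 matches Figure 3.1a, rho != 0 matches Figure 3.1b).
    If x_set is given, the equation x = v is replaced by x = x_set.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        # u has unit variance and correlation rho with v
        u = rho * v + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        x = v if x_set is None else x_set
        xs.append(x)
        ys.append(beta * x + u)
    return xs, ys


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def mean(values):
    return sum(values) / len(values)


beta, rho, n = 0.5, 0.6, 20_000
xs, ys = simulate(beta, rho, n)
observed_slope = ols_slope(xs, ys)  # approximately beta + rho
_, y_do1 = simulate(beta, rho, n, x_set=1.0)
_, y_do0 = simulate(beta, rho, n, x_set=0.0)
# Same seed, so both runs reuse the same u draws and the contrast is beta.
interventional_effect = mean(y_do1) - mean(y_do0)
print(round(observed_slope, 2), round(interventional_effect, 2))
```

Setting rho = 0 (the situation of Figure 3.1a) makes the regression slope and the interventional contrast agree up to sampling error.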
4. We use capital letters (e.g., X,Y,U) for variable names and lower case letters (e.g., x,y,u) for
values taken by these variables.
5. A weaker class of causal diagrams, known as ‘‘causal Bayesian networks,’’ encodes interventional,
rather than functional, dependencies; it can be used to predict outcomes of randomized
experiments but not probabilities of counterfactuals (for formal definition, see Pearl, 2000a,
pp. 22–24).
3 The Mathematics of Causal Relations 53
[Figure 3.2: diagrams over the variables W, V, U, Z, X, and Y; in panel (b), X is set to the constant x0.]
Figure 3.2 (a) The diagram associated with the structural model of Eq. (2). (b) The
diagram associated with the modified model of Eq. (3), representing the intervention
do(X = x0).
$$
\begin{aligned}
z &= f_Z(w)\\
x &= f_X(z, v)\\
y &= f_Y(x, u)
\end{aligned}
\tag{2}
$$

$$
\begin{aligned}
z &= f_Z(w)\\
x &= x_0\\
y &= f_Y(x, u)
\end{aligned}
\tag{3}
$$
$$Y_x(u) \triangleq Y_{M_x}(u) \tag{4}$$
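Equations (2) to (4) translate almost line for line into code. The Python sketch below is a hypothetical instance: the functional forms chosen for f_Z, f_X, and f_Y are our own, picked only for illustration. Solving the model for a unit with background values (w, v, u) gives its factual values. Solving the modified model, in which the equation for X is replaced by a constant as in Eq. (3), gives the counterfactual Y_x for that unit, as in Eq. (4).

```python
def f_z(w):
    return 2 * w  # hypothetical choice for f_Z


def f_x(z, v):
    return 1 if z + v > 1 else 0  # hypothetical choice for f_X


def f_y(x, u):
    return 3 * x + u  # hypothetical choice for f_Y


def solve(w, v, u, x_fixed=None):
    """Solve model (2); if x_fixed is given, solve the modified model (3) instead."""
    z = f_z(w)
    x = f_x(z, v) if x_fixed is None else x_fixed
    y = f_y(x, u)
    return {"z": z, "x": x, "y": y}


def y_counterfactual(x, w, v, u):
    """Y_x for the unit with background values (w, v, u): Y in the modified model."""
    return solve(w, v, u, x_fixed=x)["y"]


# One unit with background values w = 0, v = 0, u = 1.
factual = solve(0, 0, 1)                          # z = 0, x = 0, y = 1
y_under_treatment = y_counterfactual(1, 0, 0, 1)  # y = 3*1 + 1 = 4
```

Setting x to the unit's own factual value (here 0) returns its factual y, so the construction satisfies the consistency property without any extra assumption.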
[Figure 3.3: causal diagram over the variables Z1, Z2, Z3, W1, W2, W3, X, and Y.]
Figure 3.3 Graphical model illustrating the back-door criterion. Error terms are not
shown explicitly.
and salary level. Our problem is to select a subset of these factors for mea-
surement and adjustment so that if we compare treated vs. untreated subjects
having the same values of the selected factors, we get the correct treatment
effect in that subpopulation of subjects. Such a set of factors is called a
‘‘sufficient set,’’ ‘‘admissible’’ or a set ‘‘appropriate for adjustment.’’ The
problem of defining a sufficient set, let alone finding one, has baffled epide-
miologists and social scientists for decades (for review, see Greenland, Pearl,
& Robins, 1999; Pearl, 2000a, 2009a).
The following criterion, named the ‘‘back-door’’ criterion (Pearl, 1993a),
provides a graphical method of selecting such a set of factors for adjustment.
It states that a set, S, is appropriate for adjustment if two conditions hold:
1. No element of S is a descendant of X.
2. The elements of S ‘‘block’’ all back-door paths from X to Y, that is, all
paths that end with an arrow pointing to X.6
Based on this criterion we see, for example, that each of the sets {Z1, Z2,
Z3}, {Z1, Z3}, and {W2, Z3} is sufficient for adjustment because each blocks
all back-door paths between X and Y. The set {Z3}, however, is not sufficient
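for adjustment because it does not block the path $X \leftarrow W_1 \leftarrow Z_1 \rightarrow Z_3 \leftarrow Z_2 \rightarrow W_2 \rightarrow Y$,
on which Z3 is a collision node, so including Z3 in S leaves this path open.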
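The criterion, together with the definition of blocking in footnote 6, is mechanical enough to check by program. The Python sketch below enumerates the back-door paths and applies both conditions. Because the figure itself is not reproduced here, its edge list is an assumption: we use Z1 → W1 → X, Z1 → Z3 ← Z2, Z2 → W2 → Y, Z3 → X, Z3 → Y, and X → W3 → Y. These edges contain the path quoted above, and with them the sketch reproduces the four verdicts in the text.

```python
# Hypothetical edge list for Figure 3.3 (an assumption; see the lead-in).
EDGES = {
    ("Z1", "W1"), ("W1", "X"), ("Z1", "Z3"), ("Z2", "Z3"), ("Z2", "W2"),
    ("W2", "Y"), ("Z3", "X"), ("Z3", "Y"), ("X", "W3"), ("W3", "Y"),
}


def parents(node):
    return {a for a, b in EDGES if b == node}


def children(node):
    return {b for a, b in EDGES if a == node}


def descendants(node):
    """All nodes reachable from node along directed edges (node itself excluded)."""
    found, stack = set(), [node]
    while stack:
        for child in children(stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(src, dst):
    """All paths from src to dst in the undirected skeleton, without repeated nodes."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        if path[-1] == dst:
            yield path
            continue
        for nxt in parents(path[-1]) | children(path[-1]):
            if nxt not in path:
                stack.append(path + [nxt])


def is_blocked(path, s):
    """Footnote 6: a non-collider in S, or a collider outside S with no descendant in S."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = prev in parents(node) and nxt in parents(node)
        if collider and node not in s and not (descendants(node) & s):
            return True
        if not collider and node in s:
            return True
    return False


def satisfies_back_door(s, x="X", y="Y"):
    s = set(s)
    if s & descendants(x):  # condition 1: no element of S is a descendant of X
        return False
    # condition 2: every path whose first edge points into X is blocked by S
    return all(is_blocked(p, s) for p in simple_paths(x, y) if p[1] in parents(x))


for candidate in [{"Z1", "Z2", "Z3"}, {"Z1", "Z3"}, {"W2", "Z3"}, {"Z3"}]:
    print(sorted(candidate), satisfies_back_door(candidate))
```

Brute-force path enumeration is fine for a diagram of this size; larger graphs would call for a reachability-based d-separation test instead.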
The implication of finding a sufficient set, S, is that stratifying on S is
guaranteed to remove all confounding bias relative to the causal effect of X
on Y. In other words, it renders the causal effect of X on Y identifiable, via
6. In this criterion, a set, S, of nodes is said to block a path, P, if either (1) P contains at least one
arrow-emitting node that is in S or (2) P contains at least one collision node (e.g., $\rightarrow Z \leftarrow$) that
is outside S and has no descendant in S (see Pearl, 2009b, pp. 16–17, 335–337).
Since all factors on the right-hand side of the equation are estimable (e.g.,
by regression) from the preinterventional data, the causal effect can likewise
be estimated from such data without bias.
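As a numerical illustration, the Python sketch below assumes that the estimand obtained from a sufficient set S takes the standard adjustment form P(y | do(x)) = Σ_s P(y | x, s) P(s). It builds a hypothetical joint distribution for a binary confounder S, treatment X, and outcome Y (all probabilities invented for illustration) and compares the adjusted contrast with the naive contrast of P(y | x).

```python
from itertools import product


def make_joint():
    """Hypothetical joint P(s, x, y) in which S confounds X and Y."""
    joint = {}
    for s, x, y in product((0, 1), repeat=3):
        p_s = 0.5
        p_x1 = 0.8 if s == 1 else 0.2    # S raises the chance of treatment
        p_y1 = 0.1 + 0.2 * x + 0.5 * s   # true effect of X on P(Y = 1) is 0.2
        p_x = p_x1 if x == 1 else 1 - p_x1
        p_y = p_y1 if y == 1 else 1 - p_y1
        joint[(s, x, y)] = p_s * p_x * p_y
    return joint


def prob(joint, **fixed):
    """Marginal probability of the event given as keyword values for s, x, y."""
    return sum(p for (s, x, y), p in joint.items()
               if all({"s": s, "x": x, "y": y}[k] == v for k, v in fixed.items()))


def naive(joint, x, y=1):
    """P(y | x): the unadjusted conditional probability."""
    return prob(joint, x=x, y=y) / prob(joint, x=x)


def adjusted(joint, x, y=1):
    """Adjustment over S: sum over s of P(y | x, s) P(s)."""
    return sum(prob(joint, s=s, x=x, y=y) / prob(joint, s=s, x=x) * prob(joint, s=s)
               for s in (0, 1))


joint = make_joint()
print(naive(joint, 1) - naive(joint, 0))        # confounded contrast
print(adjusted(joint, 1) - adjusted(joint, 0))  # adjusted contrast
```

Here the naive contrast is 0.5, while the adjusted contrast matches the 0.2 built into the hypothetical model.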
The back-door criterion allows us to write equation 5 directly, after selecting
a sufficient set, S, from the diagram, without resorting to any algebraic manip-
ulation. The selection criterion can be applied systematically to diagrams of any
size and shape, thus freeing analysts from judging whether ‘‘X is conditionally
ignorable given S,’’ a formidable mental task required in the potential-response
framework (Rosenbaum & Rubin, 1983). The criterion also enables the analyst
to search for an optimal set of covariates—namely, a set, S, that minimizes
measurement cost or sampling variability (Tian, Paz, & Pearl, 1998).
7. Before applying this criterion, one may delete from the causal graph all nodes that are not
ancestors of Y.
(2) extending the condition from causal effects to any counterfactual expres-
sion (Shpitser & Pearl, 2007). The corresponding unbiased estimands for
these causal quantities are readable directly from the diagram.
Formulating Assumptions
The distinctive characteristic of the potential-outcome approach is that, although
its primitive objects are undefined, hypothetical quantities, the analysis itself
is conducted almost entirely within the axiomatic framework of probability
theory. This is accomplished by postulating a ‘‘super’’ probability function on
both hypothetical and real events, treating the former as ‘‘missing data.’’ In
other words, if U is treated as a random variable, then the value of the
counterfactual Yx(u) becomes a random variable as well, denoted as Yx.
$$X = x \;\Rightarrow\; Y_x = Y, \tag{7}$$
which states that for every u, if the actual value of X turns out to be x,
then the value that Y would take on if X were x is equal to the actual value of
Y. For example, a person who chose treatment x and recovered would also
have recovered if given treatment x by design.
The main conceptual difference between the two approaches is that,
whereas the structural approach views the subscript x as an operation that
changes the distribution but keeps the variables the same, the potential-
outcome approach views Yx to be a different variable, unobserved and loosely
connected to Y through relations such as equation 7.
Pearl (2000a, chap. 7) shows, using the structural interpretation of Yx(u),
that it is indeed legitimate to treat counterfactuals as jointly distributed
random variables in all respects, that consistency constraints like equation
7 are automatically satisfied in the structural interpretation, and, moreover,
that investigators need not be concerned about any additional constraints
except the following two:8
8. This completeness result is due to Halpern (1998), who noted that an additional axiom
Performing Inferences
A collection of assumptions of this type might sometimes be sufficient to
permit a unique solution to the query of interest; in other cases, only bounds
on the solution can be obtained. For example, if one can plausibly assume
that a set, Z, of covariates satisfies the conditional independence
$$Y_x \perp\!\!\!\perp X \mid Z \tag{10}$$
9. Even with the use of graphs the task is not easy; for example, the reader should try to verify
whether $\{Z \perp\!\!\!\perp X_z \mid Y\}$ holds in the simple model of Figure 3.2(a). The answer is given in Pearl
(2000a, p. 214).
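The "missing data" reading of Eq. (10) can be illustrated by simulation. The Python sketch below is hypothetical throughout. It generates both potential outcomes, Y0 and Y1, for every unit and lets treatment depend only on a binary covariate Z, so that Yx ⊥⊥ X | Z holds by construction. It then reveals only the outcome of the treatment actually received. Under that assumption, stratifying on Z and reweighting by P(z) recovers the mean of Y1, which is known here only because the simulation keeps the "missing" outcomes. The unstratified mean among the treated does not recover it.

```python
import random


def simulate_units(n, seed=1):
    """Each unit: covariate z, treatment x, potential outcomes y0, y1 (all hypothetical)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        z = int(rng.random() < 0.4)
        y0 = int(rng.random() < 0.2 + 0.4 * z)
        y1 = int(rng.random() < 0.5 + 0.4 * z)
        x = int(rng.random() < (0.8 if z else 0.3))  # treatment depends on z only
        units.append((z, x, y0, y1))
    return units


def observed(units):
    """Keep (z, x, y): only the potential outcome under the treatment received."""
    return [(z, x, y1 if x else y0) for z, x, y0, y1 in units]


def mean(values):
    return sum(values) / len(values)


def stratified_mean(data, x):
    """Sum over z of mean(y | x, z) * P(z), computed from observed data only."""
    total = 0.0
    for z in (0, 1):
        p_z = mean([1 if zz == z else 0 for zz, _, _ in data])
        total += mean([y for zz, xx, y in data if zz == z and xx == x]) * p_z
    return total


units = simulate_units(200_000)
data = observed(units)
true_mean_y1 = mean([y1 for _, _, _, y1 in units])       # unobservable in practice
naive_mean_y1 = mean([y for _, x, y in data if x == 1])  # confounded through z
adjusted_mean_y1 = stratified_mean(data, 1)
```

With these settings the true mean of Y1 is about 0.66, the naive treated mean about 0.76, and the stratified estimate lands close to the truth.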
These general formulas are applicable to any type of variables,10 any non-
linear interactions, and any distribution and, moreover, are readily estimable
by regression. IE (respectively, DE) represents the average increase in the
outcome Y that the transition from X = x to X = x’ is expected to produce
absent any direct (respectively, indirect) effect of X on Y. When the outcome Y
is binary (e.g., recovery or hiring), the ratio (1 – IE/TE) represents the frac-
tion of responding individuals who owe their response to direct paths, while
(1 – DE/TE) represents the fraction who owe their response to Z-mediated
paths. TE stands for the total effect, TE = E(Y|x’) – E(Y|x), which, in nonlinear
systems, may or may not be the sum of the direct and indirect effects.
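As a hedged illustration, the Python sketch below assumes that the general formulas in question are the mediation formulas for the case without confounding: DE = Σ_z [E(Y|x’, z) – E(Y|x, z)] P(z|x) and IE = Σ_z E(Y|x, z) [P(z|x’) – P(z|x)]. It evaluates them for a binary treatment X and binary mediator Z, using invented conditional expectations that include an X–Z interaction. That interaction is what makes TE differ from DE + IE.

```python
# Hypothetical inputs: P(Z = 1 | X = x) and E(Y | X = x, Z = z) for binary X and Z.
P_Z1 = {0: 0.3, 1: 0.7}
E_Y = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.8}  # (x, z) -> E(Y | x, z)


def p_z(z, x):
    return P_Z1[x] if z == 1 else 1 - P_Z1[x]


def direct_effect(x, x_new):
    """Change X from x to x_new while Z keeps the distribution it has under x."""
    return sum((E_Y[(x_new, z)] - E_Y[(x, z)]) * p_z(z, x) for z in (0, 1))


def indirect_effect(x, x_new):
    """Hold X at x while Z shifts to the distribution it would have under x_new."""
    return sum(E_Y[(x, z)] * (p_z(z, x_new) - p_z(z, x)) for z in (0, 1))


def total_effect(x, x_new):
    def e_y(xv):
        return sum(E_Y[(xv, z)] * p_z(z, xv) for z in (0, 1))
    return e_y(x_new) - e_y(x)


de, ie, te = direct_effect(0, 1), indirect_effect(0, 1), total_effect(0, 1)
print(round(de, 2), round(ie, 2), round(te, 2))  # 0.32 0.04 0.52
print(round(1 - ie / te, 3), round(1 - de / te, 3))
```

Reading E(Y | x, z) as a recovery probability, the two ratios give the fractions of responders owing their response to direct and to Z-mediated paths. Because of the interaction, these fractions sum to more than 1.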
Additional results spawned by the structural–graphical–counterfactual
symbiosis include effect estimation under noncompliance (Balke & Pearl,
1997; Chickering & Pearl, 1997), mediating instrumental variables (Pearl,
1993b; Brito & Pearl, 2006), robustness analysis (Pearl, 2004), selecting pre-
dictors for propensity scores (Pearl, 2010a, 2010c), and estimating the effect
of treatment on the treated (Shpitser & Pearl, 2009). Detailed descriptions of
these results are given in the corresponding articles (available at http://
bayes.cs.ucla.edu/csl_papers.html).
Conclusions
10. Integrals should replace summations when Z is continuous. Generalizations to cases invol-
ving observed or unobserved confounders are given in Pearl (2001) and exemplified in Pearl
(2010a, 2010b). Conceptually, IE measures the average change in Y under the operation of
setting X to x and, simultaneously, setting Z to whatever value it would have obtained under X
= x’ (Robins & Greenland, 1992).
References
Angrist, J., Imbens, G., & Rubin, D. (1996). Identification of causal effects using
instrumental variables (with comments). Journal of the American Statistical
Association, 91(434), 444–472.
Balke, A., & Pearl, J. (1994a). Counterfactual probabilities: Computational methods,
bounds, and applications. In R. L. de Mantaras and D. Poole (Eds.), Proceedings of the
Tenth Conference on Uncertainty in Artificial Intelligence (pp. 46–54). San Mateo, CA:
Morgan Kaufmann.
Balke, A., & Pearl, J. (1994b). Probabilistic evaluation of counterfactual queries.
In Proceedings of the Twelfth National Conference on Artificial Intelligence (Vol. I,
pp. 230–237). Menlo Park, CA: MIT Press.
Balke, A., & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect
compliance. Journal of the American Statistical Association, 92(439), 1172–1176.
Brito, C., & Pearl, J. (2006). Graphical condition for identification in recursive SEM.
In Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence
(pp. 47–54). Corvallis, OR: AUAI Press.
Chalak, K., & White, H. (2006). An extended class of instrumental variables for the
estimation of causal effects (Tech. Rep. Discuss. Paper). San Diego: University of
California, San Diego, Department of Economics.
Chickering, D., & Pearl, J. (1997). A clinician’s tool for analyzing non-compliance.
Computing Science and Statistics, 29(2), 424–431.
Cox, D. (1958). The Planning of Experiments. New York: John Wiley & Sons.
Glymour, M., & Greenland, S. (2008). Causal diagrams. In K. Rothman, S. Greenland,
and T. Lash (Eds.), Modern Epidemiology (3rd ed., pp. 183–209). Philadelphia:
Lippincott Williams & Wilkins.
Greenland, S., Pearl, J., & Robins, J. (1999). Causal diagrams for epidemiologic
research. Epidemiology, 10(1), 37–48.
Halpern, J. (1998). Axiomatizing causal reasoning. In G. Cooper and S. Moral (Eds.),
Uncertainty in Artificial Intelligence (pp. 202–210). San Francisco: Morgan Kaufmann.
(Reprinted in Journal of Artificial Intelligence Research, 12, 17–37, 2000.)
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical
Association, 81(396), 945–960.
Holland, P. (1988). Causal inference, path analysis, and recursive structural equations
models. In C. Clogg (Ed.), Sociological Methodology (pp. 449–484). Washington, DC:
American Sociological Association.
Lewis, D. (1973). Counterfactuals. Cambridge, MA: Harvard University Press.
Mackie, J. (1965). Causes and conditions. American Philosophical Quarterly, 2(4), 261–264.
(Reprinted in E. Sosa and M. Tooley [Eds.], Causation. Oxford: Oxford University
Press, 1993.)
MacKinnon, D., Lockwood, C., Brown, C., Wang, W., & Hoffman, J. (2007).
The intermediate endpoint effect in logistic and probit regression. Clinical Trials,
4, 499–513.
Morgan, S., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and
Principles for Social Research (Analytical Methods for Social Research). New York:
Cambridge University Press.
Neyman, J. (1923). On the application of probability theory to agricultural experiments.
Essay on principles. Statistical Science, 5(4), 465–480.
Pearl, J. (1993a). Comment: Graphical models, causality, and intervention. Statistical
Science, 8(3), 266–269.
Pearl, J. (1993b). Mediating instrumental variables (Tech. Rep. No. TR-210). Los
Angeles: University of California, Los Angeles, Department of Computer Science.
http://ftp.cs.ucla.edu/pub/stat_ser/R210.pdf.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–710.
Pearl, J. (2000a). Causality: Models, Reasoning, and Inference. New York: Cambridge
University Press.
A large and daunting philosophical literature exists on the nature and mean-
ing of causality. Add to that the extensive discussions in the statistical litera-
ture about what it means to claim that C causes E, and it can be
overwhelming for scientists, who, after all, are typically just seeking
guidelines about how to conduct and analyze their research. Add to this
mix the inherent problems in psychiatry—which examines an extraordinarily
wide array of potential causal processes from molecules to minds and socie-
ties, some of which permit experimental manipulation but many of which do
not—and you can readily see the sense of frustration and, indeed, futility
with which this issue might be addressed.
In the first section of this chapter, I reflect on two rather practical aspects
of causal inference that I have confronted in my research career in psychia-
tric genetics. The first of these is what philosophers call a ‘‘brute fact’’ of our
world—the unidirectional causal relationship between variation in genomic
DNA and phenotype. The second is the co-twin–control method—a nice
example of trying to use twins as a ‘‘natural experiment’’ to clarify causal
processes when controlled trials are infeasible.
In the second section, I briefly outline and advocate for a particular
approach to causal inference developed by Jim Woodward (2003) that I
term ‘‘interventionism.’’ I argue that this approach is especially well suited
to the needs of our unusual field of psychiatry.
4 Causal Thinking in Psychiatry 67
$\mathrm{GV} \rightarrow \mathrm{Risk}$
$\mathrm{Risk} \nrightarrow \mathrm{GV}$
This claim is specific and limited. It does not apply to other features of our
genetic machinery such as gene expression—the product of genes at the level
and the other group has not. If the exposed group has a higher rate of disease,
then we can argue on this basis that the risk factor truly causes the disease.
While intuitively appealing, this common nonexperimental approach—like
many in epidemiology—has a key point of vulnerability. While it may be that
the risk factor causes the disease, it is also possible that a set of ‘‘third
variables’’ predispose to both the risk factor and the disease. Such a case
will produce a noncausal risk factor–disease association.
This is a particular problem in psychiatric epidemiology because so many
exposures of interest—stressful life events, social support, educational status,
social class—are themselves complex and the result not only of the environ-
ment (with causal effects flowing from environment to person) but also of
the actions of human beings themselves (with causal effects flowing from
people to their environment) (Kendler & Prescott, 2006). As humans, we
actively create our own environments, and this activity is substantially influ-
enced by our genes (Kendler & Baker, 2007). Thus, for behavioral traits, our
phenotypes quite literally extend far beyond our skin.
Can we use genetically informative designs to get any purchase on these
possible confounds? Sometimes. Let me describe one such method—the co-
twin–control design. I will first illustrate its potential utility with one example
and then describe a critical limitation.
A full co-twin–control design involves the comparison of the association
between a risk factor and an outcome in three samples: (1) an unselected
population, (2) dizygotic (DZ) twin pairs discordant for exposure to the risk
factor, and (3) monozygotic (MZ) pairs discordant for exposure to the risk
factor (Kendler et al., 1993). Three possible patterns of results are
illustrated in Figure 4.1. The results on the left side of the figure show the
[Figure 4.1: odds ratios (OR; reference line at 1.0) for all subjects, DZ pairs, and MZ pairs under three patterns of results: All Causal; Partly Non-causal - Genetic; Non-causal - All Genetic.]
[Figure 4.2: odds ratios (OR) for all subjects, DZ pairs, and MZ pairs, shown separately for males and females.]
Figure 4.2 Odds Ratios from Cotwin-Control Analyses of the Association between
Drinking Before Age 15 and Alcohol Dependence.
factors, the risk for alcoholism among the unexposed twins from DZ discordant-
onset pairs would be expected to be the same as that in the MZ pairs.
However, if familial resemblance for deviance is due to genetic factors, the
risk for alcoholism in an unexposed individual would be lower among DZ
than MZ pairs.
As shown in Figure 4.2, the twin pair resemblance was inconsistent with
the causal hypothesis. Instead, the results suggested that early drinking and
later alcoholism are both the result of a shared genetic liability. For example,
among the 213 male and 69 female MZ pairs who were discordant for early
drinking, there was only a slight difference in the prevalence of AD between
the twins who drank early and the co-twins who did not. The ORs were 1.1
for both sexes and were not statistically different from the 1.0 value predicted
by the noncausal model for MZ pairs. The ORs for the DZ pairs were
midway between those of the MZ pairs and the general sample, indicating
that the source of the familial liability is genetic rather than environmental.
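For readers who want to see how a pair-matched odds ratio is formed, the Python sketch below uses the standard conditional (McNemar-type) estimator for 1:1 matched pairs. Among exposure-discordant pairs, the odds ratio is the number of pairs in which only the exposed twin is affected divided by the number in which only the unexposed co-twin is affected. An approximate 95% confidence interval comes from the usual standard error of the log odds ratio. The counts are hypothetical, and we do not claim this is the exact analysis behind Figure 4.2.

```python
import math


def matched_pair_or(exposed_only, unexposed_only, z=1.96):
    """Conditional odds ratio for exposure-discordant pairs, with a Wald-type CI.

    exposed_only: pairs in which only the exposed twin has the outcome.
    unexposed_only: pairs in which only the unexposed co-twin has the outcome.
    Pairs concordant for the outcome do not enter this estimator.
    """
    if exposed_only == 0 or unexposed_only == 0:
        raise ValueError("both discordant-outcome counts must be positive")
    odds_ratio = exposed_only / unexposed_only
    se_log = math.sqrt(1 / exposed_only + 1 / unexposed_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts among MZ pairs discordant for the exposure.
or_mz, lo, hi = matched_pair_or(exposed_only=30, unexposed_only=25)
print(f"OR = {or_mz:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that comfortably includes 1.0 among MZ pairs is the pattern the noncausal model predicts. The same function applied to DZ discordant pairs gives the intermediate comparison.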
I am not claiming that these results are definitive, and they certainly
require replication. It is frankly unlikely that early onset of alcohol consump-
tion has no impact on subsequent risk for problem drinking. Surely, however,
these results should give pause to those who want to stamp out alcohol
problems by restricting the access of adolescents to alcohol and suggest
that non-causal processes might explain at least some of the early drinking–later
alcoholism relationship. For those interested in other psychiatric
applications of the co-twin–control method, our group has applied it to clarify
the association between smoking and major depression (Kendler et al., 1993),
stressful life events and major depression (Kendler, Karkowski, & Prescott,
1999), and childhood sexual abuse and a range of psychiatric outcomes
(Kendler et al., 2000).
Lest you leap to the conclusion that this method is a panacea for our
problems of causal inference, I have some bad news. The co-twin–control
method is asymmetric with regard to the causal clarity of its results. Studies
in which MZ twins discordant for risk-factor exposure have equal rates of the
disease can, I think, permit the rather strong inference that the risk factor–
disease association is not causal. However, if in MZ twins discordant for risk-
factor exposure the exposed twin has a significantly higher risk of illness than
the unexposed twin, it is not possible to infer with such confidence that the
risk factor–disease association is causal. This is because in the typical design
it is not possible to rule out the possibility that some unique environmental
event not shared with the co-twin produced both the risk factor and the
disease.
For example, imagine we are studying the relationship between early
conduct disorder and later drug dependence. Assume further that we find
many MZ twin pairs of the following type: the conduct-disordered twin
I have been reading for some years in the philosophy of science (and a bit in
metaphysics) about approaches to causation and explanation. For understand-
able reasons, this is an area often underutilized by psychiatric researchers. I
am particularly interested in the question of what general approach to caus-
ality is most appropriate for the science of psychiatry, which itself is a hybrid
of the biological, psychological, and sociological sciences.
First, I would argue that the deductive-nomological approach emerging
from the logical positivist movement is poorly suited to psychiatric research.
This position—which sees true explanation as being deduced from general
laws as applied to specific situations—may have its applications in physics.
However, psychiatry lacks the broad and deep laws that are seen at the core of
physics. Many, myself included, doubt that psychiatry will ever have laws of
the wide applicability of general relativity or quantum mechanics. It is simply
not, I suggest, in the nature of our discipline to have such powerful and
simple explanations. A further critical limitation of this approach for
psychiatry, much discussed in the literature, is that it does a poor job at the
critical discrimination between causation and correlation—which I consider a
central problem for our field. The famous example that is most commonly
used here is of flagpoles and shadows. Geometric laws can equally predict the
length of a shadow from the height of a flagpole or the height of the flagpole
from the length of the shadow. However, only one of these two relationships
is causally sensible.
Second, while a mechanistic approach to causation is initially intuitively
appealing, it is also ill-suited as a general approach for our field. By a ‘‘mechan-
istic approach,’’ I mean the idea that causation is best understood as the result
of some direct physical contact, a spatiotemporal process, which involves the
transfer of some process or energy from one object to another. One might
think of this as the billiard ball model of causality—that satisfying click we
hear when our cue ball knocks against another ball, sending it, we hope, into
the designated pocket. How might this idea apply to psychiatric phenomena?
Consider the empirical observation that the rate of suicide declined dra-
matically in England in the weeks after September 11, 2001 (9/11) (Salib,
2003). How would a mechanistic model approach this causal process? It
would search for the specific nature of the spatiotemporal processes that
connected the events of 9/11 in the United States to people in England.
For example, it would have to determine the extent to which information
about the events of 9/11 was conveyed to the English population through
radio, television, e-mail, word of mouth, and newspapers. Then, it would
have to trace the physical processes whereby this news influenced the
needed brain pathways, etc. I am not a Cartesian dualist, so do not misunder-
stand me here. I am not suggesting that in some ultimate way physical
processes were not needed to explain why the suicide rate declined in
England in September 2001. Instead, perhaps time spent figuring out the
physical means by which news of 9/11 arrived in England is the wrong level
on which to understand this process. Mechanistic models fail for psychiatry
for the same reasons that hard reductionist models fail. Critical causal pro-
cesses in psychiatric illnesses exist at multiple levels, only some of which are
best understood at a physical–mechanical level.
A third approach is the interventionist model (IM), which evolved out of
the counterfactual approach to causation. The two perspectives share the
fundamental idea that in thinking about causation we are ultimately asking
questions about what would have happened if things had been different.
While some counterfactual literature discusses issues around closest parallel
worlds, the IM approach is a good deal more general and can be considered
‘‘down to earth.’’
What is the essence of the IM? Consider a simple, idealized case.
Suppose we want to determine whether stress (S) increases the risk for
major depression (MD). The ‘‘ideal experiment’’ here would be the unethical
one in which, in a given population, we randomly intervene on indivi-
duals, exposing them to a stressful experience such as severe public humili-
ation (H). This experience increases their level of S, and we heartlessly
observe if they subsequently suffer from an increased incidence of MD.
Our design is
$H$ intervenes on $S \rightarrow \mathrm{MD}$
1. In individuals who are and are not exposed to our intervention, H must
be the only systematic cause of S that is unequally distributed among
the exposed and the unexposed (so that all of the averaged differences
in level of S in each cohort of our exposed and unexposed subjects
result entirely from H).
2. H must not affect the risk for MD by any route that does not go
through S (e.g., by causing individuals to stop taking antidepressant
medication).
3. H is not itself influenced by any cause that affects MD via a route that
does not go through S, as might occur if individuals prone to depres-
sion were more likely to be selected for H.
In sum, the IM says that questions about whether X causes Y are ques-
tions about what would happen to Y if there were an intervention on X. One
great virtue of the IM is that it allows psychiatrists freedom to use whatever
family of variables seems appropriate to the characterization of a particular
problem. There is no assumption that the variables have to be capable of
figuring in quite general laws of nature, as with the deductive-nomological
approach, or that the variables have to relate to basic spatiotemporal pro-
cesses, as with the mechanistic approach. The fact is that the current evi-
dence points to causal roles for variables of many different types, and the
interventionist approach allows us to make explicit just what those roles are.
For all that, there is a sense in which the approach is completely rigorous. It
is particularly unforgiving in ensuring that causation is distinguished from
correlation. Though our exposition here is highly informal, we are providing
an intuitive introduction to ideas whose formal development has been
vigorously pursued by others (e.g., Spirtes, Glymour, & Scheines, 1993; Pearl,
2001; Woodward, 2003).
If I were to try to put the essence of the IM of causality into a verbal
description, it would be as follows:
Before closing, two points about the possible relationship between the IM
and mechanistic causal models are in order. First, it is in the nature of
science to want to move from findings of causality to a clarification of the
mechanisms involved—whether they are social, psychological, or molecular.
The IM can play a role in this process by helping scientists to focus on
the level at which causal mechanisms are most likely to be operative.
However, a word of caution is in order. Given the extraordinary complexity
of most psychiatric disorders, causal effects (and the mechanisms that
underlie them) may be occurring on several levels. For example, the fact that
cognitive behavioral therapy works for MD, and that psychological mechanisms
are surely the level at which this process can currently be best understood,
does not mean that neurochemical interventions on MD (via pharmacology)
cannot also work. Conversely, although pharmacological tools can affect
symptoms of eating disorders, cultural models of female beauty, operating
at a very different level, can also affect risk.
Second, we should briefly ponder the following weighty question: Should
the plausibility of a causal mechanism impact on our interpretation of IMs?
Purists will say ‘‘No!’’ If we design the right study and the results are clear,
then causal imputations follow. Pragmatists, whose position is well
Personally, I am a bit on the pragmatist’s side, but the purists have a point
well worth remembering.
Summary of the IM
Acknowledgements
This work was supported in part by grant DA-011287 from the US National
Institutes of Health. Much of my thinking in this area has been stimulated
References
Introduction
Over the last decade, several large-scale randomized trials have reported results
that disagreed substantially with the motivating observational studies on the
value of various chronic disease–prevention strategies. One high-profile exam-
ple of these discrepancies was related to postmenopausal hormone therapy
(HT) use and its effects on cardiovascular disease and cancer. The Women’s
Health Initiative (WHI), a National Heart, Lung, and Blood Institute–spon-
sored program, was designed to test three interventions for the prevention of
chronic diseases in postmenopausal women, each of which was motivated by a
decade or more of analytic epidemiology. Specifically, the trials were testing the
potential for HT to prevent coronary heart disease (CHD), a low-fat eating
pattern to reduce breast and colorectal cancer incidence, and calcium and
vitamin D supplements to prevent hip fractures. Over 68,000 postmenopausal
women were randomized to one, two, or all three randomized clinical trial
(CT) components between 1993 and 1998 at 40 U.S. clinical centers (Anderson
et al., 2003a). The HT component consisted of two parallel trials testing the
effects of conjugated equine estrogens alone (E-alone) among women with
prior hysterectomy and the effect of combined estrogen plus progestin therapy
(E+P), in this case conjugated equine estrogens plus medroxyprogesterone
acetate, among women with an intact uterus, on the incidence of CHD and
overall health.
In 2002, the randomized trial of E+P was stopped early, based on an assess-
ment of risks exceeding benefits for chronic disease prevention, raising con-
cerns among millions of menopausal women and their care providers about
their use of these medicines. The trial confirmed the benefit of HT for
fracture-risk reduction, but the expected benefit for CHD, the primary study
end point, was not observed. Rather, the trial results documented increased
risks of CHD, stroke, venous thromboembolism (VTE), and breast cancer
with combined hormones (Writing Group for the Women’s Health Initiative
Investigators, 2002). Approximately 18 months later, the E-alone trial was also
stopped, based on the finding of an adverse effect on stroke rates and the like-
lihood that the study would not confirm the CHD-prevention hypothesis. The
results of this trial revealed a profile of risks and benefits that did not completely
coincide with either the E+P trial results or previous findings from observa-
tional studies (Women’s Health Initiative Steering Committee, 2004).
In conjunction with these trials, the WHI investigators conducted a par-
allel observational study (OS) of 93,676 women recruited from the same
population sources with similar data-collection protocols and follow-up. OS
enrollees were similar to CT participants in many demographic and chronic disease
risk-factor characteristics but were ineligible for, or unwilling to be randomized
into, the CT (Hays et al., 2003).
Because a substantial fraction of women in the OS were current or former users of menopausal HT, joint analyses of the effects of HT use in the CT and OS provide an opportunity to examine design and analysis methods that serve to compare and contrast these two study designs, to identify some of the strengths and weaknesses of each, and to determine the extent to which detailed data-analysis provisions could bring these results into agreement and thereby explain the discrepancies between these randomized trials and observational studies.
This chapter reviews the motivation for the hormone trials and describes
the major findings for chronic disease effects, with particular attention to the
results that differed from what was hypothesized. Then, the series of joint
analyses of CT and corresponding OS is presented. Finally, some discussion
about the implications of these analyses for the design and analysis of future
studies is provided.
Since the 1940s, women have been offered exogenous estrogens to relieve
menopausal symptoms. The use of unopposed estrogens grew until evidence
of an increased risk of endometrial cancer arose in the 1970s and tempered
enthusiasm for these medicines, at least for the majority of women who had
not had a hysterectomy. With the subsequent information that progestin
effectively countered the carcinogenic effects of estrogen in the endome-
trium, HT prescriptions again climbed (Wysowski, Golden, & Burke, 1995).
Observational studies found that use of HT was associated with lower risks
of osteoporosis and fractures; subsequently, the U.S. Food and Drug
5 Understanding the Effects of Menopausal Hormone Therapy 81
Trial Findings
The independent Data and Safety Monitoring Board terminated the E+P trial after a mean 5.2 years of follow-up, when the breast-cancer statistic exceeded the monitoring boundary defined for establishing this adverse effect, a decision supported by an overall assessment of harms exceeding benefits for the designated outcomes. Reductions in hip-fracture and colorectal-cancer
incidence rates were observed, but these were outweighed by increases in the
risk of CHD, stroke, and VTE, particularly in the early follow-up period, in
addition to the adverse effect on breast cancer. A prespecified global index,
devised to assist in benefit versus risk monitoring and defined for each
woman as time to the first event for any of the designated clinical events
Table 5.1 Hypothesized Effects of HT at the Time the WHI Began and the Final
Results of the Two HT Trials
To better understand the divergent findings and, if possible, to bring the two
types of studies into agreement, WHI investigators conducted a series of
analyses examining cardiovascular outcomes in the CT and OS data jointly
(Prentice et al., 2005, 2006). The parallel recruitment and follow-up proce-
dures in the OS and CT components of the WHI make this a particularly
interesting exercise since differences in data sources and collection protocols
are minimized.
For each HT trial, analogous user and nonuser groups were selected from the OS. Specifically, for the E+P analyses, OS women with a uterus who were using an estrogen plus progestin combination or were not using any HT at baseline were defined as the exposed (n = 17,503) and unexposed (n = 35,551) groups, respectively (Prentice et al., 2005). Similarly, for the E-alone analyses, 21,902 estrogen users and 21,902 nonusers of HT among OS participants who reported a prior hysterectomy at baseline formed the exposed and unexposed groups (Prentice et al., 2006). Failure times were defined as time since study enrollment (OS) or randomization (CT). In the CT, follow-up was censored at the
time each intervention was stopped. In the OS, censoring was applied at a
time chosen to give a similar average follow-up time (5.5 years for OS/E+P
and 7.1 years for OS/E-alone). For CT participants, HT exposure was defined
by randomization and analyses were based on the intention-to-treat principle.
In parallel, OS participants’ HT exposure was defined by HT use at the time
of study enrollment.
In OS women, the ratio of age-adjusted event rates in E+P users to that in
nonusers was less than one for CHD (0.71) and stroke (0.77) and close to one
for VTE (1.06), but each was 40%–50% lower than the corresponding statis-
tics from the randomized trial (Table 5.2, upper panel) and therefore similar
to the motivating observational studies. For E-alone, the corresponding ratios
were all less than one (0.68 for CHD, 0.95 for stroke, and 0.78 for VTE) and
30%–40% lower than the CT estimates (Table 5.2, lower panel).
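For the separately estimated rows of Table 5.2, the OS/CT column is the OS hazard ratio divided by the CT hazard ratio. The sketch below reproduces that arithmetic for the age-adjusted E-alone estimates. The dictionary layout and the function name are illustrative choices and are not part of the WHI analyses. This descriptive ratio is also not the same quantity as the OS/CT parameter estimated in the joint models discussed later.

```python
"""Ratio of observational (OS) to trial (CT) hazard ratios, E-alone, age-adjusted."""

# Age-adjusted E-alone hazard ratios from Table 5.2 (lower panel).
CT_HR = {"CHD": 0.96, "stroke": 1.37, "VTE": 1.33}
OS_HR = {"CHD": 0.68, "stroke": 0.95, "VTE": 0.78}


def os_ct_ratio(os_hr: float, ct_hr: float) -> float:
    """Return OS hazard ratio / CT hazard ratio (1.0 means perfect agreement)."""
    if os_hr <= 0 or ct_hr <= 0:
        raise ValueError("hazard ratios must be positive")
    return os_hr / ct_hr


ratios = {k: os_ct_ratio(OS_HR[k], CT_HR[k]) for k in CT_HR}

for outcome, r in ratios.items():
    # (1 - r) is the proportional shortfall of the OS estimate relative to the CT.
    print(f"{outcome:6s} OS/CT = {r:.2f}  ({(1 - r) * 100:.0f}% lower than CT)")
```

Rounded to two decimals, these values match the 0.71, 0.69 and 0.59 in the table. In other words, the OS estimates fall roughly 30% to 40% below the trial estimates.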
The cardiovascular risk profile (race/ethnicity, education, income, body mass index [BMI], physical activity, current smoking status, history of cardiovascular disease, and quality of life) among E+P users in the OS was somewhat better than that for OS nonusers (examples are shown in Figure 5.1). The distribution of these risk factors in the CT was balanced across treatment arms but resembled that of the OS nonuser population more than that of the corresponding HT user group. A similar pattern of healthy-user bias was observed for E-alone among OS participants.
Aspects of HT exposure also varied between the CT and OS. Among HT users in the OS, the prevalence of long-term use, defined here as the pre-enrollment exposure duration for the HT regimen reported at baseline, was considerably higher than in the CT (Figure 5.2), and few OS users were recent initiators of HT. In the CT, most participants had never used HT before or had used it only briefly. In terms of both duration and recency of each regimen, the distributions in the CT more closely resembled those of the OS nonusers (Prentice et al., 2005, 2006).
Table 5.2 Hormone Therapy Hazard Ratios (95% Confidence Intervals) for CHD, Stroke, and VTE Estimated Separately in the WHI CT and OS and Jointly with a Ratio Measure of Agreement (OS/CT) Between the Two Study Components
Estrogen alone(c)
                              CHD                  Stroke               VTE
                              CT    OS    OS/CT    CT    OS    OS/CT    CT    OS    OS/CT
Age-adjusted                  0.96  0.68  0.71     1.37  0.95  0.69     1.33  0.78  0.59
Multivariate adjusted(b)      0.97  0.74  0.77     1.35  1.00  0.74     1.39  0.88  0.63
By time since initiation
  <2 years                    1.07  1.20  1.12     1.69  0.37  0.22     2.36  1.48  0.63
  2–5 years                   1.13  1.09  0.96     1.14  0.89  0.78     1.31  0.91  0.69
  5+ years                    0.80  0.73  0.91     1.41  1.01  0.72     1.16  0.85  0.73
Combined OS/CT                            0.89                 0.68                 0.82
Combined OS and CT, by time since initiation
  <2 years                    1.11 (0.73–1.69)     1.48 (0.89–2.44)     2.18 (1.15–4.13)
  2–5 years                   1.17 (0.88–1.56)     1.18 (0.83–1.67)     1.22 (0.80–1.85)
  5+ years                    0.81 (0.62–1.06)     1.48 (1.06–2.06)     1.06 (0.72–1.56)
CHD, coronary heart disease; VTE, venous thromboembolism; CT, clinical trial; OS, observational study.
(a) From Prentice et al. (2005).
(b) Adjusted for age, race, body mass index, education, smoking status, age at menopause, and physical functioning. Hazard ratios accompanied by 95% confidence intervals are from the combined OS and CT analyses.
(c) From Prentice et al. (2006).
In the trials, the HT tested was conjugated equine estrogens (0.625 mg/day) with or without medroxyprogesterone acetate (2.5 mg/day). OS women had access to a broader range of regimens, including different formulations, doses, and routes of administration. However, the majority of HT use reported was of the same dose of the same oral estrogens and progestin as used in the CT, and another substantial fraction of women took different doses of the same preparation (Prentice et al., 2005, 2006). This suggests that differences in the medications used are an unlikely source of the discrepant results.
Figure 5.1 Distribution of selected cardiovascular risk factors (race/ethnicity, body mass index category, and smoking status) by hysterectomy status, study component and hormone use at baseline in the Observational Study (OS) or randomization assignment in the Clinical Trial (CT). Panels: women with a uterus (OS E+P user, OS nonuser, CT E+P, CT placebo) and women with prior hysterectomy (OS E-alone user, OS nonuser, CT E-alone, CT placebo); horizontal axis: percentage of women. E+P, estrogen plus progestin; E-alone, estrogen alone. Derived from Prentice et al., 2005, 2006.
Figure 5.2 Hormone therapy exposure history by hysterectomy status, study component and hormone use at baseline in the Observational Study (OS) or randomization assignment in the Clinical Trial (CT). Panels: duration of prior E+P use (women with a uterus) and duration of prior E-alone use (women with a prior hysterectomy); horizontal axis: percentage of women. E+P, estrogen plus progestin; E-alone, estrogen alone. Derived from Prentice et al., 2005, 2006.
In combined analyses for E+P (Table 5.2, upper panel), the estimates of
the OS/CT ratios were not significantly different from one for CHD and VTE
(0.93 and 0.84, respectively), suggesting that this model provided reasonable
agreement between the two studies. For stroke there is evidence of residual
bias (OS/CT = 0.76). These combined analyses describe a pattern of early
risks for all three of these cardiovascular disease outcomes and continuing
risk for VTE.
Adherence-adjusted versions of these analyses tended to yield hazard ratios with somewhat greater departures from unity than those shown in Tables 5.1 and 5.2, but they had little effect on comparative hazard ratios between the CT and OS. Those analyses used a particularly simple form of adherence adjustment, in which follow-up time was censored six months after a change in status from HT user to nonuser or vice versa. Inverse probability-of-censoring-weighted estimating procedures for nonadherence (e.g., Robins & Finkelstein, 2000) could also be applied to this problem.
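The two adherence adjustments mentioned above can be sketched briefly. The first function applies the simple rule described in the text: censor follow-up six months after a change in HT status. The second computes cumulative inverse-probability-of-censoring weights. It assumes the per-interval probabilities of remaining uncensored have already been estimated, for example from a pooled logistic model that is not shown. The function names and inputs are hypothetical, and this is not the WHI code or the Robins and Finkelstein implementation.

```python
from typing import List, Optional

LAG_YEARS = 0.5  # six-month lag after a change in HT status


def adherence_censored_time(follow_up: float, status_change: Optional[float]) -> float:
    """Follow-up time (years) under the simple adherence rule.

    status_change is the time of the first switch between HT user and nonuser,
    or None if the woman never changed status during follow-up.
    """
    if status_change is None:
        return follow_up
    return min(follow_up, status_change + LAG_YEARS)


def ipcw_weights(p_uncensored: List[float]) -> List[float]:
    """Cumulative inverse-probability-of-censoring weights.

    p_uncensored[k] is the (already estimated) probability that a woman remains
    uncensored through interval k given her covariate history up to k.
    The weight for interval k is 1 / prod_{j <= k} p_uncensored[j].
    """
    weights: List[float] = []
    survival = 1.0
    for p in p_uncensored:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        survival *= p
        weights.append(1.0 / survival)
    return weights


print(adherence_censored_time(5.2, 2.0))  # status switched at year 2
print(ipcw_weights([0.9, 0.8, 0.75]))
```

Women who remain adherent despite a low predicted probability of doing so receive large weights. These weights let them stand in for similar women whose follow-up was censored. In practice the weights are often stabilized, a refinement omitted here.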
For E-alone, the joint analysis produced excellent agreement for CHD (ratio
of E-alone hazard ratios in the OS to that in the CT was OS/CT = 0.89) and some
improvement in the alignment for VTE (OS/CT = 0.82) (Table 5.2, lower
panel). The effects of E-alone on stroke risk also appeared to differ between
the OS and CT (OS/CT = 0.68). These OS/CT ratios were very similar to
those for E+P, but the pattern of E-alone effects was somewhat different.
These combined analyses provide evidence of an early increased risk for both
stroke and VTE, with the adverse effect on stroke rates continuing beyond
year 5 but no significant effects on CHD.
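The combined estimates in Table 5.2 that indicate an effect can be read off their confidence intervals. The sketch below assumes each reported 95% interval was computed as exp(log HR ± 1.96 SE). Under that assumption, it recovers an approximate standard error and z statistic for every cell and flags intervals that exclude 1. Only the VTE estimate in the first two years and the stroke estimate after five years have intervals excluding 1, and none of the CHD intervals do. The helper names are hypothetical.

```python
import math

# Combined OS and CT E-alone hazard ratios (95% CI) by time since HT initiation,
# from Table 5.2 (lower panel): outcome -> [(period, HR, lower, upper), ...].
COMBINED = {
    "CHD": [("<2", 1.11, 0.73, 1.69), ("2-5", 1.17, 0.88, 1.56), ("5+", 0.81, 0.62, 1.06)],
    "stroke": [("<2", 1.48, 0.89, 2.44), ("2-5", 1.18, 0.83, 1.67), ("5+", 1.48, 1.06, 2.06)],
    "VTE": [("<2", 2.18, 1.15, 4.13), ("2-5", 1.22, 0.80, 1.85), ("5+", 1.06, 0.72, 1.56)],
}
Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_se(lower: float, upper: float) -> float:
    """Approximate SE of log(HR), assuming the CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def excludes_one(lower: float, upper: float) -> bool:
    """True when the 95% CI lies entirely above or entirely below HR = 1."""
    return lower > 1.0 or upper < 1.0


for outcome, rows in COMBINED.items():
    for period, hr, lo, hi in rows:
        z = math.log(hr) / log_hr_se(lo, hi)
        flag = "CI excludes 1" if excludes_one(lo, hi) else ""
        print(f"{outcome:6s} {period:>3s} y  HR={hr:.2f}  z={z:+.2f}  {flag}")
```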
Additional analyses have recently been published that use similar comparisons of the OS and CT cohorts. These analyses examine the sources of the discrepancies between the estimated hormone effects on breast cancer and colorectal cancer in the two trials and those reported in the motivating literature.
In the E+P trial, the elevated risk of breast cancer was generally expected but
the estimated effect size was considerably smaller than other investigators
have reported from observational studies (e.g., Million Women Study
Collaborators, 2003). The fact that the E-alone hazard ratio for breast
cancer was less than one, though not statistically significant, presents another
puzzle in that estrogen has long been considered to have a carcinogenic
effect in the breast. In joint OS/CT analyses, the discrepancy between the
trials and the OS regarding HT effects on breast-cancer risk could be
accounted for only by modeling the effects of the time between menopause
and first HT use as well as the time since initiation of the current HT episode,
mammography use, and traditional confounding factors (Prentice et al., 2008a,
2008b). The contrasts for colorectal cancer are similarly interesting in
that a protective effect was observed with E+P but not with E-alone, even
though the literature had identified a potential beneficial role for estrogen.
Figure 5.3 HT hazard ratios in the Observational Study based on a simple multivariate model (OS), with adjustment for the OS/CT hazard ratio estimated from the alternate trial (Adjustment 1), and assuming proportional hazards for E-alone to E+P (Adjustment 2), compared to the corresponding Clinical Trial hazard ratios (CT). Derived from Prentice et al., 2005, 2006.
OS/CT analyses suggest a reduced risk of CHD among these younger women
with prior hysterectomy.
Discussion
The stark contrasts between the results from a large number of observational
studies and the WHI randomized trials of menopausal HT provide impetus
for reflection on the role of observational studies in evaluating therapies.
Despite the usual effort to control for potential confounders in most previous
observational studies, the replication of findings of CHD benefit and breast-
cancer risk with HT across different study populations and study designs, and
support from mechanistic studies, clinically relevant aspects of the relation-
ship between HT and risk for several chronic diseases were not appreciated
until the WHI randomized trial results were published. The reliance on
lower-level evidence may have exposed millions of women to small increases
in risk of several serious adverse effects.
Randomized trials have their own limitations. In this example, the WHI
HT trials tested two specific regimens in a population considered appropriate
for CHD prevention. As many have argued, the trial design did not fully
reflect the way HT had been used in practice—prescribed near the time of
menopause, with possible tailoring of regimen to the individual. Also, while
the WHI tested HT in the largest number of women in the 50–59 year age
range ever studied, using the same agents and dosages used by the vast
majority of U.S. women, estimates of HT effects within this subgroup
remain imprecise because of the very low event rate.
This example raises many questions with regard to the public-health
research enterprise. When is it reasonable to rely on second-tier evidence
to test a hypothesis? Are there better methods to test these hypotheses?
Can we learn more from our trials, and can we use this to make observa-
tional studies more reliable?
There are insufficient resources to conduct full-scale randomized trials of
the numerous hypotheses of interest in public health and clinical medicine.
Observational studies will remain a mainstay of our research portfolio but
methods to increase the reliability of observational study results, through
better designs and analytic tools, are clearly needed. Nevertheless, when
assessing an intervention of public-health significance, the WHI
experience suggests that the evaluation needs to be anchored in a rando-
mized trial. It seems highly unlikely that the importance of the time-
dependent effect of HT on cardiovascular disease would have been recognized
without the Heart Estrogen–Progestin Replacement Study (Hulley et al.,
1998) and the WHI randomized trials. Neither observational studies con-
ducted before WHI nor the WHI OS itself would have observed these early
adverse cardiovascular disease effects without the direction from the trials to look for them.
The statistical alignment of the OS and CT results relied on several other
factors. Detailed information on the history of HT use, an extensive database
of potential confounders, and meticulous modeling of these factors were
critical. For an exposure that is more complex, such as dietary intake or
physical activity, the measurement problems are likely too great to permit
such an approach. Less obvious but probably at least as important was the
References
Adams, M. R., Kaplan, J. R., Manuck, S. B., Koritnik, D. R., Parks, J. S., Wolfe, M. S.,
et al. (1990). Inhibition of coronary artery atherosclerosis by 17-β estradiol in
ovariectomized monkeys: Lack of an effect of added progesterone. Arteriosclerosis,
10, 1051–1057.
Anderson, G. L., Manson, J. E., Wallace, R., Lund, B., Hall, D., Davis, S., et al. (2003a).
Implementation of the WHI design. Annals of Epidemiology, 13, S5–S17.
Anderson, G. L., Judd, H. L., Kaunitz, A. M., Barad, D. H., Beresford, S. A. A.,
Pettinger, M., et al. (2003b). Effects of estrogen plus progestin on gynecologic
cancers and associated diagnostic procedures: The Women’s Health Initiative ran-
domized trial. Journal of the American Medical Association, 290(13), 1739–1748.
Anderson, G. L., Kooperberg, C., Geller, N., Rossouw, J. E., Pettinger, M., & Prentice,
R. L. (2007). Monitoring and reporting of the Women’s Health Initiative randomized
hormone therapy trials. Clinical Trials, 4, 207–217.
Barrett-Connor, E., & Grady, D. (1998). Hormone replacement therapy, heart disease
and other considerations. Annual Review of Public Health, 19, 55–72.
Bush, T. L., Barrett-Connor, E., Cowan, L. D., Criqui, M. H., Wallace, R. B.,
Suchindran, C. M., et al. (1987). Cardiovascular mortality and noncontraceptive
use of estrogen in women: Results from the Lipid Research Clinics Program
Follow-up Study. Circulation, 75, 1102–1109.
Cauley, J. A., Robbins, J., Chen, Z., Cummings, S. R., Jackson, R. D., LaCroix, A. Z.,
et al. (2003). Effects of estrogen plus progestin on risk of fracture and bone mineral
density: The Women’s Health Initiative randomized trial. Journal of the American
Medical Association, 290, 1729–1738.
Chlebowski, R. T., Hendrix, S. L., Langer, R. D., Stefanick, M. L., Gass, M., Lane, D.,
et al. (2003). Influence of estrogen plus progestin on breast cancer and mammo-
graphy in healthy postmenopausal women: The Women’s Health Initiative rando-
mized trial. Journal of the American Medical Association, 289, 3243–3253.
Chlebowski, R. T., Wactawski-Wende, J., Ritenbaugh, C., Hubbell, F. A., Ascensao, J.,
Rodabough, R. J., et al. (2004). Estrogen plus progestin and colorectal cancer in
postmenopausal women. New England Journal of Medicine, 350, 991–1004.
Clarkson, T. B., Anthony, M. S., & Klein, K. P. (1996). Hormone replacement therapy
and coronary artery atherosclerosis: The monkey model. British Journal of Obstetrics
and Gynaecology, 103(Suppl. 13), 53–58.
Curb, J. D., Prentice, R. L., Bray, P. F., Langer, R. D., Van Horn, L., Barnabei, V. M.,
et al. (2006). Venous thrombosis and conjugated equine estrogen in women without
a uterus. Archives of Internal Medicine, 166, 772–780.
Cushman, M., Kuller, L. H., Prentice, R., Rodabough, R. J., Psaty, B. M., Stafford, R. S.,
et al. (2004). Estrogen plus progestin and risk of venous thrombosis. Journal of the
American Medical Association, 292, 1573–1580.
Food and Nutrition Board and Board on Health Sciences Policy (1993). An assessment
of the NIH Women’s Health Initiative. S. Thaul and D. Hotra (Eds.). Washington, DC:
National Academy Press.
Grady, D., Rubin, S. M., Petitti, D. B., Fox, C. S., Black, D., Ettinger, B., et al. (1992).
Hormone therapy to prevent disease and prolong life in postmenopausal women.
Annals of Internal Medicine, 117, 1016–1036.
Hays, J., Hunt, J. R., Hubbell, F. A., Anderson, G. L., Limacher, M., Allen, C., et al.
(2003). The Women’s Health Initiative recruitment methods and results. Annals of
Epidemiology, 13, S18–S77.
Hendrix, S. L., Wassertheil-Smoller, S., Johnson, K. C., Howard, B. V., Kooperberg, C.,
Rossouw, J. E., et al. (2006). Effects of conjugated equine estrogen on stroke in the
Women’s Health Initiative. Circulation, 113, 2425–2434.
Hersh, A. L., Stefanick, M., & Stafford, R. S. (2004). National use of postmenopausal
hormone therapy. Journal of the American Medical Association, 291, 47–53.
Hough, J. L., & Zilversmit, D. B. (1986). Effect of 17-β estradiol on aortic cholesterol
content and metabolism in cholesterol-fed rabbits. Arteriosclerosis, 6, 57–64.
Hsia, J., Langer, R. D., Manson, J. E., Kuller, L., Johnson, K. C., Hendrix, S. L., et al.
(2006). Conjugated equine estrogens and coronary heart disease: The Women’s
Health Initiative. Archives of Internal Medicine, 166, 357–365.
Hulley, S., Grady, D., Bush, T., Furberg, C., Herrington, D., Riggs, B., et al. (1998).
Randomized trial of estrogen plus progestin for secondary prevention of coronary
heart disease in postmenopausal women. Journal of the American Medical Association,
280, 605–613.
Jackson, R. D., Wactawski-Wende, J., LaCroix, A. Z., Pettinger, M., Yood, R. A., Watts,
N. B., et al. (2006). Effects of conjugated equine estrogen on risk of fractures and
BMD in postmenopausal women with hysterectomy: Results from the Women’s
Health Initiative randomized trial. Journal of Bone and Mineral Research, 21,
817–828.
Manson, J. E., Hsia, J., Johnson, K. C., Rossouw, J. E., Assaf, A. R., Lasser, N. L., et al.
(2003). Estrogen plus progestin and the risk of coronary heart disease. New England
Journal of Medicine, 349, 523–534.
Million Women Study Collaborators (2003). Breast cancer and hormone replacement
therapy in the Million Women Study. Lancet, 362, 419–427.
Pick, R., Stamler, J., Rodbard, S., & Katz, L. N. (1952). The inhibition of coronary
atherosclerosis by estrogens in cholesterol-fed chicks. Circulation, 6, 276–280.
Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G.,
et al. (2005). Combined postmenopausal hormone therapy and cardiovascular
disease: Toward resolving the discrepancy between Women’s Health Initiative
clinical trial and observational study results. American Journal of Epidemiology, 162,
404–414.
Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G.,
et al. (2006). Combined analysis of Women’s Health Initiative observational and
clinical trial data on postmenopausal hormone treatment and cardiovascular disease.
American Journal of Epidemiology, 163, 589–599.
Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Pettinger, M.,
Hendrix, S. L., et al. (2008a). Estrogen plus progestin therapy and breast cancer
in recently postmenopausal women. American Journal of Epidemiology, 167,
1207–1216.
Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Langer, R. D.,
Pettinger, M., et al. (2008b). Conjugated equine estrogens and breast cancer risk
in the Women’s Health Initiative clinical trial and observational study. American
Journal of Epidemiology, 167, 1407–1415.
Prentice, R. L., Pettinger, M., Beresford, S. A., Wactawski-Wende, J., Hubbell, F. A.,
Stefanick, M. L., et al. (2009). Colorectal cancer in relation to postmenopausal
estrogen and estrogen plus progestin in the Women’s Health Initiative clinical
trial and observational study. Cancer Epidemiology, Biomarkers and Prevention, 18,
1531–1537.
Ritenbaugh, C., Stanford, J. L., Wu, L., Shikany, J. M., Schoen, R. E., Stefanick, M. L.,
et al. (2008). Conjugated equine estrogens and colorectal cancer incidence
and survival: The Women’s Health Initiative randomized clinical trial. Cancer
Epidemiology, Biomarkers and Prevention, 17, 2609–2618.
Robins, J. M., & Finkelstein, D. M. (2000). Correcting for noncompliance and dependent
censoring in an AIDS clinical trial with inverse probability of censoring weighted
(IPCW) log-rank tests. Biometrics, 56, 779–788.
Rossouw, J. E., Prentice, R. L., Manson, J. E., Wu, L., Barad, D., Barnabei, V. M., et al.
(2007). Postmenopausal hormone therapy and risk of cardiovascular disease by
age and years since menopause. Journal of the American Medical Association, 297,
1465–1477.
Stampfer, M. J., & Colditz, G. A. (1991). Estrogen replacement therapy and coronary
heart disease: A quantitative assessment of the epidemiologic evidence. Preventive
Medicine, 20, 47–63.
Stefanick, M. L., Anderson, G. L., Margolis, K. L., Hendrix, S. L., Rodabough, R. J.,
Paskett, E. D., et al. (2006). Effects of conjugated equine estrogens on breast cancer
and mammography screening in postmenopausal women with hysterectomy.
Journal of the American Medical Association, 295, 1647–1657.
Steinberg, K. K., Thacker, S. B., Smith, S. J., Stroup, D. F., Zack, M. M., Flanders, W. D.,
et al. (1991). A meta-analysis of the effect of estrogen replacement therapy on the
risk of breast cancer. Journal of the American Medical Association, 265, 1985–1990.
Wassertheil-Smoller, S., Hendrix, S. L., Limacher, M., Heiss, G., Kooperberg, C., Baird,
A., et al. (2003). Effect of estrogen plus progestin on stroke in postmenopausal
women: The Women’s Health Initiative: a randomized trial. Journal of the
American Medical Association, 289, 2673–2684.
Innovations in Methods
6
Introduction
agnostic approach does not require this. In fact, the counterfactual theory subsumes the agnostic theory, in the sense that the counterfactual approach is a logical extension of the agnostic one. In particular, for a given graph the causal contrasts (i.e., parameters) that are well-defined under the agnostic approach are also well-defined under the counterfactual approach. This set of contrasts corresponds to the set of contrasts between treatment regimes (strategies) that could be implemented in an experiment with sequential treatment assignments (ideal interventions), wherein the treatment given at stage m is a (possibly random) function of past covariates on the graph. We refer to such contrasts or parameters as 'manipulable with respect to a given graph'. As discussed further in Section 1.8, the set of manipulable contrasts for a given graph is identified under the associated agnostic causal model from observational data with a positive joint distribution and no hidden (i.e., unmeasured) variables.
identified if it can be expressed as a known function of the distribution of the
observed data. A discrete joint distribution is positive if the probability of a
joint event is nonzero whenever the marginal probability of each individual
component of the event is nonzero.
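The positivity condition can be checked mechanically for a discrete joint distribution given as a table. The sketch below uses hypothetical helpers that do not come from the chapter. A distribution is represented as a dictionary from joint outcomes to probabilities. It counts as positive when every combination of individually possible values has nonzero joint probability.

```python
from itertools import product
from typing import Dict, List, Tuple

Event = Tuple[int, ...]


def marginal_supports(joint: Dict[Event, float]) -> List[List[int]]:
    """For each coordinate, the values that have nonzero marginal probability."""
    n_vars = len(next(iter(joint)))
    totals: List[Dict[int, float]] = [{} for _ in range(n_vars)]
    for event, prob in joint.items():
        for i, value in enumerate(event):
            totals[i][value] = totals[i].get(value, 0.0) + prob
    return [sorted(v for v, p in t.items() if p > 0) for t in totals]


def is_positive(joint: Dict[Event, float]) -> bool:
    """Positive: every combination of marginally possible values has P > 0."""
    supports = marginal_supports(joint)
    return all(joint.get(event, 0.0) > 0 for event in product(*supports))


# Two binary variables (X, Z); probabilities are made up for illustration.
uniform = {(x, z): 0.25 for x in (0, 1) for z in (0, 1)}
deterministic = {(0, 0): 0.5, (1, 1): 0.5}  # Z = X with probability 1
print(is_positive(uniform), is_positive(deterministic))
```

The deterministic example violates positivity even though each value of X and of Z has nonzero marginal probability. This is because the joint events (0, 1) and (1, 0) never occur.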
Although the agnostic theory is contained within the counterfactual
theory, the reverse does not hold. There are causal contrasts that are well-
defined within the counterfactual approach that have no direct analog within
the agnostic approach. An example that we shall discuss in detail is the pure
direct effect (also known as a natural direct effect) introduced in Robins and
Greenland (1992). The pure direct effect (PDE) of a binary treatment X on Y
relative to an intermediate variable Z is the effect the treatment X would have
had on Y had (contrary to fact) the effect of X on Z been blocked. The PDE is
non-manipulable relative to X, Y and Z in the sense that, in the absence of
additional assumptions, the PDE does not correspond to a contrast between
treatment regimes of any randomized experiment performed via interven-
tions on X, Y and Z.
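The simulation below makes the definition concrete with a hypothetical binary example. Each subject's one-step-ahead counterfactuals Z(x) and Y(x, z) are generated from independent errors, so the data-generating process is an NPSEM for the graph X → Z → Y with X → Y. The code first computes the PDE directly from the counterfactuals as E[Y(1, Z(0))] − E[Y(0, Z(0))]. It then compares this with the factual-data quantity Σ_z {E[Y | X = 1, Z = z] − E[Y | X = 0, Z = z]} P(Z = z | X = 0). That quantity equals the PDE here only because X is randomized and the cross-world independence Y(1, z) ⊥⊥ Z(0) holds by construction. All structural equations and numbers are invented for illustration. Under these equations, the true PDE is 0.1 + 0.3 × 0.2 = 0.16.

```python
import random
from typing import List, Tuple

Row = Tuple[int, int, int]  # factual (X, Z, Y)


def z_cf(x: int, u_z: float) -> int:
    """One-step-ahead counterfactual Z(x) (hypothetical structural equation)."""
    return int(u_z < 0.2 + 0.4 * x)


def y_cf(x: int, z: int, u_y: float) -> int:
    """One-step-ahead counterfactual Y(x, z), with an X-by-Z interaction."""
    return int(u_y < 0.1 + 0.1 * x + 0.2 * z + 0.3 * x * z)


def simulate(n: int, seed: int = 1) -> Tuple[List[Row], float]:
    """Return factual data and the PDE computed from the counterfactuals."""
    rng = random.Random(seed)
    rows: List[Row] = []
    pde_sum = 0
    for _ in range(n):
        u_z, u_y = rng.random(), rng.random()  # independent errors: an NPSEM
        x = int(rng.random() < 0.5)            # randomized treatment
        z0 = z_cf(0, u_z)
        pde_sum += y_cf(1, z0, u_y) - y_cf(0, z0, u_y)  # Y(1, Z(0)) - Y(0, Z(0))
        z = z_cf(x, u_z)                       # factual Z = Z(X)
        rows.append((x, z, y_cf(x, z, u_y)))   # factual Y = Y(X, Z(X))
    return rows, pde_sum / n


def pde_from_data(rows: List[Row]) -> float:
    """Sum_z {E[Y|X=1,Z=z] - E[Y|X=0,Z=z]} P(Z=z|X=0), from factual data only."""
    def mean(vals: List[int]) -> float:
        return sum(vals) / len(vals)

    ey = {(a, b): mean([y for x, z, y in rows if x == a and z == b])
          for a in (0, 1) for b in (0, 1)}
    p_z1 = mean([z for x, z, _ in rows if x == 0])
    p_z = {0: 1 - p_z1, 1: p_z1}
    return sum((ey[(1, b)] - ey[(0, b)]) * p_z[b] for b in (0, 1))


data, true_pde = simulate(100_000)
estimate = pde_from_data(data)
print(f"PDE from counterfactuals: {true_pde:.3f}; from factual data: {estimate:.3f}")
```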
In this chapter, we discuss three counterfactual models, all of which
agree in two important respects: first, they agree on the set of well-defined causal contrasts; second, they make the consistency assumption that the effect of a (possibly joint) treatment on a given subject depends neither on whether the treatment was freely chosen by, versus forced on, the subject nor on the treatments received by other subjects. However, the counterfactual
models do not agree as to the subset of these contrasts that can be identified
from observational data with a positive joint distribution and no hidden
variables. Identifiability of causal contrasts in counterfactual models is
obtained by assuming that (possibly conditional on prior history) the treat-
ment received at a given time is independent of some set of counter-
factual outcomes. Different versions of this independence assumption are
6 Alternative Graphical Causal Models 105
(ii) $V_m(\bar v_{m-1}) \equiv V_m(pa_m)$ is a function of $\bar v_{m-1}$ only through the values $pa_m$ of $V_m$'s parents on $G$.
For example, were the edge $X \to Y$ missing in Figure 6.1, this assumption would imply that $Y(x, z) = Y(z)$ for every subject and every $z$. That is, the absence of the edge would imply that smoking $X$ has no effect on $Y$ other than through its effect on $Z$.
(iii) Both the factual variables $V_m$ and the counterfactuals $V_m(r)$ for any $R \subset V$ are obtained recursively from the one-step-ahead counterfactuals $V_j(\bar v_{j-1})$, for $j \le m$. For example, $V_3 = V_3(V_1, V_2(V_1))$ and $V_3(v_1) = V_3(v_1, V_2(v_1))$.
Thus, in Figure 6.1, with the treatment $R$ being smoking $X$, a subject's possibly counterfactual MI status $Y(x = 1) = V_3(v_1 = 1)$ had he been forced to smoke is $Y(x = 1, Z(x = 1))$ and, thus, is completely determined by the one-step-ahead counterfactuals $Z(x)$ and $Y(x, z)$. That is, $Y(x = 1)$ is obtained by evaluating the one-step-ahead counterfactual $Y(x = 1, z)$ at $z = Z(x = 1)$. Similarly, a subject's factual $X$ and one-step-ahead counterfactuals determine the subject's factual hypertensive status $Z$ and MI status $Y$ as $Z(X)$ and $Y(X, Z(X))$, where $Z(X)$ is the counterfactual $Z(x)$ evaluated at $x = X$ and $Y(X, Z(X))$ is the counterfactual $Y(x, z)$ evaluated at $(x, z) = (X, Z(X))$.
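Property (iii) is a recursion: counterfactuals under interventions on X, and the factual Z and Y, are all obtained by composing the one-step-ahead counterfactuals. The minimal sketch below represents a subject (hypothetically) by a factual X together with the two functions x ↦ Z(x) and (x, z) ↦ Y(x, z). The class and the example subject are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Subject:
    """One subject in the graph X -> Z -> Y, X -> Y of Figure 6.1."""
    x: int                           # factual smoking status X
    z_of: Callable[[int], int]       # one-step-ahead counterfactual Z(x)
    y_of: Callable[[int, int], int]  # one-step-ahead counterfactual Y(x, z)

    def y_under(self, x: int) -> int:
        """Y(x) = Y(x, Z(x)): evaluate Y(x, z) at z = Z(x)."""
        return self.y_of(x, self.z_of(x))

    def factual(self) -> Tuple[int, int]:
        """Factual (Z, Y) = (Z(X), Y(X, Z(X)))."""
        z = self.z_of(self.x)
        return z, self.y_of(self.x, z)


# Hypothetical nonsmoker who would become hypertensive if forced to smoke,
# and who would have an MI only if both smoking and hypertensive.
s = Subject(x=0, z_of=lambda x: x, y_of=lambda x, z: int(x == 1 and z == 1))
print(s.y_under(1), s.y_under(0), s.factual())
```

Note that Y(x = 1) differs here from the one-step-ahead counterfactual Y(x = 1, z = 0). The first lets hypertension respond to smoking, whereas the second holds hypertension at 0.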
where, for a fixed $\bar v_{M-1}$, $\bar v_k = (v_1, \ldots, v_k)$, $k < M - 1$, denotes the initial subvector of $\bar v_{M-1}$.
and
$$Y(x = 1, z) \perp\!\!\!\perp Z(x = 1) \mid X = 1, \qquad Y(x = 0, z) \perp\!\!\!\perp Z(x = 0) \mid X = 0 \tag{6.3}$$
are true statements by assumption (iv). However, the model makes no claim as to whether
$$Y(x = 1, z) \perp\!\!\!\perp Z(x = 0) \mid X = 0$$
and
$$Y(x = 1, z) \perp\!\!\!\perp Z(x = 0) \mid X = 1$$
are true because, for example, the value of $x$ in $Y(x = 1, z)$ differs from the value $x = 0$ in $Z(x = 0)$. We shall see that all four of the above independence statements are true by assumption under the NPSEM associated with the graph in Figure 6.1.
$$f\big(V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \mid \bar V_{m-1} = \bar v_{m-1},\, V_m = v_m\big) = f\big(V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \mid \bar V_{m-1} = \bar v_{m-1}\big);$$
(b) Furthermore, the set of independences (6.6) is the same for any ordering of
the variables compatible with the descendant relationships in G.
ii) for $j = 1$ to $j = M - m$,
$$\mathcal{I}_{m+j,m}:\; V_M(\bar v_{M-1}), \ldots, V_{m+j+1}(\bar v_{m+j}) \perp\!\!\!\perp V_{m+j}(\bar v_{m+j-1}) \mid V_{m+j-1}(\bar v_{m+j-2}) = v_{m+j-1}, \ldots, V_m(\bar v_{m-1}) = v_m.$$
First, note that the set of independences in condition (6.1) is precisely $\mathcal{I}_1$. Now, if the collection $\mathcal{I}_m$ holds (for $m < M - 2$), then $\mathcal{I}_{m+1}$ holds for two reasons. (I) The set $\mathcal{I}_{m+1}$ is precisely the set $\{\mathcal{I}_{m+1,m}, \ldots, \mathcal{I}_{M,m}\}$ except with $V_m(\bar v_{m-1})$ removed from all conditioning events. (II) $\mathcal{I}_{m,m}$ licenses such removal. Thus, beginning with $\mathcal{I}_1$, we recursively obtain that $\mathcal{I}_m$, and thus $\mathcal{I}_{m,m}$, holds for $m = 1, \ldots, M - 1$. The latter immediately implies that the variables $V_{m+1}(\bar v_m)$, $m = 0, \ldots, M - 1$, are mutually independent.
(⇐) The reverse implication is immediate upon noting that the conditioning event $\bar V_{m-1} = \bar v_{m-1}$ in Equation (6.1) is the event $V_0 = v_0, V_1(v_0) = v_1, \ldots, V_{m-1}(\bar v_{m-2}) = v_{m-1}$.
(a) The set of independences in condition (6.5) is satisfied if and only if, for each $\bar v_{M-1} \in \bar{\mathcal{V}}_{M-1}$ and each $m \in \{1, \ldots, M - 1\}$,
$$V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1}) \perp\!\!\!\perp I\big(V_m(\bar v_{m-1}) = v_m\big). \tag{6.7}$$
(b) Furthermore, the set of independences (6.7) is the same for any ordering of the variables compatible with the descendant relationships in G.
Corollary 3 An MCM implies that for all $\bar v_{M-1} \in \bar{\mathcal{V}}_{M-1}$, the random variables $I\big(V_{m+1}(\bar v_m) = v_{m+1}\big)$, $m = 0, \ldots, M - 1$, are mutually independent.
Proof of Theorem 2(a): (⇒) Given $\bar v_{M-1}$, the proof exactly follows that of the previous theorem when we redefine:
i) $\mathcal{I}_{m,m}:\; V_M(\bar v_{M-1}), \ldots, V_{m+1}(\bar v_m) \perp\!\!\!\perp I\big(V_m(\bar v_{m-1}) = v_m\big)$; and
ii) for $j = 1$ to $j = M - m$,
$$\mathcal{I}_{m+j,m}:\; V_M(\bar v_{M-1}), \ldots, V_{m+j+1}(\bar v_{m+j}) \perp\!\!\!\perp I\big(V_{m+j}(\bar v_{m+j-1}) = v_{m+j}\big) \mid V_{m+j-1}(\bar v_{m+j-2}) = v_{m+j-1}, \ldots, V_m(\bar v_{m-1}) = v_m.$$
The reverse implication and (b) follow as in the proof of the previous theorem. ∎
That is, conditional on the factual past $\bar V_{m-1} = \bar v_{m-1}$, the counterfactual $V_m(\bar v_{m-1})$ is statistically independent of all future one-step-ahead counterfactuals. This implies that all four statements in Example 1 are true under an NPSEM; see also Pearl (2000, Section 3.6.3).
Hence, in an MCM or FFRCISTG model, in contrast to an NPSEM, the defining independences are those for which the values of $\bar v_{m-1}$ in (a) the conditioning event, (b) the counterfactual $V_m$ at $m$, and (c) the set of future one-step-ahead counterfactuals $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$ are equal. Thus, an FFRCISTG assumes independence of $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$ and $V_m(\bar v^{*}_{m-1})$ given $\bar V_{m-1} = \bar v^{**}_{m-1}$ only when $\bar v_{m-1} = \bar v^{*}_{m-1} = \bar v^{**}_{m-1}$. As mentioned above, the MCM further weakens the independence by replacing $V_m$ with $I(V_m = v_m)$.
In Appendix B we describe a data-generating process leading to a counter-
factual model that is an MCM/FFRCISTG model associated with Figure 6.1,
but not an NPSEM for this figure.
Understanding the implications of these additional counterfactual inde-
pendences assumed by an NPSEM compared to an MCM or FFRCISTG
model is one of the central themes of this chapter.
Lemma 4 In an MCM associated with DAG G, for all v such that f(v) > 0, the density f(v) ≡ P(V = v) of the factuals V satisfies the Markov factorization
$$f(v) = \prod_{j=1}^{M} f(v_j \mid pa_j). \qquad (6.9)$$
Robins (1986) proved Lemma 4 for an FFRCISTG model; the proof applies
equally to an MCM. Equation (6.9) is equivalent to the statement that each
variable Vm is conditionally independent of its non-descendants given its
parents (Pearl, 1988).
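The equivalence can be checked numerically on a toy example. The sketch below uses a hypothetical three-node chain X → Z → Y (not the complete DAG of Figure 6.1, where (6.9) reduces to the chain rule and imposes nothing) with made-up conditional tables; it builds f(v) from (6.9) and confirms the local Markov statement that Y is independent of its non-descendant X given its parent Z.

```python
import itertools

# Hypothetical chain X -> Z -> Y (no X -> Y edge); all numbers are made up.
P_X = {0: 0.6, 1: 0.4}                                    # f(x)
P_Z_GIVEN_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # f(z | x)
P_Y_GIVEN_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # f(y | z)


def joint():
    """Markov factorization (6.9) for this DAG: f(x, z, y) = f(x) f(z | x) f(y | z)."""
    return {(x, z, y): P_X[x] * P_Z_GIVEN_X[x][z] * P_Y_GIVEN_Z[z][y]
            for x, z, y in itertools.product((0, 1), repeat=3)}


def f_y_given_xz(f, x, z, y):
    """f(y | x, z), computed from the joint by the usual ratio."""
    return f[(x, z, y)] / sum(f[(x, z, yy)] for yy in (0, 1))


def local_markov_holds(f, tol=1e-12):
    """True if Y is independent of its non-descendant X given its parent Z,
    i.e. f(y | x, z) does not change with x."""
    return all(abs(f_y_given_xz(f, 0, z, y) - f_y_given_xz(f, 1, z, y)) < tol
               for z, y in itertools.product((0, 1), repeat=2))


f = joint()
print(abs(sum(f.values()) - 1.0) < 1e-12)  # True: a proper density
print(local_markov_holds(f))               # True
```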
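$$\Pr\nolimits_r(V = v) \equiv f_r(v) \equiv \begin{cases} \prod_{j:\, V_j \notin R} f(v_j \mid pa_j) & \text{if } v = (u, r),\\ 0 & \text{if } v = (u, r^*) \text{ with } r^* \neq r, \end{cases}$$
and for $Z \subseteq V$, $f_{p_R}(z) \equiv \sum_{v \setminus z} f_{p_R}(v)$. Thus the marginal $f_{p_R}(z)$ is obtained from $f_{p_R}(v)$ by summation in the usual way. Then, we have the following extension of Lemma 6.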
2 Direct Effects
Consider the following query: Do cigarettes (X) have a causal effect on MI (Y)
through a pathway that does not involve hypertension (Z)? This query is
often rephrased as whether X has a direct causal effect on Y not through
the intermediate variable Z. The concept of direct effect has been formalized
in three different ways in the literature. For notational simplicity we always
take X to be binary, except where noted in Appendix A.
on Y in the study population were, contrary to fact, all subjects to have Z set
to 1. It is possible for CDE(1) to be zero and CDE(0) to be nonzero or vice
versa. Whenever CDE(z) is nonzero for some level of Z, there will exist a
directed path from X to Y not through Z on the causal graph G, regardless of
the causal model.
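As a toy illustration of the last two claims, assume the hypothetical structural equation Y(x, z) = u + x(1 − z), where u is a unit-level exogenous term, and take CDE(z) = E[Y(x = 1, z)] − E[Y(x = 0, z)]. Here CDE(1) = 0 while CDE(0) = 1, and the nonzero CDE(0) reflects an X → Y edge that bypasses Z.

```python
def y_potential(x, z, u):
    """Hypothetical structural equation for Y(x, z): X raises Y only when Z is held at 0.
    u is a unit-level exogenous term (an illustrative assumption)."""
    return u + x * (1 - z)


def cde(z, population):
    """CDE(z) = E[Y(x=1, z)] - E[Y(x=0, z)], averaging over the units in `population`."""
    diffs = [y_potential(1, z, u) - y_potential(0, z, u) for u in population]
    return sum(diffs) / len(diffs)


population = [0, 1, 0, 0, 1]  # made-up exogenous values, one per unit
print(cde(0, population))  # 1.0
print(cde(1, population))  # 0.0
```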
$$Y(x = 1, z) - Y(x = 0, z)$$
Robins (1986, Sec. 12.2) first proposed using PSDE(z) to define causal effects. In his article, Y = 1 indicated death from a cause of interest (subsequent to a time t), Z = 0 indicated survival until t from competing causes, and the contrast PSDE(z) was used to solve the problem of censoring by competing causes of death in defining the causal effect of the treatment X on the cause Y. Rubin (1998) and Frangakis and Rubin (1999, 2002) later used this same contrast to solve precisely the same problem of “censoring by death.” Finally, the analysis of Rubin (2004) was also based on this contrast, except that Z and Y were no longer assumed to be failure-time indicators.
The argument given below in Sec. 4 to prove that E[Y(x = 1, Z(x = 0))] is not a manipulable effect relative to the graph in Figure 6.1 also proves that PSDE(z) is not a manipulable effect relative to this graph. Furthermore, PSDE(z) represents a causal contrast on a non-identifiable subset of the study population: the subset with Z(1) = Z(0) = z. An even greater potential problem with the PSDE is that if X has an effect on every subject’s Z, then PSDE(z) is undefined for every possible z. If Z is continuous and/or multivariate, it would not be unusual for X to have an effect on every subject’s Z. Thus, when PSDE(z) is the causal contrast, Z is generally chosen to be univariate and discrete with few levels, often binary.
However, principal stratum direct effects have the potential advantage of remaining well-defined even when controlled direct effects or pure direct effects are ill-defined. Note that for a subject with Z(x = 1) = Z(x = 0) = z, we have Y(x = 1, z) = Y(x = 1, Z(x = 1)) ≡ Y(x = 1) and Y(x = 0, z) = Y(x = 0, Z(x = 0)) ≡ Y(x = 0), so the individual PSDE for this subject is Y(x = 1) − Y(x = 0). The average PSDE is given by:
$$\mathrm{PSDE}(z) = E[Y(x = 1) - Y(x = 0) \mid Z(x = 1) = Z(x = 0) = z].$$
Thus, PSDEs can be defined in terms of the counterfactuals Y(x) and Z(x).
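The average PSDE can be sketched for hypothetical units whose four potential outcomes Z(0), Z(1), Y(0), Y(1) are all known (they never are in real data, which is why the stratum is not identifiable). The helper returns None when the stratum Z(1) = Z(0) = z is empty, mirroring the remark that PSDE(z) is undefined if X changes every subject's Z.

```python
from typing import NamedTuple, Optional, Sequence


class Unit(NamedTuple):
    """Hypothetical unit with all four potential outcomes known."""
    z0: int  # Z(x=0)
    z1: int  # Z(x=1)
    y0: int  # Y(x=0)
    y1: int  # Y(x=1)


def psde(units: Sequence[Unit], z: int) -> Optional[float]:
    """PSDE(z) = E[Y(x=1) - Y(x=0) | Z(x=1) = Z(x=0) = z].
    Returns None when the principal stratum is empty, i.e. PSDE(z) is undefined."""
    stratum = [u for u in units if u.z0 == z and u.z1 == z]
    if not stratum:
        return None
    return sum(u.y1 - u.y0 for u in stratum) / len(stratum)


units = [Unit(0, 0, 0, 1), Unit(0, 0, 0, 0), Unit(0, 1, 0, 1), Unit(1, 1, 1, 1)]
print(psde(units, 0))               # 0.5
print(psde(units, 1))               # 0.0
print(psde([Unit(0, 1, 0, 1)], 0))  # None: X changes this unit's Z
```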
Now, in a trial where X is randomly assigned but the intermediate Z is
not, there will generally be reasonable agreement as to the hypothetical
intervention (i.e., closest possible world) which sets X to x so Y(x) and
Z(x) are well defined; however, there may not be reasonable agreement
$$Y(x = 1, z) \perp\!\!\!\perp Z(x = 0) \quad \text{for all } z, \qquad (6.11)$$
then
$$E[Y(x = 1, Z(x = 0))] = \sum_z E^{\mathrm{int}}_{x=1,z}[Y]\, f^{\mathrm{int}}_{x=0}(z), \qquad (6.12)$$
because
$$\begin{aligned} E[Y(x = 1, Z(x = 0))] &= \sum_z E[Y(x = 1, z) \mid Z(x = 0) = z]\, P[Z(x = 0) = z] \\ &= \sum_z E[Y(x = 1, z)]\, P[Z(x = 0) = z], \end{aligned}$$
where the first equality is by the laws of probability and the second by (6.11).
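Under an NPSEM each counterfactual is a function of its parents and a single error term shared across all settings, so a cross-world quantity such as E[Y(x = 1, Z(x = 0))] can be simulated directly. A minimal sketch, assuming a hypothetical binary NPSEM for Figure 6.1 with independent uniform errors U_Z and U_Y (which makes (6.11) hold by construction) and made-up probabilities; it compares a Monte Carlo estimate with the right side of (6.12).

```python
import random

# Hypothetical binary NPSEM for Figure 6.1 with independent uniform errors U_Z, U_Y.
P_Z1 = {0: 0.3, 1: 0.7}                                      # P(Z(x) = 1)
P_Y1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.6}  # P(Y(x, z) = 1)


def z_cf(x, u_z):
    """Counterfactual Z(x) = 1{U_Z < P(Z(x) = 1)}."""
    return int(u_z < P_Z1[x])


def y_cf(x, z, u_y):
    """Counterfactual Y(x, z) = 1{U_Y < P(Y(x, z) = 1)}."""
    return int(u_y < P_Y1[(x, z)])


def nested_mean(n=200_000, seed=0):
    """Monte Carlo estimate of E[Y(x=1, Z(x=0))] from per-unit error draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u_z, u_y = rng.random(), rng.random()
        hits += y_cf(1, z_cf(0, u_z), u_y)
    return hits / n


# Right side of (6.12): sum_z E[Y(x=1, z)] P[Z(x=0) = z]
p_z_given_x0 = {1: P_Z1[0], 0: 1 - P_Z1[0]}
rhs = sum(P_Y1[(1, z)] * p_z_given_x0[z] for z in (0, 1))
estimate = nested_mean()
print(round(rhs, 6), round(estimate, 3))  # rhs is 0.39; the estimate should be close
```

If U_Z and U_Y were made dependent, (6.11) would fail and the two numbers would generally disagree.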
Now, the right side of Equation (6.12) is non-parametrically identified from f(v) under all four causal models since the intervention parameters $E^{\mathrm{int}}_{x,z}[Y]$ and $f^{\mathrm{int}}_x(z)$ are identified by the g-functional. In particular, with Figure 6.1 as the causal DAG,
$$\sum_z E^{\mathrm{int}}_{x=1,z}[Y]\, f^{\mathrm{int}}_{x=0}(z) = \sum_z E[Y \mid X = 1, Z = z]\, f(z \mid X = 0). \qquad (6.13)$$
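The right side of (6.13) uses only the factual distribution. A minimal sketch, assuming binary X, Z, Y and a hypothetical factual law stored as a dictionary of cell probabilities; with these made-up numbers the formula gives 0.3 × 0.7 + 0.6 × 0.3 = 0.39.

```python
import itertools


def mediation_formula(f):
    """Right side of (6.13): sum_z E[Y | X=1, Z=z] f(z | X=0), for a factual
    joint f over binary (x, z, y) given as {(x, z, y): probability}."""
    p_x0 = sum(p for (x, _, _), p in f.items() if x == 0)
    total = 0.0
    for z in (0, 1):
        e_y = f[(1, z, 1)] / (f[(1, z, 0)] + f[(1, z, 1)])  # E[Y | X=1, Z=z]
        f_z = (f[(0, z, 0)] + f[(0, z, 1)]) / p_x0          # f(z | X=0)
        total += e_y * f_z
    return total


# Hypothetical factual distribution built as f(x) f(z | x) f(y | x, z).
p_x1 = 0.4
p_z1 = {0: 0.3, 1: 0.7}
p_y1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.6}
f = {}
for x, z, y in itertools.product((0, 1), repeat=3):
    px = p_x1 if x else 1 - p_x1
    pz = p_z1[x] if z else 1 - p_z1[x]
    py = p_y1[(x, z)] if y else 1 - p_y1[(x, z)]
    f[(x, z, y)] = px * pz * py

print(round(mediation_formula(f), 6))  # 0.39
```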
Hence, it remains only to show that (6.11) holds for an NPSEM corresponding to the graph in Figure 6.1. Now, we noted in Example 1 that Y(x = 1, z) ⊥⊥ Z(x = 0) | X = j held for j = 0 and j = 1 for the NPSEM (but not for the FFRCISTG) associated with the DAG in Figure 6.1. Further, for this NPSEM, {Y(x = 1, z), Z(x = 0)} ⊥⊥ X. Combining, we conclude that (6.11) holds. In contrast, for an FFRCISTG model or MCM corresponding to Figure 6.1, E[Y(x = 1, Z(x = 0))] is not identified, because condition (6.11) need not hold. In Appendix C we derive sharp bounds for the PDE under the assumption that the FFRCISTG model or the MCM associated with graph G holds. We find that these bounds may be quite informative, even though the PDE is not (point) identified under this model.
[Figure 6.2: two panels, (a) and (b), each a DAG on X, Z, L, and Y.]
Figure 6.2 An elaboration of the DAG in Figure 6.1 in which L is a (measured) common cause of Z and Y.
This follows from the fact that under an NPSEM associated with the DAG in Figure 6.2(a),
$$Y(x = 1, z) \perp\!\!\!\perp Z(x = 0) \mid L \quad \text{for all } z, \qquad (6.14)$$
$$E[Y(x = 1, Z(x = 0))] = \sum_{z,l} E^{\mathrm{int}}_{x=1,z}[Y \mid L = l]\, f^{\mathrm{int}}_{x=0}(z \mid L = l)\, f(l). \qquad (6.15)$$
The right side of (6.15) remains identified under all four causal models via
$$\sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l). \qquad (6.16)$$
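Formula (6.16) can be evaluated from a factual joint over (L, X, Z, Y). The sketch assumes a hypothetical binary law for a Figure 6.2(a)-type DAG (L → Z, L → Y, X → Z, X → Y, Z → Y, with X randomized independently of L) and made-up probabilities; here E[Y | X = 1, Z = z, L = l] = 0.3 + 0.3z + 0.2l, so the formula equals 0.5 × 0.36 + 0.5 × 0.68 = 0.52.

```python
import itertools

KEYS = ("l", "x", "z", "y")


def prob(f, **fixed):
    """Marginal probability of the event that fixes some of (l, x, z, y)."""
    return sum(p for v, p in f.items()
               if all(v[KEYS.index(k)] == val for k, val in fixed.items()))


def formula_6_16(f):
    """sum_{z,l} E[Y | X=1, Z=z, L=l] f(z | X=0, L=l) f(l) for binary variables."""
    total = 0.0
    for z, l in itertools.product((0, 1), repeat=2):
        e_y = prob(f, l=l, x=1, z=z, y=1) / prob(f, l=l, x=1, z=z)
        f_z = prob(f, l=l, x=0, z=z) / prob(f, l=l, x=0)
        total += e_y * f_z * prob(f, l=l)
    return total


# Hypothetical factual law; f(l) = f(x) = 0.5 and all other numbers are made up.
P_Z1 = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}  # P(Z=1 | X=x, L=l)
f = {}
for l, x, z, y in itertools.product((0, 1), repeat=4):
    p_y1 = 0.1 + 0.2 * x + 0.3 * z + 0.2 * l                 # P(Y=1 | x, z, l)
    f[(l, x, z, y)] = (0.5 * 0.5
                       * (P_Z1[(x, l)] if z else 1 - P_Z1[(x, l)])
                       * (p_y1 if y else 1 - p_y1))

print(round(formula_6_16(f), 6))  # 0.52
```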
$$\sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l \mid X = 0).$$
$$Y(x = 1, L(x = 1), Z(x = 0)) = Y(x = 1, L(x = 1), Z(x = 0, L(x = 0))). \qquad (6.17)$$
Avin et al. (2005) prove that Equation (6.14) does not hold for this NPSEM. Thus, even under an NPSEM, we cannot conclude that Equation (6.15) holds. In fact, Avin et al. (2005) prove that for this NPSEM E[Y(x = 1, Z(x = 0))] is not identified from data on V. This is because the expression on the right-hand side of Equation (6.17) involves both L(x = 1) and L(x = 0), and there is no way to eliminate either.
$$L(x = 0) \perp\!\!\!\perp L(x = 1) \qquad (6.18)$$
then we have
$$\cdots\; f(Z = z \mid X = 0, L = l^*). \qquad (6.19)$$
Here, the second and fourth equalities follow from the usual NPSEM inde-
pendence restrictions but the third requires condition (6.18).
One setting under which (6.18) holds is that in which the counterfactual
variables L(0) and L(1) result from a restrictive ‘minimal sufficient cause
model’ (Rothman, 1976) such as
where A0 and A1 are independent both of one another and of all other
counterfactuals. Note that (6.18) would not hold if the right-hand side of Equation (6.20) were (1 − x)A0 + xA1 + A2, even if the Ai’s were again assumed to be independent (VanderWeele & Robins, 2007).
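Equation (6.20) is not reproduced in this excerpt, so the sketch below assumes, purely for illustration, the base form L(x) = (1 − x)A0 + xA1 with independent Bernoulli components, and compares it with the variant that adds the shared term A2. Exact enumeration shows that the shared A2 induces a positive covariance between L(0) and L(1), so (6.18) cannot hold.

```python
import itertools


def cov_l0_l1(shared, p=0.5):
    """Exact Cov(L(0), L(1)) when A0, A1, A2 are independent Bernoulli(p) and
    L(x) = (1 - x) A0 + x A1, plus A2 when `shared` is True (assumed forms)."""
    def L(x, a0, a1, a2):
        return (1 - x) * a0 + x * a1 + (a2 if shared else 0)

    e0 = e1 = e01 = 0.0
    for a0, a1, a2 in itertools.product((0, 1), repeat=3):
        w = 1.0
        for a in (a0, a1, a2):
            w *= p if a else 1 - p
        l0, l1 = L(0, a0, a1, a2), L(1, a0, a1, a2)
        e0 += w * l0
        e1 += w * l1
        e01 += w * l0 * l1
    return e01 - e0 * e1


print(cov_l0_l1(shared=False))  # 0.0: L(0) = A0 and L(1) = A1 are independent
print(cov_l0_l1(shared=True))   # 0.25 = Var(A2): the shared A2 makes them dependent
```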
An alternative further assumption, sufficient to identify the PDE in the context of the NPSEM associated with Figure 6.2(b), is that L(1) is a deterministic function of L(0), i.e., L(1) = g(L(0)) for some function g(·). In this case, we have:
$$\cdots\; f(Z = z \mid X = 0, L = l^*).$$
where F(·) and F⁻¹(·) indicate the cumulative distribution function (CDF) and its inverse; the equality follows from the NPSEM assumptions; this expression shows that g(·) is identified. (Since L is continuous, the sums over l, l* in Equation (6.21) are replaced by integrals.) A special case of this example is a linear structural equation system, where it was already known that the PDE is identified in the graph in Figure 6.2(b). Our analysis shows that identification of the PDE in this graph merely requires rank preservation and not linearity. Note that a linear structural equation model implies both rank preservation and linearity.
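The expression for g(·) is not reproduced in this excerpt. Assuming g is strictly increasing (rank preservation) and X is randomized so that L(x) has the same distribution as L given X = x, the standard quantile-mapping identity g(l) = F⁻¹_{L|X=1}(F_{L|X=0}(l)) follows from P(L(1) ≤ g(l)) = P(L(0) ≤ l). The sketch estimates g from the two arms with empirical CDFs and quantiles; the map g_true and the Normal draws are hypothetical.

```python
import bisect
import random


def g_true(l):
    """Hypothetical strictly increasing map, so that L(1) = g(L(0)) is rank preserving."""
    return l ** 3 + l


def estimate_g(l, sorted_x0, sorted_x1):
    """Plug-in estimate of g(l) = F^{-1}_{L|X=1}(F_{L|X=0}(l)) from sorted arm samples."""
    u = bisect.bisect_right(sorted_x0, l) / len(sorted_x0)  # empirical F_{L|X=0}(l)
    k = min(max(int(u * len(sorted_x1)) - 1, 0), len(sorted_x1) - 1)
    return sorted_x1[k]                                     # empirical u-quantile of L | X=1


rng = random.Random(1)
n = 100_000
# With X randomized, the X = 0 arm shows draws of L(0) and the X = 1 arm draws of L(1).
sorted_x0 = sorted(rng.gauss(0, 1) for _ in range(n))
sorted_x1 = sorted(g_true(rng.gauss(0, 1)) for _ in range(n))
for l in (-1.0, 0.0, 0.5):
    print(l, round(estimate_g(l, sorted_x0, sorted_x1), 2), g_true(l))
```

No linearity is used anywhere; only the monotonicity of g and the two marginal CDFs enter.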
We note that the identifying formula in Equation (6.21) differs from
Equation (6.19). Since neither identifying assumption imposes any restriction
on the distribution of the factual variables in the DAG in Figure 6.2(b), there
is no empirical basis for deciding which, if either, of the assumptions is true.
Consequently, we do not advocate blithely adopting such assumptions in
order to preserve identification of the PDE in contexts such as the DAG in
Figure 6.2(b).
Figure 6.3 An elaboration of the DAG in Figure 6.1; N and O are, respectively, the nicotine
and non-nicotine components of tobacco; thicker edges indicate deterministic relations.
as true. Under such a supposition, $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is identified if $E_{n=0,o=1}[Y]$ is a well-defined function of f(v). Note that, under f(v), data on (X, Z, Y) are equivalent to data on V = (X, N, O, Z, Y), since X completely determines O and N in the factual data. We now show that, with Figure 6.3 as the causal DAG and V = (X, N, O, Z, Y), under all four causal models, $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is identified simply by applying the g-formula density in standard fashion. This result may seem surprising at first since no subject in the actual study data followed the regime (n = 0, o = 1), so the standard positivity assumption P[N = 0, O = 1] > 0, usually needed to make the g-formula density $f_{n=0,o=1}(v)$ a function of f(v) (and thus identifiable), fails.
However, as we now demonstrate, even without positivity, the conditional independences implied by the assumptions of no direct effect of N on Y and no effect of O on Z, encoded in the missing arrows from N to Y and O to Z in Figure 6.3, along with the deterministic relationship between O, N, and X under f(v), allow one to obtain identification. Specifically, under the DAG in Figure 6.3,
$$\begin{aligned} f_{n=0,o=1}(y, z) &= f(y \mid O = 1, z)\, f(z \mid N = 0) \\ &= f(y \mid O = 1, N = 1, z)\, f(z \mid N = 0, O = 0) \\ &= f(y \mid X = 1, z)\, f(z \mid X = 0), \end{aligned}$$
where the first equality is by definition of the g-formula density $f_{n=0,o=1}(y, z)$, the second by the conditional independence relations encoded in the DAG in Figure 6.3, and the last by the deterministic relationships between O, N, and X under f(v) with V = (X, N, O, Z, Y). Thus
$$\begin{aligned} E_{n=0,o=1}[Y] &\equiv \sum_{y,z} y\, f_{n=0,o=1}(y, z) \\ &= \sum_{y,z} y\, f(y \mid X = 1, z)\, f(z \mid X = 0) \\ &= \sum_z E[Y \mid X = 1, Z = z]\, f(z \mid X = 0), \end{aligned}$$
which is a function of f(v) with V = (X, N, O, Z, Y). Note that this argument goes through even if Z and/or Y are non-binary, continuous variables.
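The derivation can be replayed on a hypothetical factual law for Figure 6.3 in which N = O = X deterministically, Z depends only on N, and Y depends only on (O, Z); the numbers are made up. The sketch confirms that nobody follows (N, O) = (0, 1), yet the g-formula built from f(y | O = 1, z) f(z | N = 0) is computable and equals the expression written in terms of X.

```python
import itertools

# Hypothetical factual law for Figure 6.3; f(x) = 0.5, other numbers made up.
P_Z1_GIVEN_N = {0: 0.3, 1: 0.7}
P_Y1_GIVEN_OZ = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.6}

f = {}
for x, z, y in itertools.product((0, 1), repeat=3):
    n = o = x  # the deterministic edges X -> N and X -> O
    pz = P_Z1_GIVEN_N[n] if z else 1 - P_Z1_GIVEN_N[n]
    py = P_Y1_GIVEN_OZ[(o, z)] if y else 1 - P_Y1_GIVEN_OZ[(o, z)]
    f[(x, n, o, z, y)] = 0.5 * pz * py


def prob(event):
    """P(event), where event is a predicate on (x, n, o, z, y)."""
    return sum(p for v, p in f.items() if event(*v))


def cond(event, given):
    """P(event | given)."""
    return prob(lambda *v: event(*v) and given(*v)) / prob(given)


# Positivity fails: no one follows the regime (N, O) = (0, 1).
print(prob(lambda x, n, o, z, y: n == 0 and o == 1))  # 0

# g-formula E_{n=0,o=1}[Y] = sum_z E[Y | O=1, z] f(z | N=0).
g_formula = sum(
    cond(lambda x, n, o, zz, y: y == 1, lambda x, n, o, zz, y: o == 1 and zz == z)
    * cond(lambda x, n, o, zz, y: zz == z, lambda x, n, o, zz, y: n == 0)
    for z in (0, 1))
# The same quantity written with X: sum_z E[Y | X=1, z] f(z | X=0).
via_x = sum(
    cond(lambda x, n, o, zz, y: y == 1, lambda x, n, o, zz, y: x == 1 and zz == z)
    * cond(lambda x, n, o, zz, y: zz == z, lambda x, n, o, zz, y: x == 0)
    for z in (0, 1))
print(round(g_formula, 6), round(via_x, 6))  # 0.39 0.39
```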
The identifying formula under all four causal models associated with the DAG in Figure 6.3 is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under the NPSEM associated with the DAG in Figure 6.1.
For Pearl, having at the outset assumed an NPSEM associated with the DAG in Figure 6.1, the story did not contribute to identification; rather, it served only to show that the non-manipulable parameter E[Y(x = 1, Z(x = 0))] of the NPSEM associated with the DAG in Figure 6.1 could, under the scenario of our story, encode a substantively important parameter: the manipulable causal effect of setting N to 0 and O to 1 on the extended causal model associated with the DAG in Figure 6.3. However, from the refutationist point of view, it is the story itself that makes Pearl’s claim that $E[Y(x = 1, Z(x = 0))] = \sum_z E[Y \mid X = 1, z]\, f(z \mid X = 0)$ refutable and, thus, scientifically meaningful. Specifically, when nicotine-free cigarettes become available, Pearl’s claim can be tested by an intervention that forces a random sample of the population to smoke nicotine-free cigarettes.
For someone willing to entertain only an agnostic causal model, the information necessary to identify the effect of nicotine-free cigarettes was contained in the story, as the parameter E[Y(x = 1, Z(x = 0))] is undefined without the story. [Someone, such as Dawid (2000), opposed to counterfactuals and thus wedded to the agnostic causal model, might then reasonably and appropriately choose to define $E^{\mathrm{int}}_{n=0,o=1}[Y] - E^{\mathrm{int}}_{n=0,o=0}[Y] = E^{\mathrm{int}}_{n=0,o=1}[Y] - E^{\mathrm{int}}_{x=0}[Y]$ to be the natural or pure direct effect of X not through Z. This definition differs from, and in our view is preferable to, the definition of DDG (2006) discussed previously: the definition of DDG fails to correspond to the concept of the PDE as used in the literature since its introduction in Robins and Greenland (1992).]
For an analyst who had assumed that the MCM, but not necessarily the NPSEM, associated with the DAG in Figure 6.1 was true, the information contained in the above story licenses the assumption that the MCM associated with Figure 6.3 holds. This latter assumption can be used in two alternative ways, both leading to the same identifying formula. First, it leads via Lemma 6 to the above g-functional analysis also used by the agnostic model advocate. Second, as we next show, it can be used to prove that (6.11) holds, allowing identification to proceed à la Pearl (2001).
Consider an MCM associated with the DAG in Figure 6.3 with node set V = (X, N, O, Z, Y). It follows from the fact that X = N = O with probability (w.p.) 1 that the condition N(x) = O(x) = x w.p. 1 also holds. However, for pedagogic purposes, suppose for the moment that the condition N(x) = O(x) = x w.p. 1 does not hold.
For expositional simplicity we assume all variables are binary, so our model is also an FFRCISTG model. Then $V_0 = X$, $V_1(v_0) = N(x)$, $V_2(v_0, v_1) = V_2(v_0) = O(x)$, $V_3(\bar v_2) = V_3(v_1) = Z(n)$, and $V_4(\bar v_3) = V_4(v_2, v_3) = Y(o, z)$.
By Theorem 1, {Y(o, z), Z(n), O(x), N(x)} are mutually independent. However, because we are assuming an FFRCISTG model and not an NPSEM, we cannot conclude that O(x) ⊥⊥ N(x*) for x ≠ x*.
Consider the induced counterfactual model for the variables (X, Z, Y) obtained from our FFRCISTG model by marginalizing over (N, O). Because N and O each have only a single child on the graph in Figure 6.3, the counterfactual model over (X, Z, Y) is the FFRCISTG associated with the complete graph of Figure 6.1, where the one-step-ahead counterfactuals $Z^{(1)}(x)$, $Y^{(1)}(x, z)$ associated with Figure 6.1 are obtained from the counterfactuals {Y(o, z), Z(n), O(x), N(x)} associated with Figure 6.3 by $Z^{(1)}(x) = Z(N(x))$, $Y^{(1)}(x, z) = Y(O(x), z)$. Here, we have used the superscript ‘(1)’ to emphasize the graph with respect to which $Z^{(1)}(x)$ and $Y^{(1)}(x, z)$ are one-step-ahead counterfactuals. We cannot conclude that $Z^{(1)}(0) = Z(N(0))$ and $Y^{(1)}(1, z) = Y(O(1), z)$ are independent, even though Z(n) and Y(o, z) are independent, because, as noted above, the FFRCISTG model associated with Figure 6.3 does not imply independence of O(1) and N(0).
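A toy sketch of this point, with all functions hypothetical: take Z(n) = n and Y(o, z) = o, which are non-random and so trivially independent of each other. If N(0) and O(1) share a fair coin (the cross-world link the FFRCISTG model does not exclude), then Z(N(0)) and Y(O(1), z) are perfectly dependent; under the deterministic constraint N(x) = O(x) = x both are constants and the covariance vanishes.

```python
def z_cf(n):
    """Hypothetical Z(n): Z simply copies N."""
    return n


def y_cf(o, z):
    """Hypothetical Y(o, z): Y simply copies O."""
    return o


def covariance(shared_coin):
    """Exact Cov(Z(N(0)), Y(O(1), 0)) by enumerating a fair coin B.
    shared_coin=True:  N(0) = O(1) = B (cross-world dependence).
    shared_coin=False: deterministic constraint N(x) = O(x) = x."""
    e_z = e_y = e_zy = 0.0
    for b in (0, 1):
        w = 0.5
        n0, o1 = (b, b) if shared_coin else (0, 1)
        z1 = z_cf(n0)     # Z^(1)(0) = Z(N(0))
        y1 = y_cf(o1, 0)  # Y^(1)(1, 0) = Y(O(1), 0)
        e_z += w * z1
        e_y += w * y1
        e_zy += w * z1 * y1
    return e_zy - e_z * e_y


print(covariance(shared_coin=True))   # 0.25
print(covariance(shared_coin=False))  # 0.0
```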
Suppose now we reinstate the deterministic constraint that N(x) = O(x) = x w.p. 1. Then we conclude that O(x) is independent of N(x*), since both variables are constants. It then follows that $Z^{(1)}(0)$ and $Y^{(1)}(1, z)$ are independent and, thus, that (6.11) holds and $E[Y^{(1)}(1, Z^{(1)}(0))]$ is identified.
In our argument that, under the deterministic constraint that N(x) = O(x) = x w.p. 1, the FFRCISTG associated with the DAG in Figure 6.3 implied condition (6.11), the crucial step was the following: by Theorem 1, the independences in condition (6.1) that define an FFRCISTG imply that Y(o, z) and Z(n) are independent for n = 0 and o = 1. In this section, we show that had we modified (6.1), and thus our definition of an FFRCISTG, by restricting to conditioning events $\bar V_{m-1} = \bar v_{m-1}$ that have a positive probability under f(v), then Theorem 1 would not hold for non-positive densities f(v). Specifically, if f(v) is not positive, the modified version of condition (6.1) does not imply condition (6.6); furthermore, the set of independences implied by a modified FFRCISTG associated with a graph G could differ for different orderings of the variables consistent with the descendant relationships on the graph.
In particular, we now show that for the modified FFRCISTG associated with Figure 6.3 and the ordering (X, N, O, Z, Y), we cannot conclude that Y(x, n, o, z) = Y(o, z) and Z(x, n, o) = Z(n) are independent for n = 0 and o = 1 and, thus, that condition (6.11) holds. However, the modified FFRCISTG with the alternative ordering (X, N, Z, O, Y) does imply Y(o, z) ⊥⊥ Z(n). First, consider the modified FFRCISTG associated with Figure 6.3 and ordering
$$Y(n, o, z) \perp\!\!\!\perp Z(n, o) \mid X = x, N(x) = n, O(x) = o, \quad \text{for } z, x, n, o \in \{0, 1\}.$$
The modified condition (6.1) implies only the subset corresponding to {x, z ∈ {0, 1}; n = o = x}, since the event {N(x) = j, O(x) = 1 − j, j ∈ {0, 1}} has probability 0. As a consequence, we can only conclude that Y(n, o, z) = Y(o, z) ⊥⊥ Z(n) for o = n.
In contrast, for the modified FFRCISTG associated with Figure 6.3 and the ordering V = (X, N, Z, O, Y), the deterministic constraint N(x) = O(x) = x w.p. 1 implies Y(o, z) ⊥⊥ Z(n) for n = 0 and o = 1 as follows: by Equation (6.1) and the fact that Y(x, n, z, o) = Y(o, z) and Z(x, n) = Z(n), we have, without having to condition on an event of probability 0, that
$$\{Y(o, z), Z(n)\} \perp\!\!\!\perp X \quad \text{for } z, o, n \in \{0, 1\}; \qquad (6.23)$$
$$Y(o, z) \perp\!\!\!\perp Z(n) \mid X = x, N(x) = n \quad \text{for } x, z, o \in \{0, 1\} \text{ and } n = x. \qquad (6.24)$$
A similar elaboration may be given for the causal DAG in Figure 6.2(a). The
extended causal DAG represented by our story would then be the DAG in
Figure 6.4. Under any of our four causal models,
Hence,
$$E_{n=0,o=1}[Y] = \sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l),$$
which is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under an NPSEM associated with the DAG in Figure 6.2(a).
Figure 6.4 The graph from Figure 6.3 with, in addition, a measured common cause (L) of
the intermediate Z and the final response Y.
Summary
$$f_{x=2,c=1}(y, z, l) = \sum_{m,d,r_h,r_n} f(y \mid r_h, z)\, f(r_h \mid m, l)\, f(z \mid r_n)\, f(m \mid x = 2) \cdots$$
where the first equality uses the fact that D = 0 and M = 1 when x = 2 and c = 1, and the second uses the fact that, since in the observed data C = 0 w.p. 1, D = 0 if and only if X = 0, and M = 1 if and only if X = 1 (since X ≠ 2 w.p. 1). Thus,
$$E_{x=2,c=1}[Y] = \sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l),$$
which is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under an NPSEM based on the DAG in Figure 6.2(a).
As noted in the Introduction, the exercise of trying to construct a story to
provide an interventionist interpretation for a non-manipulable causal para-
meter of an NPSEM often helps one devise explicit, and sometimes even
practical, interventions which can then be represented as a manipulable
causal effect relative to an extended deterministic causal DAG model such
as Figure 6.3.
5 Path-Specific Effects
$$\begin{aligned} f_{n=0,o=1}(y, z, l) &= f(y \mid O = 1, z, l)\, f(z \mid N = 0, l)\, f(l \mid N = 0) \\ &= f(y \mid O = 1, N = 1, z, l)\, f(z \mid N = 0, O = 0, l)\, f(l \mid N = 0, O = 0) \\ &= f(y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l \mid X = 0), \end{aligned}$$
so
$$E^{\mathrm{int}}_{n=0,o=1}[Y] = \sum_{l,z} E(Y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l \mid X = 0). \qquad (6.25)$$
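Formula (6.25) differs from (6.16) only in using f(l | X = 0) instead of f(l), which matters once X affects L. A minimal sketch, assuming a hypothetical binary factual law for a Figure 6.2(b)-type DAG (X → L added) with made-up probabilities; with these numbers (6.25) equals 0.8 × 0.36 + 0.2 × 0.68 = 0.424.

```python
import itertools

KEYS = ("l", "x", "z", "y")


def prob(f, **fixed):
    """Marginal probability of the event that fixes some of (l, x, z, y)."""
    return sum(p for v, p in f.items()
               if all(v[KEYS.index(k)] == val for k, val in fixed.items()))


def formula_6_25(f):
    """sum_{l,z} E(Y | X=1, z, l) f(z | X=0, l) f(l | X=0) for binary variables."""
    total = 0.0
    for l, z in itertools.product((0, 1), repeat=2):
        e_y = prob(f, l=l, x=1, z=z, y=1) / prob(f, l=l, x=1, z=z)
        f_z = prob(f, l=l, x=0, z=z) / prob(f, l=l, x=0)
        f_l = prob(f, l=l, x=0) / prob(f, x=0)
        total += e_y * f_z * f_l
    return total


# Hypothetical factual law in which X now affects L; all numbers are made up.
P_L1 = {0: 0.2, 1: 0.6}                                      # P(L=1 | X=x)
P_Z1 = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}  # P(Z=1 | X=x, L=l)
f = {}
for l, x, z, y in itertools.product((0, 1), repeat=4):
    p_y1 = 0.1 + 0.2 * x + 0.3 * z + 0.2 * l                 # P(Y=1 | x, z, l)
    f[(l, x, z, y)] = (0.5                                    # f(x)
                       * (P_L1[x] if l else 1 - P_L1[x])
                       * (P_Z1[(x, l)] if z else 1 - P_Z1[(x, l)])
                       * (p_y1 if y else 1 - p_y1))

print(round(formula_6_25(f), 6))  # 0.424
```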
[Figure 6.6: four panels, (a) through (d), each a DAG on X, N, O, Z, L, and Y.]
Figure 6.6 Elaborations of the graph in Figure 6.2(b), with additional variables as
described in the text; thicker edges indicate deterministic relations.
Similarly, under all four causal models associated with the graph in Figure 6.6(b), $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is identified from factual data on V = (X, L, Z, Y): on the DAG in Figure 6.6(b) we have
Let Y(x, l, z), Z(x, l), and L(x) denote the one-step-ahead counterfactuals associated with the graph in Figure 6.2(b). Then, it is clear from the assumed deterministic counterfactual relation N(x) = O(x) = x that the parameter
$$E^{\mathrm{int}}_{n=0,o=1}[Y] = E[Y(o = 1, L(n = 0), Z(n = 0, L(n = 0)))]$$
associated with the graph in Figure 6.6(a) can be written in terms of the counterfactuals associated with the graph in Figure 6.2(b) as
$$E[Y(x = 1, L(x = 0), Z(x = 0))] = E[Y(x = 1, L(x = 0), Z(x = 0, L(x = 0)))].$$
associated with the graph in Figure 6.2(b) is not identified under any of the
four causal models associated with any of the three graphs in Figure 6.6(a),
(b) and (c); see Section 3.4.
Thus, in summary, under an MCM or FFRCISTG model associated with the DAG in Figure 6.6(a), the extension of Pearl’s original story encoded in that DAG allows the identification of the causal effect E[Y(x = 1, L(x = 0), Z(x = 0))] associated with the DAG in Figure 6.2(b). Similarly, under an MCM or FFRCISTG model associated with the DAG in Figure 6.6(b), the extension of Pearl’s original story encoded in this graph allows the identification of the causal effect E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] associated with the DAG in Figure 6.2(b).
We now compare these results to those obtained under the assumption that the NPSEM associated with the DAG in Figure 6.2(b) held. Under this model Avin et al. (2005) proved, using their theory of path-specific effects, that while E[Y(x = 1, Z(x = 0))] is unidentified, both
$$E[Y(x = 1, L(x = 0), Z(x = 0))] \quad \text{and} \quad E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] \qquad (6.27)$$
are identified (without requiring any additional story) by Equations (6.25) and (6.26), respectively.
From the perspective of the FFRCISTG models associated with the graphs in Figure 6.6(a) and (b), if N and O represent, as we have been assuming, the
substantive variables Nicotine and Other components of cigarettes (rather
than merely formal mathematical constructions), these graphs will generally
represent mutually exclusive causal hypotheses. As a consequence, at most
one of the two FFRCISTG models will be true; thus, from this perspective,
only one of the two parameters in (6.27) will be identified.
Avin et al. (2005) refer to E[Y(x = 1, L(x = 0), Z(x = 0))] as the effect of X = 1 on Y when the paths from X to L and from X to Z are both blocked (inactivated), and to E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] as the effect of X = 1 on Y when the path from X to Z is blocked. They refer to
as the effect of X = 1 on Y when both the path from X to Z and (X’s effect on) the path from L to Z are blocked.
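$$\begin{aligned} f_{n=0,o=1}(v \setminus x) = {} & \prod_{\{j:\, V_j \text{ is not a child of } X \text{ on } G\}} f(v_j \mid pa_j) \\ & \times \prod_{\{j:\, V_j \text{ is a child of } O \text{ on } G^{ex}\}} f(v_j \mid pa_j \setminus x, X = 1) \\ & \times \prod_{\{j:\, V_j \text{ is a child of } N \text{ on } G^{ex}\}} f(v_j \mid pa_j \setminus x, X = 0). \end{aligned}$$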
Note that if X has p children on G, there exist $2^p$ different graphs $G^{ex}$. The identifying formula for $f_{n=0,o=1}(v \setminus x)$ in terms of f(v) depends on the graph $G^{ex}$. It follows that, under the assumption that a particular $G^{ex}$ is associated with one of our four causal models, the intervention distribution $f^{\mathrm{int}}_{n=0,o=1}(v \setminus x)$ corresponding to that $G^{ex}$ is identified under any of the four associated models.
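The count of $2^p$ graphs comes from assigning each of X's p children to either N or O. The sketch below enumerates these assignments and records, for each child, the value of X plugged into its factor in the product formula above (0 for children of N, 1 for children of O). It assumes, as the path descriptions above suggest, that X's children on Figure 6.2(b) are L, Z, and Y; the assignment with L and Z under N and Y under O reproduces the X-values used in (6.25).

```python
import itertools


def extended_graphs(children_of_x):
    """Enumerate the 2**p graphs G^ex, one per way of making each of X's p children
    a child of N or of O. Each graph is returned as {child: value of X plugged into
    that child's factor f(v_j | pa_j minus x, X = value)}: 0 for N-children, 1 for O-children."""
    graphs = []
    for labels in itertools.product(("N", "O"), repeat=len(children_of_x)):
        graphs.append({child: (0 if label == "N" else 1)
                       for child, label in zip(children_of_x, labels)})
    return graphs


# Assumed children of X on Figure 6.2(b): L, Z and Y.
graphs = extended_graphs(["L", "Z", "Y"])
print(len(graphs))                         # 8
print({"L": 0, "Z": 0, "Y": 1} in graphs)  # True: L, Z children of N; Y a child of O
```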
We now discuss the relationship with path-specific effects. Avin et al. (2005) first define, for any counterfactual model associated with G, the path-specific effect on the density of V \ X when various paths on graph G have been blocked. Avin et al. (2005) further determine which path-specific densities are identified under the assumption that the NPSEM associated with G is true and provide the identifying formulae.
The results of Avin et al. (2005) imply that the path-specific effect corresponding to the set of blocked paths on G being the paths from X to the subset of its children that are children of N on any given $G^{ex}$ is identified under the NPSEM assumption for G. Their identifying formula is precisely our $f_{n=0,o=1}(v \setminus x)$ corresponding to this $G^{ex}$. In fact, our derivation implies that this path-specific effect on G is identified by $f^{\mathrm{int}}_{n=0,o=1}(v \setminus x)$ for this $G^{ex}$ under the assumption that any of our four causal models associated with this $G^{ex}$ holds, even without assuming that the NPSEM associated with the original graph G is true. Again, under the NPSEM assumption for G, all $2^p$ effects $f^{\mathrm{int}}_{n=0,o=1}(v \setminus x)$ as $G^{ex}$ varies are identified, each by the formula $f_{n=0,o=1}(v \setminus x)$ specific to the graph $G^{ex}$.
6 Conclusion
The results presented here, which are summarized in Table 6.1, appear to
present a clear trade-off between the agnostic causal DAG, MCM, and
FFRCISTG model frameworks and that of the NPSEM.
Table 6.1 Relations between causal models and estimands associated with the DAG
shown in Figure 6.1; column ‘D’ indicates if the contrast is defined in the model; ‘I’
whether it is identified.
In the NPSEM approach the PDE is identified, even though the result
cannot be verified by a randomized experiment without making further
assumptions. In contrast, the PDE is not identified under an agnostic
causal DAG model or under an MCM/FFRCISTG model. Further, in
Appendix A we show that the ETT can be identified under an MCM/
FFRCISTG model even though the ETT cannot be verified by a randomized
experiment without making further assumptions.
Our analysis of Pearl’s motivation for the PDE suggests that these dichotomies may not be as stark as they may at first appear. We have shown that, in certain cases where one is interested in a prima facie non-manipulable causal parameter, the very fact that it is of interest implies that there also exists an extended DAG in which the same parameter is manipulable and identifiable in all the causal frameworks.
Inevitably, such cases will be interpreted differently by NPSEM ‘skeptics’
and ‘advocates.’ Advocates may argue that if our conjecture holds, then we
can work with NPSEMs and have some reassurance that in important cases
of scientific interest we will have the option to go back to an agnostic causal
DAG. Conversely, skeptics may conclude that if we are correct then this shows that it is advisable to avoid the NPSEM framework: Agnostic causal DAGs are fully “testable” (with the usual caveats), and many non-manipulable NPSEM parameters that are of interest, but not identifiable within a non-NPSEM framework, can be identified in an augmented agnostic causal DAG.
Undoubtedly, this debate is set to run and run…
The primary focus of this chapter has been various contrasts assessing
the direct effect of X on Y relative to an intermediate Z. In this appendix
we discuss another non-manipulable parameter, the effect of treatment on
the treated, in order to further clarify the differences among the agnostic,
the MCM and the FFRCISTG models. For our purposes, we shall only
require the simplest possible causal model based on the DAG X → Y, obtained by marginalizing over Z in the graph in Figure 6.1. Let Y(0) denote the counterfactual Y(x) evaluated at x = 0. In a counterfactual causal model, the average effect of treatment on the treated is defined to be
Hence the ETT(x) is identified iff the second term on the right is identified.
First, note that
$$\mathrm{ETT}(0) = E[Y \mid X = 0] - E[Y(0) \mid X = 0] = 0.$$
Now, by consistency condition (iii) in Section 1.1 and the MCM assumption, Equation (6.4), we have
$$E[Y \mid X = 0] = E[Y(0) \mid X = 0] = E[Y(0)].$$
Note that even under the MCM with |X| > 2, the non-manipulable (relative to {X, Y}) contrast E[Y(0) | X ≠ 0] − E[Y | X ≠ 0], the effect of receiving X = 0 on those who did not receive X = 0, is identified since E[Y(0) | X ≠ 0] is identified by the left-hand side of Equation (6.28).
We now turn to the agnostic causal model for the DAG X → Y. Although $E^{\mathrm{int}}_x[Y]$ is identified by the g-functional as E[Y | X = x], nonetheless, as expected for a non-manipulable causal contrast, the effect of treatment on the treated is not formally defined within the agnostic causal model without further assumptions, even for binary X. Of course, the g-functional (see Definition 5) does define a joint distribution $f_x(x^*, y)$ for (X, Y) under which X takes the value x with probability 1. However, in spite of apparent notational similarities, the conditional density $f_x(y \mid x^*)$ expresses a different concept from that occurring in the definition of $E^{\mathrm{int}}_x[Y \mid X = x^*] \equiv E[Y(x) \mid X = x^*]$ in the counterfactual theory. The former relates to the distribution over Y among those individuals who (after the intervention) have the value X = x*, under an intervention which sets every unit’s value to x, and thus $f_x(y \mid x^*) = f(y \mid x)$ if $x^* = x$ and is undefined if $x^* \neq x$; the latter is based on the distribution of Y under an intervention fixing X = x among those people who would have had the value X = x* had we not intervened.
The minimality of the MCM among all counterfactual models that both satisfy the consistency assumption (iii) in Section 1.1 and identify the intervention distributions $\{f^{\mathrm{int}}_{p_R}(z)\}$ can be seen as follows. For binary X, the above argument for identification of the non-manipulable contrast ETT(1) under an MCM as the difference E[Y | X = 1] − E[Y | X = 0] follows directly, via the laws of probability, from the consistency assumption (iii) in Section 1.1 and the minimal independence assumption (6.5) required to identify the intervention distributions $\{f^{\mathrm{int}}_{p_R}(z)\}$. In contrast, the additional independence assumptions (6.8) used to identify the PDE under the NPSEM for the DAG in Figure 6.1, or the additional independence assumptions used to identify ETT(1) for non-binary X under an FFRCISTG model for the DAG X → Y, are not needed to identify intervention distributions.
Of course, as we have shown, it may be the case that the PDE is identified
as an intervention contrast in an extended causal DAG containing additional
variables; but identification in this extended causal DAG requires additional
assumptions beyond those in the original DAG and hence does not follow
merely from application of the laws of probability.
Similarly, the ETT(1) for the causal DAG X → Y can be re-interpreted as an
intervention contrast in an extended causal DAG containing additional vari-
ables, regardless of the dimension of X’s state space. Specifically, Robins,
VanderWeele, and Richardson (2007) showed that the ETT(x) parameter is
defined and identified via the extended agnostic causal DAG in Figure 6.7
$$X \mid \pi \sim \mathrm{Multinomial}(1, \pi).$$
[Figure 6.8: panel (a) on X and Y; panel (b) on π0, π1, π2, Y(x=0), Y(x=1), Y(x=2), X, and Y.]
Figure 6.8 (a) A simple graph; (b) A graph describing a confounding structure that leads to a counterfactual model that corresponds to the MCM but not the FFRCISTG associated with the DAG (a); thicker red edges indicate deterministic relations.
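Now suppose that the response Y is binary and that the counterfactual outcomes Y(x) are as follows:
$$\begin{aligned} Y(x = 0) \mid \pi &\sim \mathrm{Bernoulli}\big(\pi_1/(\pi_1 + \pi_2)\big), \\ Y(x = 1) \mid \pi &\sim \mathrm{Bernoulli}\big(\pi_2/(\pi_2 + \pi_0)\big), \\ Y(x = 2) \mid \pi &\sim \mathrm{Bernoulli}\big(\pi_0/(\pi_0 + \pi_1)\big). \end{aligned}$$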
$$Y(x = i) \perp\!\!\!\perp I(X = i) \quad \text{for all } i,$$
but not the FFRCISTG independence restriction (6.1), since
$$Y(x = i) \not\!\perp\!\!\!\perp I(X = j) \quad \text{for } i \neq j.$$
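We note that we have, where $[j]$ denotes $j$ modulo 3:
$$\pi \mid X = i \sim \mathrm{Dirichlet}(\alpha_i + 1, \alpha_{[i+1]}, \alpha_{[i+2]}), \tag{6.30}$$
$$Y(x=i) \mid X = i \sim \mathrm{Bernoulli}\big(\alpha_{[i+1]}/(\alpha_{[i+1]} + \alpha_{[i+2]})\big), \tag{6.31}$$
$$Y \mid X = i \sim \mathrm{Bernoulli}\big(\alpha_{[i+1]}/(\alpha_{[i+1]} + \alpha_{[i+2]})\big). \tag{6.32}$$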
Equation (6.30) follows from standard Bayesian updating (since the Dirichlet distribution is conjugate to the multinomial). It follows that the vector of parameters $(\alpha_0, \alpha_1, \alpha_2)$ is identified only up to a scale factor, since the likelihood for the observed variables satisfies $f(x, y \mid \alpha) = f(x, y \mid \lambda\alpha)$ for any $\lambda > 0$, by Equations (6.29) and (6.32). We note that since $E(Y(x)) = E(Y \mid X = x)$, $\mathrm{ACE}_{X \to Y}(x) \equiv E(Y(x)) - E(Y(0)) = E(Y \mid X = x) - E(Y \mid X = 0)$ and thus is
identified. However, since
$$Y(x=0) \mid X = 1 \sim \mathrm{Bernoulli}\big((\alpha_1 + 1)/(\alpha_1 + 1 + \alpha_2)\big),$$
$$Y(x=0) \mid X = 2 \sim \mathrm{Bernoulli}\big(\alpha_1/(\alpha_1 + \alpha_2 + 1)\big),$$
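The scale argument can be checked with exact arithmetic. In the sketch below, the Dirichlet parameters are written α = (α0, α1, α2) and assumed to be positive integers so that `Fraction` keeps everything exact. The code computes the observed law of (X, Y) from (6.29) and (6.32) for α and for 2α. It then computes the two counterfactual probabilities $P(Y(x=0) = 1 \mid X = 1)$ and $P(Y(x=0) = 1 \mid X = 2)$ from the Dirichlet update (6.30). The specific values α = (1, 2, 3) and λ = 2 are hypothetical.

```python
from fractions import Fraction


def observed_law(alpha):
    """Observed law of (X, Y) implied by (6.29) and (6.32).

    alpha: Dirichlet parameters (alpha_0, alpha_1, alpha_2) as positive integers.
    Returns ((P(X = i) for i = 0, 1, 2), (P(Y = 1 | X = i) for i = 0, 1, 2)).
    """
    total = sum(alpha)
    p_x = tuple(Fraction(a, total) for a in alpha)
    # (6.32): Y | X = i ~ Bernoulli(alpha_[i+1] / (alpha_[i+1] + alpha_[i+2])), indices mod 3
    p_y = tuple(
        Fraction(alpha[(i + 1) % 3], alpha[(i + 1) % 3] + alpha[(i + 2) % 3])
        for i in range(3)
    )
    return p_x, p_y


def y0_given_x(alpha):
    """P(Y(x=0) = 1 | X = 1) and P(Y(x=0) = 1 | X = 2) after the Dirichlet update (6.30)."""
    _, a1, a2 = alpha
    return Fraction(a1 + 1, a1 + 1 + a2), Fraction(a1, a1 + a2 + 1)


alpha = (1, 2, 3)
scaled = tuple(2 * a for a in alpha)  # lambda = 2

print(observed_law(alpha) == observed_law(scaled))  # True: same observed law
print(y0_given_x(alpha), y0_given_x(scaled))        # 1/2, 1/3 versus 5/11, 4/11
```

The two parameter vectors agree on everything observable. They disagree on the distribution of Y(x=0) among those with X = 1 or X = 2, so these quantities cannot be recovered from the observed data.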
[Figure 6.9 plot: extended DAG including X, N and Z, with deterministic nodes R ≡ (1 − N)U and E ≡ OU]
Figure 6.9 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 but not an NPSEM; thicker edges denote deterministic relations.
where the first equality used the fact that E is a deterministic function of U and O and that R is a deterministic function of N and U. The second equality used d-separation and the third, determinism. Thus, $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is not a function of the density of the observed data on (X, Z, Y), because u occurs both in the term $E[Y \mid X = 1, U = u, Z = z]$, where we have conditioned on X = 1, and in the term $f(z \mid U = u, X = 0)$, where we have conditioned on X = 0. As a consequence, we do not obtain a function of the density of the observed data when we marginalize over U.
Since under all three counterfactual models associated with the extended DAG of Figure 6.9 $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is equal to the parameter $E[Y(x=1, Z(x=0))]$ of Figure 6.1, we conclude that $E[Y(x=1, Z(x=0))]$, and thus the PDE, is not identified. Hence, the induced counterfactual model for the DAG in Figure 6.1 cannot be an NPSEM (as that would imply that the PDE is identified).
Furthermore, $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is a manipulable parameter with respect to the DAG in Figure 6.3, since this DAG is obtained by marginalizing over U in the graph in Figure 6.9. However, as we showed above, $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is not identified from the law of the factuals X, Y, Z, N, O, which are the variables in Figure 6.3. From this we conclude that none of the four causal models associated with the graph in Figure 6.3 can be true. Note that, prima facie, one might have thought that if the agnostic causal DAG in Figure 6.1 is true, then the agnostic causal DAG in Figure 6.3 would always be true as well. This example demonstrates that such a conclusion is fallacious. Similar remarks apply to the MCM and FFRCISTG models.
Additionally, for z = 0, 1, applying the g-formula to the graph in Figure 6.9 shows that two joint effects are identified under all four causal models for Figure 6.9. The joint effect of smoking and z, $E^{\mathrm{int}}_{n=1,o=1,z}[Y]$, is identified by $E[Y \mid X = 1, Z = z]$. The joint effect of not smoking and z, $E^{\mathrm{int}}_{n=0,o=0,z}[Y]$, is identified by $E[Y \mid X = 0, Z = z]$. Under all four causal models associated with the graph in Figure 6.1, $E^{\mathrm{int}}_{n=0,o=0,z}[Y]$ and $E^{\mathrm{int}}_{n=1,o=1,z}[Y]$ are equal to the parameters $E^{\mathrm{int}}_{x=0,z}[Y]$ and $E^{\mathrm{int}}_{x=1,z}[Y]$. We therefore conclude that CDE(z) is also identified under all four causal models associated with Figure 6.1.
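Written out, the identification just described gives the expression below. It takes CDE(z) to be the usual controlled direct effect at mediator level z, $E^{\mathrm{int}}_{x=1,z}[Y] - E^{\mathrm{int}}_{x=0,z}[Y]$.

```latex
\mathrm{CDE}(z) = E^{\mathrm{int}}_{x=1,z}[Y] - E^{\mathrm{int}}_{x=0,z}[Y]
               = E[Y \mid X = 1, Z = z] - E[Y \mid X = 0, Z = z],
\qquad z = 0, 1.
```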
The results obtained in the last two paragraphs are consistent with the
FFRCISTG model and the MCM associated with the graph in Figure 6.1
holding but not the NPSEM. In what follows we prove that this is the case.
Before doing so, we provide a simpler and more intuitive way to understand the above results by displaying in Figure 6.10 the subgraphs of Figure 6.9 corresponding to U, Z, Y when the variables N and O are set to each of their four possible joint values. We see that only when we set N = 0 and O = 1 is it the case that U is a common cause of both Z and Y (as setting N = 0, O = 1 makes R = E = U). Thus, we have
$$E^{\mathrm{int}}_{n=0,o=0,z}[Y] = E^{\mathrm{int}}_{n=0,o=0}[Y \mid Z = z] = E[Y \mid O = 0, N = 0, Z = z] = E[Y \mid X = 0, Z = z],$$
and
$$E^{\mathrm{int}}_{n=1,o=1,z}[Y] = E^{\mathrm{int}}_{n=1,o=1}[Y \mid Z = z] = E[Y \mid O = 1, N = 1, Z = z] = E[Y \mid X = 1, Z = z]$$
[Figure 6.10 plot: four subgraphs on U, Z, Y]
Figure 6.10 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 holding but not the NPSEM: Causal subgraphs on U, Z, Y implied by the graph in Figure 6.9 when we intervene and set (a) N = 0, O = 0; (b) N = 1, O = 0; (c) N = 0, O = 1; (d) N = 1, O = 1.
when we set N = 0, O = 1. It is because $E^{\mathrm{int}}_{n=0,o=1,z}[Y] \neq E^{\mathrm{int}}_{n=0,o=1}[Y \mid z]$ that $E^{\mathrm{int}}_{n=0,o=1}[Y]$ is not identified. If, contrary to Figure 6.9, there were no confounding between Y and Z when N is set to 0 and O is set to 1, then we would have $E^{\mathrm{int}}_{n=0,o=1,z}[Y] = E^{\mathrm{int}}_{n=0,o=1}[Y \mid z]$. It would then follow that
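$$
\begin{aligned}
E^{\mathrm{int}}_{n=0,o=1}[Y] &= \sum_z E^{\mathrm{int}}_{n=0,o=1}[Y \mid z]\, f^{\mathrm{int}}_{n=0,o=1}[z] \\
&= \sum_z E^{\mathrm{int}}_{n=0,o=1,z}[Y]\, f^{\mathrm{int}}_{n=0,o=1}[z] \\
&= \sum_z E^{\mathrm{int}}_{n=1,o=1,z}[Y]\, f^{\mathrm{int}}_{n=0,o=0}[z] \\
&= \sum_z E[Y \mid X = 1, Z = z]\, f[z \mid X = 0],
\end{aligned}
$$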
where the third equality follows from our supposition that N has no effect on Y except through Z, and that O has no effect on Z.
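The final expression above is a weighted average of observed conditional means. The sketch below evaluates it for binary X and Z and takes the PDE to be $E[Y(1, Z(0))] - E[Y(0)]$, with $E[Y(0)] = E[Y \mid X = 0]$. It is valid only under the supposition in the text that Y and Z are unconfounded when N is set to 0 and O to 1. The conditional means and probabilities are hypothetical numbers, and `mediation_formula` is an illustrative helper, not code from the chapter.

```python
def mediation_formula(e_y, f_z, x_y, x_z):
    """Sum over z of E[Y | X = x_y, Z = z] * f(z | X = x_z).

    e_y: dict mapping (x, z) -> E[Y | X = x, Z = z]
    f_z: dict mapping (x, z) -> f(z | X = x)
    """
    zs = {z for (_, z) in f_z}
    return sum(e_y[(x_y, z)] * f_z[(x_z, z)] for z in zs)


# Hypothetical observed-data quantities for binary X and Z.
e_y = {(0, 0): 0.10, (0, 1): 0.40, (1, 0): 0.30, (1, 1): 0.70}
f_z = {(0, 0): 0.80, (0, 1): 0.20, (1, 0): 0.50, (1, 1): 0.50}

# E[Y(1, Z(0))] under the no-confounding supposition
e_y1_z0 = mediation_formula(e_y, f_z, x_y=1, x_z=0)
# With matching arguments the same sum is E[Y | X = 0] (law of total expectation)
e_y0 = mediation_formula(e_y, f_z, x_y=0, x_z=0)
pde = e_y1_z0 - e_y0
print(round(e_y1_z0, 2), round(e_y0, 2), round(pde, 2))  # 0.38 0.16 0.22
```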
We conclude by showing that the MCM and FFRCISTG models associated
with Figure 6.1 are true, but the NPSEM is not, if any of the three counterfactual models associated with Figure 6.9 is true. Specifically, the DAG in
Figure 6.11 represents the DAG of Figure 6.1 with the counterfactuals for
Z(x) and Y(x, z), the variable U of Figure 6.9, and common causes U1 and U2
of the Z(x) and the Y(x, z) added to the graph. Note that U being a common
cause of Z and Y in Figures 6.9 and 6.10 only when we set N ¼ 0 and O ¼ 1
implies that U is only a common cause of Z(0), Y(1, 0), and Y(1, 1) in Figure
6.11. One can check using d-separation that the counterfactual indepen-
dences in Figure 6.11 satisfy those required of an MCM or FFRCISTG
model, but not those of an NPSEM, as Z(0) and Y(1, z) are dependent.
However, Figure 6.11 contains more independences than are required for the FFRCISTG condition (6.1) applied to the DAG in Figure 6.1. In particular, in Figure 6.11 Z(1) and Y(0, z) are independent. This implies that E[Y(0, Z(1))] is identified by $\sum_z E[Y \mid X = 0, Z = z]\, f(z \mid X = 1)$ and, thus, that the so-called total direct effect $E[Y(1, Z(1))] - E[Y(0, Z(1))]$ is also identified.
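Combining this with $E[Y(1, Z(1))] = E[Y(1)] = E[Y \mid X = 1]$ gives an explicit expression for the total direct effect under the model of Figure 6.11. That step assumes, as for the other contrasts discussed here, that X itself is unconfounded.

```latex
E[Y(1, Z(1))] - E[Y(0, Z(1))]
  = E[Y \mid X = 1] - \sum_{z} E[Y \mid X = 0, Z = z]\, f(z \mid X = 1).
```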
Figure 6.11 An example leading to an FFRCISTG corresponding to the DAG in Figure 6.1
but not an NPSEM: potential outcome perspective. Counterfactuals for Y are indexed
Y(x, z). U, U1, and U2 indicate hidden confounders. Thicker edges indicate deter-
ministic relations.
Finally, we note that we could easily modify our example to eliminate the
independence of Z(1) and Y(0, z).
It then follows from the analysis in Richardson and Robins (2010, Section
2.2) that the set of possible values for the pair
Hence, we have the following upper and lower bounds on the PDE:
To describe the FRCISTG model for $V = (V_1, \ldots, V_M)$, we suppose that each $V_m = (L_m, A_m)$ is actually a composite of variables $L_m$ and $A_m$, one of which can be the empty set. The causal effects of intervening on any of the $L_m$ variables are not defined. However, we assume that for any subset R of $A = \bar{A}_M = (A_1, \ldots, A_M)$, the counterfactuals $V_m(r)$ are well defined for any $r \in \mathcal{R}$.
Specifically, we assume that the one-step-ahead counterfactuals $V_m(\bar{a}_{m-1}) = (L_m(\bar{a}_{m-1}), A_m(\bar{a}_{m-1}))$ exist for any setting of $\bar{a}_{m-1} \in \bar{\mathcal{A}}_{m-1}$. Note that it is implicit in this definition that $L_k$ precedes $A_k$ for all k. Next, we make the consistency assumption that the factual variables $V_m$ and the counterfactual variables $V_m(r)$ are obtained recursively from the $V_m(\bar{a}_{m-1})$. We do not provide a graphical characterization of parents. Rather, we say that the parents $\mathrm{Pa}_m$ of $V_m$ consist of the smallest subset of $\bar{A}_{m-1}$ such that, for all $\bar{a}_{m-1} \in \bar{\mathcal{A}}_{m-1}$, $V_m(\bar{a}_{m-1}) = V_m(\mathrm{pa}_m)$, where $\mathrm{pa}_m$ is the sub-vector of $\bar{a}_{m-1}$ corresponding to $\mathrm{Pa}_m$. One can then view the parents $\mathrm{Pa}_m$ of $V_m$ as the direct causes of $V_m$ relative to the variables prior to $V_m$ on which we can perform interventions. Finally, an FRCISTG model imposes the following independences:
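$$V_{m+1}(\bar{a}_m), \ldots, V_M(\bar{a}_{M-1}) \perp\!\!\!\perp A_m(\bar{a}_{m-1}) \mid \bar{L}_m = \bar{l}_m, \bar{A}_{m-1} = \bar{a}_{m-1}, \quad \text{for all } m, \bar{a}_{M-1}, \bar{l}_m. \tag{6.33}$$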
Theorem 8 An FRCISTG model for $V = (V_1, \ldots, V_M)$, $V_m = (L_m, A_m)$, implies that for all m, $\bar{a}_{M-1}$, $\bar{l}_m$,
$$V_{m+1}(\bar{a}_m), \ldots, V_M(\bar{a}_{M-1}) \perp\!\!\!\perp A_m(\bar{a}_{m-1}) \mid \bar{L}_m(\bar{a}_{m-1}) = \bar{l}_m.$$
Note that the theorem would not be true had we substituted the factual $\bar{L}_m$ for $\bar{L}_m(\bar{a}_{m-1})$.
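The definition of the parents $\mathrm{Pa}_m$ given earlier in this section is a search for the smallest set of earlier treatments on which the one-step-ahead counterfactual actually depends. The sketch below makes that search concrete under several assumptions, all hypothetical and for illustration only. Treatments take finitely many levels (binary by default). The population is finite. $V_m(\bar{a}_{m-1})$ is represented as the vector of its values across all units, so two histories give "the same" counterfactual only if they agree unit by unit. Subsets are tried in order of increasing size, and the first one that is consistent is returned.

```python
from itertools import combinations, product


def parents(v_m, n_prior, levels=(0, 1)):
    """Smallest set of prior treatment indices on which V_m(a_bar) depends.

    v_m:     hypothetical function mapping a treatment history a_bar (a tuple
             of length n_prior) to the vector of counterfactual values
             V_m(a_bar) over every unit of a finite population.
    n_prior: number of earlier treatment variables A_1, ..., A_{m-1}.
    Returns the 0-based indices of Pa_m as a tuple.
    """
    histories = list(product(levels, repeat=n_prior))
    for size in range(n_prior + 1):
        for subset in combinations(range(n_prior), size):
            seen = {}
            consistent = True
            for a_bar in histories:
                # Histories that agree on `subset` must give identical V_m values.
                key = tuple(a_bar[j] for j in subset)
                value = v_m(a_bar)
                if seen.setdefault(key, value) != value:
                    consistent = False
                    break
            if consistent:
                return subset
    # Not reached: the full index set always satisfies the condition.
    return tuple(range(n_prior))


# Two units; V_3 responds to A_1 only (unit 2 in the opposite direction).
print(parents(lambda a: (a[0], 1 - a[0]), 2))  # (0,)
```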
References
Avin, C., Shpitser, I., & Pearl, J. (2005). Identifiability of path-specific effects. In L. P.
Kaelbling & A. Saffiotti (Eds.), IJCAI-05, Proceedings of the nineteenth international
joint conference on artificial intelligence (pp. 357–363). Denver: Professional Book
Center.
Dawid, A. P. (2000). Causal inference without counterfactuals. Journal of the American
Statistical Association, 95(450), 407–448.
Didelez, V., Dawid, A., & Geneletti, S. (2006). Direct and indirect effects of sequential
treatments. In R. Dechter & T. S. Richardson (Eds.), UAI-06, Proceedings of the 22nd
annual conference on uncertainty in artificial intelligence (pp. 138–146). Arlington, VA:
AUAI Press.
Frangakis, C. E., & Rubin, D. B. (1999). Addressing complications of intention-to-treat
analysis in the combined presence of all-or-none treatment-noncompliance and sub-
sequent missing outcomes. Biometrika, 86(2), 365–379.
Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference.
Biometrics, 58(1), 21–29.
Geneletti, S., & Dawid, A. P. (2007). Defining and identifying the effect of treatment on the
treated (Tech. Rep. No. 3). Imperial College London, Department of Epidemiology
and Public Health.
Gill, R. D., & Robins, J. M. (2001). Causal inference for complex longitudinal data: The
continuous case. Annals of Statistics, 29(6), 1785–1811.
Hafeman, D., & VanderWeele, T. (2010). Alternative assumptions for the identification
of direct and indirect effects. Epidemiology. (Epub ahead of print)
Heckerman, D., & Shachter, R. D. (1995). A definition and graphical representation for
causality. In P. Besnard & S. Hanks (Eds.), UAI-95: Proceedings of the eleventh annual
conference on uncertainty in artificial intelligence (pp. 262–273). San Francisco: Morgan
Kaufmann.
Imai, K., Keele, L., & Yamamoto, T. (2009). Identification, inference, and sensitivity
analysis for causal mediation effects (Tech. Rep.). Princeton University, Department
of Politics.
Kaufman, S., Kaufman, J. S., & MacLehose, R. F. (2009). Analytic bounds on causal
risk differences in directed acyclic graphs involving three observed binary variables.
Journal of Statistical Planning and Inference, 139(10), 3473–3487.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo: Morgan
Kaufmann.
Pearl, J. (2000). Causality. Cambridge: Cambridge University Press.
Pearl, J. (2001). Direct and indirect effects. In J. S. Breese & D. Koller (Eds.), UAI-01,
Proceedings of the 17th annual conference on uncertainty in artificial intelligence
(pp. 411–420). San Francisco: Morgan Kaufmann.
Pearl, J. (2010). An introduction to causal inference. The International Journal of
Biostatistics, 6(2). (DOI: 10.2202/1557-4679.1203)
Petersen, M., Sinisi, S., & Laan, M. van der. (2006). Estimation of direct causal effects.
Epidemiology, 17(3), 276–284.
Richardson, T. S., & Robins, J. M. (2010). Analysis of the binary instrumental variable
model. In R. Dechter, H. Geffner, & J. Halpern (Eds.), Heuristics, probability and
causality: A tribute to Judea Pearl (pp. 415–444). London: College Publications.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with
sustained exposure periods – applications to control of the healthy worker survivor
effect. Mathematical Modeling, 7, 1393–1512.
Robins, J. M. (1987). Addendum to ‘‘A new approach to causal inference in
mortality studies with sustained exposure periods – applications to control of the
healthy worker survivor effect’’. Computers and Mathematics with Applications, 14,
923–945.
Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct
and indirect effects. In P. Green, N. Hjort, & S. Richardson (Eds.), Highly structured
stochastic systems (pp. 70–81). Oxford: Oxford University Press.
Robins, J. M., & Greenland, S. (1992). Identifiability and exchangeability for direct and
indirect effects. Epidemiology, 3, 143–155.
Robins, J. M., & Greenland, S. (2000). Comment on ‘‘Causal inference without coun-
terfactuals’’. Journal of the American Statistical Association, 95(450), 431–435.
Robins, J. M., Richardson, T. S., & Spirtes, P. (2009). Identification and inference for
direct effects (Tech. Rep. No. 563). University of Washington, Department of
Statistics.
Robins, J. M., Rotnitzky, A., & Vansteelandt, S. (2007). Discussion of ‘‘Principal stra-
tification designs to estimate input data missing due to death’’ by Frangakis, C.E.,
Rubin, D. B., An, M., & MacKenzie, E. Biometrics, 63(3), 650–653.
Robins, J. M., VanderWeele, T. J., & Richardson, T. S. (2007). Discussion of ‘‘Causal
effects in the presence of non compliance: a latent variable interpretation’’ by
Forcina, A. Metron, LXIV(3), 288–298.
Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104, 587–592.
Rubin, D. B. (1998). More powerful randomization-based p-values in double-blind trials with non-compliance. Statistics in Medicine, 17, 371–385.
Rubin, D. B. (2004). Direct and indirect causal effects via potential outcomes.
Scandinavian Journal of Statistics, 31(2), 161–170.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, Prediction and Search
(No. 81). New York: Springer-Verlag.
VanderWeele, T., & Robins, J. (2007). Directed acyclic graphs, sufficient causes, and the
properties of conditioning on a common effect. American Journal of Epidemiology,
166(9), 1096–1104.
7
Introduction
and in equal numbers. GMM can identify the placebo responder class in the
medication group. Having identified the placebo responder and placebo non-
responder classes in both the placebo and medication groups, medication
effects can more clearly be identified. In one approach, the medication
effect is formulated in terms of an effect of medication on the trajectory
slopes after the treatment phase has begun. This medication effect is allowed
to be different for the nonresponder and responder trajectory classes.
Another approach formulates the medication effect as increasing the prob-
ability of membership in advantageous trajectory classes and decreasing the
probability of membership in disadvantageous trajectory classes.
This section gives a brief description of the GMM in the context of the
current study. A two-piece, random effect GMM is applied to the Hamilton
Depression Rating Scale outcomes at the 11 time points y1–y11. The first
piece refers to the two time points y1 and y2 before randomization, and the
second piece refers to the nine postrandomization time points y3–y11. Given
only two time points, the first piece is by necessity taken as a linear model
with a random intercept, defined at baseline, and a fixed effect slope. An
exploration of each individual’s trajectory suggests a quadratic trajectory
shape for the second piece. The growth model for the second piece is cen-
tered at week 8, defining the random intercept as the systematic variation at
that time point. All random effect means are specified as varying across
latent trajectory classes. The medication effect is captured by a regression
of the linear and quadratic slopes in the second piece on a medication
dummy variable. These medication effects are allowed to vary across the
latent trajectory classes. The model is shown in diagrammatic form at the
top of Figure 7.1.1
The statistical specification is as follows. Consider the depression outcome $y_{it}$ for individual i, let c denote the latent trajectory class variable, let g denote random effects, let $a_t$ denote time, and let $\varepsilon_t$ denote residuals containing measurement error and time-specific variation. For the first, prerandomization piece, conditional on trajectory class k (k = 1, 2, . . . , K),
1. In Figure 7.1 the observed outcomes are shown in boxes and the random effects in circles.
Here, i, s, and q denote intercept, linear slope, and quadratic slope, respectively. In the follow-
ing formulas, these random effects are referred to as g0, g1, and g2. The treatment dummy
variable is denoted x.
[Figure 7.1 plot: model diagram with random effects i1, s1 (prerandomization piece) and i2, s2, q2 (postrandomization piece)]
g11i ‰ci ¼k ¼ 11k þ 11i ; (3)
with the time score for t = 11 (week 8) set to 0, defining $g_{0i}$ as the week 8 depression status. The remaining time scores
values are set according to the distance in timing of measurements. Assume
for simplicity a single drug and denote the medication status for individual i
by the dummy variable xi (x = 0 for the placebo group and x = 1 for the
medication group).2 The random effects are allowed to be influenced by
2. In the application three dummy variables are used to represent the three different medications.
7 General Approaches to Analysis of Course 163
The random-effect residuals in the first and second pieces have a 4 × 4 covariance matrix, here taken to be constant across classes k. For both pieces the residuals $\varepsilon_{it}$ have a T × T covariance matrix, here taken to be constant across classes. For simplicity, both covariance matrices are assumed not to vary across treatment groups. As seen in equations 5–7, the placebo group ($x_i = 0$) consists of subjects from the two different trajectory classes that vary in the means of the growth factors. In the absence of covariate w, these means are represented by the class-specific parameters with subscripts 0k, 1k, and 2k. This gives the average depression development in the
absence of medication. Because of randomization, the placebo and medica-
tion groups are assumed to be statistically equivalent at the first two time
points. This implies that x is assumed to have no effect on g10i or g11i in the
first piece of the development. Medication effects are described in the second
piece by g01k, g11k, and g21k as a change in average growth rate that can be
different for the classes.
This model allows the assessment of medication effects in the presence of
a placebo response. A key parameter is the medication-added mean of the
intercept random effect centered at week 8. This is the g01k parameter of
equation 5. This indicates how much lower or higher the average score is at
week 8 for the medication group relative to the placebo group in the trajec-
tory class considered. In this way, the medication effect is specific to classes
of individuals who would or would not have responded to placebo. The
modeling will be extended to allow for the three drugs of this study to
have different g parameters in equations 5–7.
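Equations 5–7 are not reproduced in this excerpt, so the sketch below assumes the second-piece structure described in words above. Each class has means for an intercept, a linear slope and a quadratic slope. Each of these is shifted by a medication effect (g01k, g11k, g21k) when x = 1. Time scores are in weeks and centred at week 8. The time scores (48 hrs taken as 2/7 of a week) and all parameter values are hypothetical. Because the time score is 0 at week 8, the week-8 difference between the medication and placebo curves within a class equals g01k exactly.

```python
# Post-randomization occasions y3..y11 (48 hrs, then weeks 1-8), in weeks.
WEEKS = [2 / 7] + [float(w) for w in range(1, 9)]
# Time scores centred at week 8, so the intercept is the week-8 mean.
TIME_SCORES = [w - 8 for w in WEEKS]


def class_mean_curve(means, med_effects, x):
    """Second-piece mean HamD curve for one trajectory class.

    means:       hypothetical class-specific means of intercept, linear, quadratic
    med_effects: hypothetical medication effects (g01k, g11k, g21k) on those means
    x:           0 for placebo, 1 for medication
    """
    g0, g1, g2 = (m + e * x for m, e in zip(means, med_effects))
    return [g0 + g1 * a + g2 * a ** 2 for a in TIME_SCORES]


means = (14.0, -0.5, 0.05)       # hypothetical class means under placebo
med_effects = (-6.0, -0.3, 0.0)  # hypothetical g01k, g11k, g21k
placebo = class_mean_curve(means, med_effects, x=0)
drug = class_mean_curve(means, med_effects, x=1)
print(drug[-1] - placebo[-1])  # -6.0, the assumed g01k
```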
Class membership can be influenced by baseline covariates as expressed
by a logistic regression (e.g., with two classes),
random starting values need to be used and the best log-likelihood value
needs to be replicated several times. In the present analyses, between 500
and 4,000 random starts were used depending on the complexity of the
model.
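The advice about random starts can be illustrated on a much simpler model. The sketch below swaps in a two-class univariate normal mixture fitted by EM. This is not the growth mixture model itself and not the estimation software used for these analyses. The sketch runs EM from several random starting values, keeps the best log-likelihood, and counts how many starts replicate it. The data, the starting-value ranges, the standard-deviation floor, and the replication tolerance are all hypothetical choices.

```python
import math
import random

LOG_2PI = math.log(2 * math.pi)


def log_normal(x, mu, sd):
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * LOG_2PI


def log_terms(x, p, mu1, mu2, sd1, sd2):
    """Return (log p*f1(x), log mixture density at x), computed stably."""
    la = math.log(p) + log_normal(x, mu1, sd1)
    lb = math.log(1 - p) + log_normal(x, mu2, sd2)
    m = max(la, lb)
    return la, m + math.log(math.exp(la - m) + math.exp(lb - m))


def em_fit(y, start, iters=200, sd_floor=0.5):
    """EM for a two-class univariate normal mixture from one starting value."""
    p, mu1, mu2, sd1, sd2 = start
    n = len(y)
    for _ in range(iters):
        # E-step: posterior probability of class 1 for every observation
        w = [math.exp(la - lse) for la, lse in (log_terms(x, p, mu1, mu2, sd1, sd2) for x in y)]
        # M-step: weighted class proportion, means and standard deviations
        n1 = max(sum(w), 1e-9)
        n2 = max(n - sum(w), 1e-9)
        p = min(max(n1 / n, 1e-6), 1 - 1e-6)
        mu1 = sum(wi * x for wi, x in zip(w, y)) / n1
        mu2 = sum((1 - wi) * x for wi, x in zip(w, y)) / n2
        sd1 = max(math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, y)) / n1), sd_floor)
        sd2 = max(math.sqrt(sum((1 - wi) * (x - mu2) ** 2 for wi, x in zip(w, y)) / n2), sd_floor)
    loglik = sum(log_terms(x, p, mu1, mu2, sd1, sd2)[1] for x in y)
    return loglik, sorted((mu1, mu2))


def best_of_random_starts(y, n_starts=20, seed=1, tol=1e-3):
    """Fit from several random starts; return the best log-likelihood, its
    sorted class means, and how many starts replicate it within tol."""
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    mean = sum(y) / len(y)
    sd = math.sqrt(sum((x - mean) ** 2 for x in y) / len(y))
    fits = []
    for _ in range(n_starts):
        start = (rng.uniform(0.2, 0.8), rng.uniform(lo, hi), rng.uniform(lo, hi), sd, sd)
        fits.append(em_fit(y, start))
    best_ll, best_means = max(fits)
    n_replicated = sum(1 for ll, _ in fits if best_ll - ll < tol)
    return best_ll, best_means, n_replicated


# Hypothetical data: two well-separated groups of scores.
data_rng = random.Random(0)
y = [data_rng.gauss(8, 2) for _ in range(60)] + [data_rng.gauss(20, 3) for _ in range(40)]
best_ll, means, n_rep = best_of_random_starts(y)
print(round(best_ll, 3), [round(m, 1) for m in means], n_rep)
```

A best value reached by only one start would be a warning sign. In that case more starts, or different starting-value ranges, would be the natural next step.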
In this section the depression data are analyzed in three steps using GMM.
First, the placebo group is analyzed alone. Second, the medication group is
analyzed alone. Third, the placebo and medication groups are analyzed jointly
according to the GMM just presented in order to assess the medication effects.
3. The maximum log-likelihood value for the two-class GMM of Figure 7.2 is –1,055.974, which is
replicated across many random starts, with 28 parameters and a BIC value of 2,219. The
classification based on the posterior class probabilities is not clear-cut in that the classification
entropy value is only 0.66.
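The footnotes do not state the BIC formula. However, the reported values match the standard definition BIC = −2 log L + p ln n, with n = 45 placebo subjects and the log-likelihoods taken as negative numbers. Below is a minimal check, assuming that definition. It uses the values from footnotes 3 and 4.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p ln n (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Two-class (footnote 3) and three-class (footnote 4) placebo-group models, n = 45.
print(round(bic(-1055.974, 28, 45)))  # 2219
print(round(bic(-1048.403, 34, 45)))  # 2226
```

The check also explains why the three-class model has the slightly worse BIC. Its log-likelihood is higher by about 7.6, which lowers −2 log L by about 15.1. Its six extra parameters, however, add 6 ln 45 ≈ 22.8.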
[Figure 7.2 plot: estimated mean HamD curves by time, baseline through week 8; Class 1, 32.4%; Class 2, 67.6%]
missing observations is considerably lower for weeks 5–7 than other weeks,
reducing the weight of these time points. The individual with the second
lowest score at week 8 deviates from the mean curve for week 5 but has
missing data for weeks 6 and 7. This person is also ambiguously classified in
terms of his or her posterior probability of class membership.
To further explore the data, a three-class GMM was also fitted to the 45
placebo subjects. Figure 7.4a shows the mean curves for this solution. This
solution no longer shows a clear-cut responder class. Class 2 (49%) declines
early, but the mean score does not go below 14. Class 1 (22%) ends with a
mean score of 10.7 but does not show the expected responder trajectory
shape of an early decline.4 A further analysis was made to investigate if
the lack of a clear responder class in the three-class solution is due to the
sample size of n = 45 being too small to support three classes. In this
analysis, the n = 45 placebo group subjects were augmented by the medica-
tion group subjects but using only the two prerandomization time points
from the medication group. Because of randomization, subjects are statisti-
cally equivalent before randomization, so this approach is valid. The first,
prerandomization piece of the GMM has nine parameters, leaving only
25 parameters to be estimated in the second, postrandomization piece by
4. The log-likelihood value for the model in Figure 7.4a is –1,048.403, replicated across several
random starts, with 34 parameters and a BIC value of 2,226. Although the BIC value is slightly
worse than for the two-class solution, the classification is better, as shown by the entropy value
of 0.85.
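The footnotes also report a classification entropy, such as 0.66 and 0.85 above. A minimal sketch follows. It assumes the relative entropy measure commonly reported for mixture models (for example by Mplus), $1 - \sum_i \sum_k (-p_{ik} \ln p_{ik}) / (n \ln K)$, where $p_{ik}$ is the posterior probability that subject i belongs to class k. Whether the chapter's values were computed with exactly this formula is an assumption.

```python
import math


def relative_entropy(posteriors):
    """Classification quality: 1 - sum_i sum_k (-p_ik ln p_ik) / (n ln K).

    posteriors: per-subject lists of posterior class probabilities (each sums to 1).
    Returns a value in [0, 1]; 1 means every subject is classified with certainty.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    # Terms with p = 0 are skipped, i.e. 0 * ln 0 is taken as 0.
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))


print(relative_entropy([[1.0, 0.0], [0.0, 1.0]]))  # 1.0: certain classification
print(relative_entropy([[0.5, 0.5], [0.5, 0.5]]))  # 0.0 up to rounding: no information
```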
[Figure 7.3 plot: individual HamD trajectories by time, baseline through week 8, panels (a) and (b)]
Figure 7.3 Individual trajectories for placebo subjects classified into (a) the responder
class and (b) the non-responder class.
the n = 45 placebo subjects alone. Figure 7.4b shows that a responder class
(class 2) is now found, with 21% of the subjects estimated to be in this class.
High (class 3) and low (class 1) nonresponder classes are found, with 18%
and 60% estimated to be in these classes, respectively. Compared to Figure 7.3,
the observed individual trajectories within class are somewhat less hetero-
geneous (trajectories not shown).5
5. The log-likelihood value for the model in Figure 7.4b is –1,270.030, replicated across several
random starts, with 34 parameters and a BIC value of 2,695. The entropy value is 0.62. Because
a different sample size is used, these values are not comparable to the earlier ones.
[Figure 7.4 plot: estimated mean HamD curves by time, baseline through week 8, panels (a) and (b)]
Figure 7.4 (a) Three-class GMM for placebo group. (b) Three-class GMM for placebo
group and pre-randomization medication group individuals.
As a first step, two- and three-class analyses of the nine postrandomization time
points were performed, not allowing for differences across the three drugs. This
gave solutions that were very similar to those of Figures 7.5 and 7.6. The
similarity in mean trajectory shape also held up when allowing for class probabilities to vary as a function of drug. Figure 7.7 shows the estimated mean
curves for this latter model. The estimated class probabilities for the three drugs
show that in the responder class (class 2, 63%) 21% of the subjects are on
fluoxetine, 29% are on venlafaxine IR, and 50% are on venlafaxine XR. For
the nonresponder class that shows an initial improvement and a later wor-
sening (class 3, 19%), 25% are on fluoxetine, 75% are on venlafaxine IR, and
0% are on venlafaxine XR. For the nonresponder class that shows no
improvement at any point (class 1, 19%), 58% are on fluoxetine, 13% are
on venlafaxine IR, and 29% are on venlafaxine XR. Judged across all three
trajectory classes, this suggests that venlafaxine XR has the best outcome,
followed by venlafaxine IR, with fluoxetine last. Note, however, that for these
data subjects were not randomized to the different medications; therefore,
comparisons among medications are confounded by subject differences.8
6. The log-likelihood value for the model in Figure 7.5 is –1,084.635, replicated across many
random starts, with 28 parameters and a BIC value of 2,278. The entropy value is 0.90.
7. The log-likelihood value for the model in Figure 7.6 is –1,077.433, replicated across many
random starts, with 34 parameters and a BIC value of 2,287. The BIC value is worse than
for the two-class solution. The entropy value is 0.85.
8. The log-likelihood value for the model of Figure 7.7 is –873.831, replicated across many
random starts, with 27 parameters and a BIC value of 1,853. The entropy value is 0.79.
[Figure 7.5 plot: estimated mean HamD curves by time, baseline through 8 weeks; Class 1, 84.7%; Class 2, 15.3%]
[Figure 7.6 plot: estimated mean HamD curves by time, baseline through 8 weeks; Class 1, 16.9%; Class 2, 14.9%; Class 3, 68.2%]
[Figure 7.7 plot: estimated mean HamD curves by time, 48 hrs through week 8; Class 1, 18.6%; Class 2, 62.9%; Class 3, 18.6%]
9. The log-likelihood value for the model of Figure 7.8 is –859.577, replicated in only a few
random starts, with 45 parameters and a BIC value of 1,894. The entropy value is 0.81. It
is difficult to choose between the model of Figure 7.7 and the model of Figure 7.8 based on
statistical indices. The Figure 7.7 model has the better BIC value, but the improvement in the
log-likelihood of the Figure 7.8 model is substantial.
[Figure 7.8 plot: estimated mean HamD curves by time, 48 hrs through week 8, panels (a), (b), and (c)]
Figure 7.8 Three-class GMM for (a) fluoxetine subjects, (b) venlafaxine IR subjects, and
(c) venlafaxine XR subjects.
medication, in line with the Figure 7.7 model. Here, the class probabilities
are different for the placebo group and the three medication groups so that
the medication effect is quantified in terms of differences across groups in class
probabilities.
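Equation 8, which lets the class probabilities differ between the placebo group and the medication groups, is not reproduced in this excerpt. The sketch below assumes the usual multinomial logistic (softmax) form. The last class serves as the reference, and each group adds a shift to each logit. The group name and all parameter values are hypothetical.

```python
import math


def class_probabilities(intercepts, group_effects, group):
    """Multinomial-logit class probabilities for one treatment group.

    intercepts:    placebo-group logits for classes 1..K-1 relative to class K
    group_effects: dict mapping group name -> per-class logit shifts (hypothetical)
    group:         'placebo' or a medication group name
    """
    shifts = group_effects.get(group, [0.0] * len(intercepts))
    logits = [a + b for a, b in zip(intercepts, shifts)] + [0.0]  # reference class last
    m = max(logits)  # subtract the maximum for numerical stability
    expd = [math.exp(v - m) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]


intercepts = [0.0, -0.5]                    # hypothetical
effects = {"medication": [1.0, -2.0]}       # hypothetical logit shifts
p_placebo = class_probabilities(intercepts, effects, "placebo")
p_medication = class_probabilities(intercepts, effects, "medication")
# Medication effect expressed as differences in class probabilities
print([round(b - a, 3) for a, b in zip(p_placebo, p_medication)])
```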
For the analysis based on the earlier model (see Growth Mixture Modeling), a
three-class GMM will be used, given that three classes were found to be
interpretable for both the placebo and the medication groups. Figure 7.9
shows the estimated mean curves for the three-class solution for the placebo
group, the fluoxetine group, the venlafaxine IR group, and the venlafaxine XR
group. It is interesting to note that for the placebo group the Figure 7.9a
mean curves are similar in shape to those of Figure 7.4b, although the
responder class (class 3) is now estimated to be 34%. Note that for this
model the class percentages are specified to be the same in the medication
groups as in the placebo group. The estimated mean curves for the three
medication groups shown in Figure 7.9b–d are similar in shape to those of
the medication group analysis shown in Figure 7.8a–c. These agreements
with the separate-group analyses strengthen the plausibility of the modeling.
This model allows the assessment of medication effects in the presence of
a placebo response. A key parameter is the medication-added mean of the
intercept random effect centered at week 8. This is the g01k parameter of
equation 5. For a given trajectory class, this indicates how much lower or
higher the average score is at week 8 for the medication group in question
relative to the placebo group. In this way, the medication effect is specific to
classes of individuals who would or would not have responded to placebo.
The g01k estimates of the Figure 7.9 model are as follows. The fluoxetine
effect for the high nonresponder class 1 at week 8 as estimated by the GMM
is significantly positive (higher depression score than for the placebo group),
7.4, indicating a failure of this medication for this class of subjects. In the
low nonresponder class 2 the fluoxetine effect is small and positive but not statistically significant. In the responder class, the fluoxetine effect is significantly negative (lower depression score than for the placebo group), –6.3. The venlafaxine IR effect is not significant for any of the three classes. The venlafaxine XR effect is significantly negative, –11.7, for class 1, which after an initial slight worsening turns into a responder class for venlafaxine XR. For the nonresponder class 2 the venlafaxine XR effect is not significant, while for the
responder class it is significantly negative, –7.8. In line with the medication
group analysis shown in Figure 7.7, the joint analysis of placebo and medica-
tion subjects indicates that venlafaxine XR has the most desirable outcome
relative to the placebo group. None of the drugs is significantly effective for
the low nonresponder class 2.10
10. The log-likelihood value for the model shown in Figure 7.9 is –2,142.423, replicated across a
few random starts, with 61 parameters and a BIC value of 4,562. The entropy value is 0.76.
[Figure 7.9 plot: estimated mean HamD curves by time, baseline through week 8, panels (a)–(d); in each panel Class 1, 20.5%; Class 2, 45.9%; Class 3, 33.6%]
Figure 7.9 Three-class GMM of both groups: (a) Placebo subjects, (b) fluoxetine subjects, (c) venlafaxine IR subjects, and (d) venlafaxine XR subjects.
As a final analysis, the placebo and medication groups were analyzed together
for the postrandomization time points. Figure 7.10 displays the estimated
three-class solution, which again shows a responder class, a nonresponder
class which initially improves but then worsens (similar to the placebo response
class found in the placebo group), and a high nonresponder class.11 As a first
step, it is of interest to compare the joint placebo–medication group analysis of
Figure 7.10 to the separate placebo group analysis of Figure 7.4b and the
separate medication group analysis of Figure 7.6.
Comparing the joint analysis in Figure 7.10 to that of the placebo group
analysis of Figure 7.4b indicates the improved outcome when medication
group individuals are added to the analysis. In the placebo group analysis
of Figure 7.4b, 78% are in the two highest, clearly nonresponding trajectory
classes, whereas in the joint analysis of Figure 7.10 only 36% are in the
highest, clearly nonresponding class. In this sense, medication seems to
have a positive effect in reducing depression. Furthermore, in the placebo
analysis, 21% are in the placebo-responding class which ultimately worsens,
whereas in the joint analysis 21% are in this type of class and 43% are in a
clearly responding class.
Comparing the joint analysis in Figure 7.10 to that of the medication
group analysis of Figure 7.6 indicates the worsened outcome when placebo
group individuals are added to the analysis. In the medication group analysis
of Figure 7.6 only 17% are in the nonresponding class compared to 36% in
the joint analysis of Figure 7.10. Figure 7.6 shows 15% in the initially
improving but ultimately worsening class compared to 21% in Figure 7.10.
Figure 7.6 shows 68% in the responding class compared to 43% in Figure 7.10.
All three of these comparisons indicate that medication has a positive effect
in reducing depression.
As a second step, it is of interest to study the medication effects for each
medication separately. The joint analysis model allows this because the class
probabilities differ between the placebo group and each of the three medica-
tion groups, as expressed by equation 8. The results are shown in Figure
7.11. For the placebo group, the responder class (class 3) is estimated to be
26%, the initially improving nonresponder class (class 1) to be 22%, and the
high nonresponder class (class 2) to be 52%. In comparison, for the fluox-
etine group the responder class is estimated to be 48% (better than placebo),
the initially improving nonresponder class to be 0% (better than placebo),
and the high nonresponder class to be 52% (same as placebo). For the
11. The log-likelihood value for the model shown in Figure 7.10 is –1,744.999, replicated across
many random starts, with 29 parameters and a BIC value of 3,621. The entropy value is 0.69.
[Figure 7.10: estimated HamD trajectories (0–26) plotted against time (48 hrs, weeks 1–8); legend: Class 1, 21.0%; Class 2, 35.8%; Class 3, 43.1%.]
Figure 7.10 Three-class GMM analysis of both groups using post-randomization time
points.
[Figure 7.11: two bar-chart panels of estimated percentages for the responder (R), initially improving nonresponder (IINR), and high nonresponder (HNR) classes.]
Figure 7.11 Medication effects in each of 3 trajectory classes.
Conclusions
The growth mixture analysis presented here demonstrates that, unlike con-
ventional repeated measures analysis, it is possible to estimate medication
effects in the presence of placebo effects. The analysis is flexible in that the
medication effect is allowed to differ across trajectory classes. This approach
should therefore have wide applicability in clinical trials. It was shown that
medication effects could be expressed as causal effects. The analysis also
produces a classification of individuals into trajectory classes.
Medication effects were expressed in two alternative ways: as changes in
growth slopes and as changes in class probabilities. Related to the latter
approach, a possible generalization of the model is to include two latent
class variables, one before and one after randomization, and to let the
medication influence the postrandomization latent class variable as well as
transitions between the two latent class variables. Another generalization,
proposed in Muthén and Brown (2009), considers four classes of subjects:
(1) subjects who would respond to both placebo and medication, (2) subjects
who would respond to placebo but not medication, (3) subjects who would
respond to medication but not placebo, and (4) subjects who would respond
to neither placebo nor medication. Class 3 is of particular interest from a
pharmaceutical point of view.
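Once class probabilities are estimated for each group, expressing a medication effect as a change in class probabilities is simple arithmetic. The sketch below differences the fluoxetine and placebo class percentages reported above for Figure 7.11. The function name is hypothetical, and this is only the post-estimation arithmetic, not the growth mixture model itself.

```python
# Class percentages for Figure 7.11 as reported in the text
# (R = responder, IINR = initially improving nonresponder, HNR = high nonresponder).
placebo = {"R": 26, "IINR": 22, "HNR": 52}
fluoxetine = {"R": 48, "IINR": 0, "HNR": 52}


def class_probability_effects(treated: dict, control: dict) -> dict:
    """Treated-minus-control difference in class percentages, class by class."""
    for dist in (treated, control):
        # Each group's class percentages should form a complete distribution.
        assert sum(dist.values()) == 100, dist
    return {cls: treated[cls] - control[cls] for cls in control}


effects = class_probability_effects(fluoxetine, placebo)
print(effects)  # {'R': 22, 'IINR': -22, 'HNR': 0}
```

Because both groups' percentages sum to 100, the class-specific effects always sum to zero: a gain in the responder class must be offset elsewhere, here entirely by the initially improving nonresponder class.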
Prediction of class membership can be incorporated as part of the model
but was not explored here. Such analyses suggest interesting opportunities
for designs of trials. If at baseline an individual is predicted to belong to a
nonresponder class, a different treatment can be chosen.
References
Leuchter, A. F., Cook, I. A., Witte, E. A., Morgan, M., & Abrams, M. (2002). Changes
in brain function of depressed subjects during treatment with placebo. American
Journal of Psychiatry, 159, 122–129.
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related
techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative meth-
odology for the social sciences (pp. 345–368). Newbury Park, CA: Sage Publications.
Muthén, B., & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-
Gaussian random effects. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G.
Molenberghs (Eds.), Longitudinal data analysis (pp. 143–165). Boca Raton, FL:
Chapman & Hall/CRC Press.
Muthén, B., & Brown, C. H. (2009). Estimating drug effects in the presence of placebo
response: Causal inference using growth mixture modeling. Statistics in Medicine,
28, 3363–3385.
Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S. T., Yang, C. C., et al. (2002).
General growth mixture modeling for randomized preventive interventions.
Biostatistics, 3, 459–475.
Muthén, B., & Muthén, L. (2000). Integrating person-centered and variable-centered
analysis: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical
and Experimental Research, 24, 882–891.
Muthén, B., & Muthén, L. (1998–2008). Mplus user’s guide (5th ed.). Los Angeles, CA:
Muthén & Muthén.
Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes
using the EM algorithm. Biometrics, 55, 463–469.
Quitkin, F. M., Rabkin, J. G., Ross, D., & Stewart, J. W. (1984). Identification of true
drug response to antidepressants. Use of pattern analysis. Archives of General
Psychiatry, 41, 782–786.
8
Introduction
The past two decades have brought new pharmacotherapies as well as beha-
vioral therapies to the field of drug-addiction treatment (Carroll & Onken,
2005; Carroll, 2005; Ling & Smith, 2002; Fiellin, Kleber, Trumble-Hejduk,
McLellan, & Kosten, 2004). Despite this progress, the treatment of addiction
in clinical practice often remains a matter of trial and error. Some reasons for
this difficulty are as follows. First, to date, no one treatment has been found
that works well for most patients; that is, patients are heterogeneous in
response to any specific treatment. Second, as many authors have pointed
out (McLellan, 2002; McLellan, Lewis, O’Brien, & Kleber, 2000), addiction is
often a chronic condition, with symptoms waxing and waning over time.
Third, relapse is common. Therefore, the clinician is faced with, first, finding
a sequence of treatments that works initially to stabilize the patient and, next,
deciding which types of treatments will prevent relapse in the longer
term. To inform this sequential clinical decision making, adaptive treatment
strategies, that is, treatment strategies shaped by individual patient
characteristics or patient responses to prior treatments, have been proposed
(Greenhouse, Stangl, Kupfer, & Prien, 1991; Murphy, 2003, 2005; Murphy,
Lynch, Oslin, McKay, & Tenhave, 2006; Murphy, Oslin, Rush, & Zhu, 2007;
Lavori & Dawson, 2000; Lavori, Dawson, & Rush, 2000; Dawson & Lavori,
2003).
Here is an example of an adaptive treatment strategy for prescription
opioid dependence, modeled with modifications after a trial currently in
progress within the Clinical Trials Network of the National Institute on
Drug Abuse (Weiss, Sharpe, & Ling, 2010).
[Figure: diagram of the example strategy. Initial treatment (4 weeks), then two branches defined by the patient's status during the initial 4-week treatment; in both branches, treat until 16 weeks have elapsed from the beginning of initial treatment.]
Example
First, provide all patients with a 4-week course of buprenorphine/naloxone
(Bup/Nx) plus medical management (MM) plus individual
drug counseling (IDC) (Fiellin, Pantalon, Schottenfeld, Gordon, &
O’Connor, 1999), culminating in a taper of the Bup/Nx. If at any
time during these 4 weeks the patient meets the criterion for
nonresponse,¹ a second, longer treatment with Bup/Nx (12 weeks) is
provided, accompanied by MM and cognitive behavior therapy (CBT).
However, if the patient remains abstinent² from opioid use during
those 4 weeks, that is, responds to initial treatment, provide 12
additional weeks of relapse prevention therapy (RPT).
1. Response to initial treatment is abstinence from opioid use during these first 4 weeks.
Nonresponse is defined as any opioid use during these first 4 weeks.
2. Abstinence might be operationalized using a criterion based on self-report of opioid use and
urine screens.
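The example strategy is a decision rule applied to each patient's record from the initial 4 weeks. A minimal sketch follows. The class and function names are hypothetical, and the response criterion is reduced to one opioid-use flag per week, a simplification of the self-report plus urine-screen criterion in note 2.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Treatment:
    weeks: int
    components: Tuple[str, ...]


INITIAL = Treatment(4, ("Bup/Nx", "MM", "IDC"))             # ends in a Bup/Nx taper
NONRESPONDER_SECOND = Treatment(12, ("Bup/Nx", "MM", "CBT"))
RESPONDER_SECOND = Treatment(12, ("RPT",))


def second_stage(weekly_opioid_use: List[bool]) -> Tuple[Optional[int], Treatment]:
    """Apply the example decision rule to one patient's initial-treatment record.

    weekly_opioid_use holds one flag per week of the 4-week initial treatment.
    Returns (week in which nonresponse was first detected, or None; second-stage treatment).
    """
    if len(weekly_opioid_use) != INITIAL.weeks:
        raise ValueError("expected one flag per week of initial treatment")
    if any(weekly_opioid_use):                       # any opioid use = nonresponse
        return weekly_opioid_use.index(True) + 1, NONRESPONDER_SECOND
    return None, RESPONDER_SECOND                    # abstinent all 4 weeks = response


print(second_stage([False, False, False, False]))
print(second_stage([False, True, False, True]))
```

Returning the week of first detected use reflects the rule that the switch happens "at any time during these 4 weeks" rather than only at week 4.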
8 SMART Design in the Development of Adaptive Treatment Strategies 181
Table 8.1 Potential Strategies to Consider for the Treatment of Prescription Opioid
Dependence
to the initial treatment and RPT to responders to the initial treatment, which
is the best initial behavioral treatment: MM+IDC or MM? This is a
comparison of strategies A and C. Alternatively, we might wish to identify
which of the four strategies results in the best long-term outcome
(here, the highest number of days abstinent). Note that the behavioral
therapies and pharmacotherapies are illustrative and were selected to
make this example concrete; other selections are, of course,
possible.
These research questions can be classified into one of four general types,
as summarized in Table 8.2. The SMART experimental design discussed
in the next section is particularly suited to addressing these types of
questions.
[Figure 8.2: initial treatment assigned by randomization; within each initial-treatment arm, nonresponders are randomized between two 12-week Bup/Nx secondary treatments and responders receive relapse prevention, giving six subgroups, in each of which days abstinent are measured over weeks 1–16.]
Figure 8.2 SMART study design to develop adaptive treatment strategies for prescription opioid dependence.
pooled outcome data of subgroups 2 and 5. This is the main effect of the
secondary behavioral treatment among those not abstinent during the initial
4-week treatment.
An example of the third type of question would be to test whether strategies
A and C in Table 8.1 result in different outcomes; to form this test, we use
appropriately weighted outcomes from subgroups 1 and 3 to form an average
outcome for strategy A and appropriately weighted outcomes from subgroups
4 and 6 to form an average outcome for strategy C (an alternate example
would concern strategies B and D; see the next section for formulae).
Note that to compare strategies, we require outcomes from both initial
responders and initial nonresponders (e.g., subgroup 3 in addition to
subgroup 1, and subgroup 6 in addition to subgroup 4). The fourth type of
question concerns the estimation of the best of the strategies. To choose the
best strategy overall, we follow a similar “weighting” process to form the
average outcome for each of the four strategies (A, B, C, D) and then
designate as the best strategy the one associated with the highest average
outcome.
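The subgroup bookkeeping above can be made explicit. In the sketch below, the subgroup numbering (1–3 start with MM+IDC, 4–6 with MM; 3 and 6 are responders) is inferred from the text. The strategy labels A = (1, 1), B = (1, 0), C = (0, 1), D = (0, 0) use the (A1, A2) coding introduced in the next section, and B's label in particular is our assumption. A responder's outcome counts toward both strategies that share that responder's initial treatment.

```python
# Hypothetical subgroup numbering inferred from the text: (A1, R, A2),
# with A2 = None for responders, who all receive RPT.
SUBGROUPS = {
    1: (1, 0, 1),     # MM+IDC initially, nonresponder, MM+CBT second
    2: (1, 0, 0),     # MM+IDC initially, nonresponder, MM+IDC second
    3: (1, 1, None),  # MM+IDC initially, responder
    4: (0, 0, 1),     # MM initially, nonresponder, MM+CBT second
    5: (0, 0, 0),     # MM initially, nonresponder, MM+IDC second
    6: (0, 1, None),  # MM initially, responder
}


def consistent_subgroups(a1: int, a2: int) -> list:
    """Subgroups whose observed treatment sequence agrees with strategy (a1, a2)."""
    return sorted(
        g for g, (g_a1, r, g_a2) in SUBGROUPS.items()
        if g_a1 == a1 and (r == 1 or g_a2 == a2)
    )


for label, strategy in {"A": (1, 1), "B": (1, 0), "C": (0, 1), "D": (0, 0)}.items():
    print(label, strategy, consistent_subgroups(*strategy))
```

The overlap (subgroup 3 appears under both A and B) is why strategy means sharing an initial treatment are correlated, a point that returns in the sample size algorithm for question 4.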
In this section, we provide the test statistics and sample size formulae for the
four types of research questions summarized in Table 8.2. We assume that
subjects are randomized equally to the two treatment options at each step.
We use the following notation: A1 is the indicator for initial treatment, R
denotes the response to the initial treatment (response = 1 and nonresponse
= 0), A2 is the treatment indicator for secondary treatment, and Y is the
outcome. In our prescription opioid dependence example, the values for
these variables are as follows: A1 is 1 if the initial treatment uses
MM+IDC and 0 otherwise, A2 is 1 if the secondary treatment for nonrespon-
ders uses MM+CBT and 0 otherwise, and Y is the number of days the subject
remained abstinent over the 16-week study period.
where $N_{A_1=i}$ denotes the number of subjects who received i as the initial treatment.
Question 2ᵃ:
$$Z = \frac{\bar{Y}_{R=0,A_2=1} - \bar{Y}_{R=0,A_2=0}}{\sqrt{\dfrac{S^2_{R=0,A_2=1}}{N_{R=0,A_2=1}} + \dfrac{S^2_{R=0,A_2=0}}{N_{R=0,A_2=0}}}}$$
Question 3ᵇ:
$$Z = \frac{\sqrt{N}\,\bigl(\hat{\mu}_{A_1=1,A_2=a_2} - \hat{\mu}_{A_1=0,A_2=b_2}\bigr)}{\sqrt{\hat{\sigma}^2_{A_1=1,A_2=a_2} + \hat{\sigma}^2_{A_1=0,A_2=b_2}}}$$
where N is the total number of subjects and a2 and b2 are the secondary
treatments in the two prespecified strategies being compared.
Question 4: Choose the largest of $\hat{\mu}_{A_1=1,A_2=1}$, $\hat{\mu}_{A_1=0,A_2=1}$, $\hat{\mu}_{A_1=1,A_2=0}$, $\hat{\mu}_{A_1=0,A_2=0}$.
ᵃ The subscripts on $\bar{Y}$ and $S^2$ denote groups of subjects. For example, $\bar{Y}_{R=0,A_2=1}$ is the average
outcome for subjects who do not respond initially (R = 0) and are assigned A2 = 1, and $S^2_{R=0,A_2=1}$ is the
sample variance of the outcome for those same subjects. Similarly, the subscript on N denotes the group of subjects.
ᵇ $\hat{\mu}$ is an estimator of the mean outcome and $\hat{\sigma}^2$ is the associated variance estimator for a
particular strategy; the subscript denotes the strategy. The formulae for $\hat{\mu}$ and $\hat{\sigma}^2$ are in
Table 8.4.
normally distributed (with mean zero under the null hypothesis of no effect).
In Tables 8.3, 8.4, and 8.5, specific values of Ai are denoted by ai and bi,
where i indicates the initial treatment (i = 1) or secondary treatment (i = 2);
these specific values are either 1 or 0.
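The question 2 statistic is a two-sample z-statistic computed among initial nonresponders only. A minimal sketch follows, with hypothetical function names and toy numbers that are not trial data. The question 1 row of the table did not survive extraction. Assuming it has the same form, with groups defined by A1 instead of A2 among nonresponders, the same function would apply to outcomes split by initial treatment.

```python
import math
from statistics import NormalDist, mean, variance


def z_secondary_effect(y_a2_1, y_a2_0):
    """Question 2 statistic: nonresponders (R = 0) given A2 = 1 versus A2 = 0.

    Uses group means and sample variances (n - 1 denominators).
    """
    diff = mean(y_a2_1) - mean(y_a2_0)
    se = math.sqrt(variance(y_a2_1) / len(y_a2_1) + variance(y_a2_0) / len(y_a2_0))
    return diff / se


def two_sided_p(z):
    """Two-sided p value from the standard normal reference distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Toy outcomes (days abstinent) for illustration only; real use needs large groups.
z = z_secondary_effect([6, 8, 10], [2, 4, 6])
print(round(z, 4), round(two_sided_p(z), 4))
```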
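Table 8.4 Estimators for Strategy Means and for Variance of Estimator of Strategy
Means
$$\hat{\mu}_{A_1=a_1,A_2=a_2} = \frac{\sum_{i=1}^{N} W_i(a_1,a_2)\,Y_i}{\sum_{i=1}^{N} W_i(a_1,a_2)}, \qquad \hat{\sigma}^2_{A_1=a_1,A_2=a_2} = \frac{1}{N}\sum_{i=1}^{N} W_i(a_1,a_2)^2\,\bigl(Y_i - \hat{\mu}_{A_1=a_1,A_2=a_2}\bigr)^2$$
Weights for each strategy (a1, a2):
(1, 1): $W_i(1,1) = \dfrac{A_{1i}}{.5}\left[(1-R_i)\dfrac{A_{2i}}{.5} + R_i\right]$
(1, 0): $W_i(1,0) = \dfrac{A_{1i}}{.5}\left[(1-R_i)\dfrac{1-A_{2i}}{.5} + R_i\right]$
(0, 1): $W_i(0,1) = \dfrac{1-A_{1i}}{.5}\left[(1-R_i)\dfrac{A_{2i}}{.5} + R_i\right]$
(0, 0): $W_i(0,0) = \dfrac{1-A_{1i}}{.5}\left[(1-R_i)\dfrac{1-A_{2i}}{.5} + R_i\right]$
Data for subject i are of the form (A1i, Ri, A2i, Yi), where A1i, Ri, A2i, and Yi are defined as in the
section Test Statistics and Sample Size Formulae and N is the total sample size.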
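The weighted estimators can be computed directly from records of the form (A1, R, A2, Y). The sketch below assumes the weighted-mean (ratio) form of $\hat{\mu}$ and the 1/N form of $\hat{\sigma}^2$ shown in Table 8.4, and uses randomization probability .5 at each stage. Function names and toy records are hypothetical. Responders are not re-randomized, so their weight depends only on the initial treatment.

```python
import math


def weight(a1_i, r_i, a2_i, a1, a2):
    """Weight W_i(a1, a2) with randomization probability .5 at each stage."""
    w1 = (a1_i if a1 == 1 else 1 - a1_i) / 0.5
    if r_i == 1:                      # responders: no second randomization
        return w1
    w2 = (a2_i if a2 == 1 else 1 - a2_i) / 0.5
    return w1 * w2


def strategy_mean_var(data, a1, a2):
    """Weighted strategy mean and variance estimate; rows are (A1, R, A2, Y)."""
    n = len(data)
    w = [weight(a1_i, r_i, a2_i, a1, a2) for a1_i, r_i, a2_i, _ in data]
    y = [row[3] for row in data]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    var = sum(wi ** 2 * (yi - mu) ** 2 for wi, yi in zip(w, y)) / n
    return mu, var


def z_compare_strategies(data, a2, b2):
    """Question 3 statistic for strategies (1, a2) versus (0, b2)."""
    mu1, v1 = strategy_mean_var(data, 1, a2)
    mu0, v0 = strategy_mean_var(data, 0, b2)
    return math.sqrt(len(data)) * (mu1 - mu0) / math.sqrt(v1 + v0)


# Toy records (A1, R, A2, Y); A2 is None for responders. Illustration only.
data = [(1, 1, None, 10), (1, 0, 1, 8), (1, 0, 0, 4),
        (0, 1, None, 9), (0, 0, 1, 6), (0, 0, 0, 2)]
print(strategy_mean_var(data, 1, 1))      # strategy (1, 1)
print(z_compare_strategies(data, 1, 1))   # (1, 1) versus (0, 1)
```

Because the two strategies in question 3 start with different initial treatments, no subject contributes to both, which is why their variance estimates simply add in the denominator.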
In order to calculate the sample size, one must also input the desired
detectable standardized effect size. We denote the standardized effect size by
$\delta$ and use the definition found in Cohen (1988). The standardized effect sizes
for the various research questions we are considering are summarized in
Table 8.5.
The sample size formulae for questions 1 and 2 are standard formulae
(Jennison & Turnbull, 2000) and assume an equal number in each of the two
groups being compared. Given the desired size $\alpha$, power $1 - \beta$, and standardized
effect size $\delta$, the total sample size required for question 1 is
$$N_1 = 2 \cdot 2\,(z_{\alpha/2} + z_\beta)^2\,(1/\delta)^2$$
Table 8.5 Standardized Effect Sizes for Addressing the Four Questions in Table 8.2
$$\delta_1 = \frac{E[Y \mid A_1 = 1] - E[Y \mid A_1 = 0]}{\sqrt{\dfrac{\mathrm{Var}[Y \mid A_1 = 1] + \mathrm{Var}[Y \mid A_1 = 0]}{2}}}$$
$$\delta_2 = \frac{E[Y \mid R = 0, A_2 = 1] - E[Y \mid R = 0, A_2 = 0]}{\sqrt{\dfrac{\mathrm{Var}[Y \mid R = 0, A_2 = 1] + \mathrm{Var}[Y \mid R = 0, A_2 = 0]}{2}}}$$
$$\delta_3 = \frac{E[Y \mid A_1 = 1, A_2 = a_2] - E[Y \mid A_1 = 0, A_2 = b_2]}{\sqrt{\dfrac{\mathrm{Var}[Y \mid A_1 = 1, A_2 = a_2] + \mathrm{Var}[Y \mid A_1 = 0, A_2 = b_2]}{2}}}$$
The sample size formula for question 2 requires the user to postulate the
initial response rate, which is used to provide the number of subjects who
will be randomized to secondary treatments. The sample size formula uses
the working assumption that the initial response rates are equal; that is,
subjects respond to initial treatment at the same rate regardless of the
particular initial treatment, $p = \Pr[R = 1 \mid A_1 = 1] = \Pr[R = 1 \mid A_1 = 0]$. This working
assumption is used only to size the SMART and is not used to analyze the
data from it, as can be seen from Table 8.3. The formula for the total
required sample size for question 2 is
$$N_2 = 2 \cdot 2\,(z_{\alpha/2} + z_\beta)^2\,(1/\delta)^2/(1 - p)$$
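The formulas for N1 and N2 are easy to evaluate with standard normal quantiles. The sketch below uses hypothetical function names. It takes $z_q$ as the upper-q quantile, so $z_{\alpha/2}$ is about 1.96 for α = 0.05 and $z_\beta$ is about 0.84 for power 0.80.

```python
import math
from statistics import NormalDist


def z_upper(q):
    """Upper-q quantile of the standard normal, e.g. z_upper(0.025) is about 1.96."""
    return NormalDist().inv_cdf(1.0 - q)


def n1(delta, alpha=0.05, power=0.80):
    """Total N for question 1: two equal groups, standardized effect size delta."""
    beta = 1.0 - power
    return 2 * 2 * (z_upper(alpha / 2) + z_upper(beta)) ** 2 * (1.0 / delta) ** 2


def n2(delta, p, alpha=0.05, power=0.80):
    """Total N for question 2: only the expected (1 - p) share of nonresponders is re-randomized."""
    return n1(delta, alpha, power) / (1.0 - p)


# Opioid example later in this section: delta = 0.2, power 0.80, two-sided alpha 0.05, p = 0.10.
print(n2(0.2, 0.10))              # about 872.1 with exact normal quantiles
print(math.ceil(n2(0.2, 0.10)))   # 873 after rounding up
```

With the rounded quantiles 1.96 and 0.84, the same expression gives 4 × 2.8² × 25 / 0.9 ≈ 871.1, which matches the 871 subjects used in the worked example; exact quantiles add one or two subjects.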
When calculating the sample sizes to test question 3, two different sample
size formulae can be used: one that inputs the postulated initial response rate
and one that does not. The formula that uses a guess of the initial response
rate makes two working assumptions. First, the response rates are equal for
both initial treatments (denoted by p); second, the variability of the
outcome Y around the strategy mean (A1 = 1, A2 = a2), among either initial
responders or nonresponders, is no more than the variance of the strategy mean,
and similarly for strategy (A1 = 0, A2 = b2). This formula is
The second formula does not require either of these two working
assumptions; it specifies the sample size required if the response rates are both 0, a
“worst-case scenario.” This conservative sample size formula for addressing
question 3 is
We will compare the performance of these two sample size formulae for
addressing question 3 in the next section. See the Appendix for a derivation
of these formulae.
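The two question 3 formulas did not survive extraction. The sketch below is our own reconstruction from the Appendix argument, not necessarily the authors' printed N3a and N3b. It assumes randomization probability 1/2 at both stages, no re-randomization of responders, a common response rate p, and equal marginal variances. Under these assumptions, each strategy-mean estimator has large-sample variance at most 2σ²(2 − p). As a rough check, with power 0.90 and δ = 0.2, the conservative version gives about 2,101, close to the upper end of the 1,584–2,112 range quoted in the simulation section.

```python
from statistics import NormalDist


def z_upper(q):
    """Upper-q quantile of the standard normal."""
    return NormalDist().inv_cdf(1.0 - q)


def n3a(delta, p, alpha=0.05, power=0.80):
    """Question 3 total N under the response-rate working assumptions (reconstruction).

    Variance bound per strategy: sigma^2 * (p / .5 + (1 - p) / .25) = 2 * sigma^2 * (2 - p);
    the two strategies compared start with different initial treatments.
    """
    k = (z_upper(alpha / 2) + z_upper(1.0 - power)) ** 2
    return 4 * (2 - p) * k / delta ** 2


def n3b(delta, alpha=0.05, power=0.80):
    """Conservative, response-rate-free version: the bound 4 * sigma^2, i.e. n3a at p = 0."""
    return n3a(delta, 0.0, alpha, power)


print(round(n3a(0.2, 0.10)), round(n3b(0.2)))
print(round(n3b(0.2, power=0.90)))
```

Because 2 − p shrinks as p grows, this reconstruction reproduces the pattern described below: lower response rates require larger trials for question 3.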
The method for finding the sample size for question 4 relies on an algorithm
rather than a formula; we will refer to the resulting sample size as N4.
Since question 4 is not a hypothesis test, instead of specifying power to detect
a difference in two means, the sample size is based on the desired probability
of detecting the strategy that results in the highest mean outcome. The
standardized effect size in this case involves the difference between the two highest
strategy means. This algorithm makes the working assumption that
$\sigma^2 = \mathrm{Var}[Y \mid A_1 = a_1, A_2 = a_2]$ is the same for all strategies. The algorithm uses
an idea similar to the one used to derive the sample size formula for question
3 that is invariant to the response rate. Given a desired probability of
selecting the treatment strategy with the highest mean and a
desired treatment strategy effect, the algorithm for question 4 finds the
sample sizes that correspond to the range of response probabilities and
then chooses the largest of these. Since it is based on a worst-case
scenario, this algorithm results in a conservative sample size. See
the Appendix for a derivation of this algorithm. The online sample size
calculator for question 4 can be found at http://methodologymedia.psu.edu/smart/samplesize.
Example sample sizes are given in Table 8.6. Note that as the response
rate decreases, the required sample size for question 3 (i.e., comparing two
strategies that have different initial treatments) increases. To see why this
must be the case, consider two extreme cases, the first in which the response
rate is 90% for both initial treatments and the second in which the nonre-
sponse rate is 90%. In the former case, if n subjects are assigned to treatment
1 initially and 90% respond (i.e., 10% do not respond), then the resulting
sample size for strategy (1, 1) is 0.9 * n + ½ * 0.1 * n = 0.95 * n. The ½ occurs
due to the second randomization of nonresponders between the two second-
ary treatments. On the other hand, if only 10% respond (i.e., 90% do not
respond), then the resulting sample size for strategy (1, 1) is 0.1 * n + ½ *
0.9 * n = 0.55 * n, which is less than 0.95 * n. Thus, the lower the expected
response rate, the larger the initial sample size required for a given power
to differentiate between two strategies. This result occurs because the
number of treatment options (two options) for nonresponders is greater
than the number of treatment options for responders (only one).
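The arithmetic in this paragraph generalizes to any response rate p. A minimal sketch follows, assuming equal randomization (probability 1/2) at both stages and the same response rate p for both initial treatments. The function names are hypothetical.

```python
def strategy_share(p):
    """Expected fraction of subjects starting initial treatment a1 whose data are
    consistent with strategy (a1, a2): all responders plus half of the nonresponders."""
    return p + 0.5 * (1.0 - p)


def expected_strategy_n(n_total, p):
    """Half the trial starts each initial treatment, so scale by 1/2 and then by the share."""
    return 0.5 * n_total * strategy_share(p)


for rate in (0.9, 0.5, 0.1):
    print(rate, strategy_share(rate))
print(expected_strategy_n(871, 0.10))
```

The share falls from 0.95 to 0.55 as the response rate drops from 90% to 10%, so fewer subjects inform each strategy mean and a larger trial is needed for the same power.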
Table 8.6 Example Sample Sizes: All Entries Are for Total Sample Size
Consider the prescription opioid dependence example. Suppose we are
particularly interested in investigating whether MM+CBT or MM+IDC is
best for subjects who do not respond to their initial treatment. This is a
question of type 2. Thus, in order to ascertain the sample size for the
SMART design in Figure 8.2, we use formula N2. Suppose we decide to
size the trial to detect a standardized effect size of 0.2 between the two
secondary treatments, with the power and size of the (two-tailed) test at
0.80 and 0.05, respectively. After surveying the literature and discussing the
issue with colleagues, suppose we decide that the response rate for the two
initial treatments will be approximately 0.10 (p = 0.10). The number of
subjects required for this trial is then
$$N_2 = 2 \cdot 2\,(z_{\alpha/2} + z_\beta)^2\,(1/\delta)^2/(1 - p) = 4\,(z_{0.05/2} + z_{0.2})^2\,(1/0.2)^2/0.9 \approx 871.$$
Furthermore, as secondary objectives, suppose we are interested in comparing
strategy A (begin with MM+IDC; if nonresponse, provide MM+CBT; if
response, provide RPT) with strategy D (begin with MM; if nonresponse,
provide MM+IDC; if response, provide RPT), which is a specific example of
question 3, and in choosing the best strategy overall (question 4). Using the
same input values for the parameters and looking at Table 8.6, we see that the
sample size required for question 3 is about twice that required for question 2.
Thus, unless we are willing and able to double our sample size, we realize
that a comparison of strategies A and D will have low power. However, the
sample size for question 4 is only 358 (using a desired probability of 0.80),
so we will be able to address the secondary objective of choosing the best
strategy with 80% probability.
Suppose that we conduct the trial with 871 subjects. The hypothetical data
set³ and SAS code for calculating the following values can be found at
http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/. For question 2, the
value of the z-statistic is
which has a two-sided p value of 0.0332. Using the formulae in Table 8.4, we
get the following estimates for the strategy means:
$$\bigl[\hat{\mu}_{(1,1)},\ \hat{\mu}_{(1,0)},\ \hat{\mu}_{(0,1)},\ \hat{\mu}_{(0,0)}\bigr] = [7.1246,\ 4.9994,\ 6.3285,\ 5.6364].$$
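Question 4 reduces to a selection over these four estimates. A short sketch using the values reported above follows. The dictionary keys are (A1, A2) pairs, and the variable names are hypothetical.

```python
# Strategy-mean estimates reported in the text for the hypothetical data set, keyed by (A1, A2).
estimates = {(1, 1): 7.1246, (1, 0): 4.9994, (0, 1): 6.3285, (0, 0): 5.6364}

# Question 4: designate the strategy with the largest estimated mean.
best = max(estimates, key=estimates.get)
ranked = sorted(estimates, key=estimates.get, reverse=True)
print(best, ranked)

# Point estimate of the difference between strategies A = (1, 1) and D = (0, 0).
print(round(estimates[(1, 1)] - estimates[(0, 0)], 4))
```

Selection uses only the point estimates. Whether the A versus D difference is statistically distinguishable from zero is the separate question 3 test, which also needs the variance estimates.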
3. We generated this hypothetical data so that the true underlying effect size for question 2 is 0.2,
the true effect size for question 3 is 0.2, and the strategy with the highest mean in truth is
(1, 1), with an effect size of 0.1. Furthermore, the true response rates for the initial treatments
are 0.05 for A1 = 0 and 0.15 for A1 = 1. When we considered 1,000 similar data sets, we found
that the analysis for question 2 led to significant results 78% of the time and the analysis for
question 3 led to significant results 54% of the time. The latter result, and the fact that we did
not detect an effect for question 3 in the analysis, are unsurprising, considering that we have about
half the sample size required to detect an effect size of 0.2. Furthermore, across the 1,000 similar
simulated data sets the best strategy (1, 1) was detected 86% of the time.
The corresponding estimates for the variances of the estimates of the strategy
means are
which has a two-sided p value of 0.1291, so we do not reject the null
hypothesis that the two strategies are equal. For question 4, we choose (1, 1)
as the best strategy, which corresponds to the strategy:
2. For those who respond, provide RPT. For those who do not respond,
continue the Bup/Nx treatment for 12 weeks but switch the accompa-
nying behavioral treatment to MM+CBT.
Simulation Designs
The sample sizes used for the simulations were chosen to give a power level
of 0.90 and a Type I error of 0.05 when one of questions 1–3 is used to size
the trial and a 0.90 probability of choosing the best strategy for question 4
when it is used to size the trial; these sample sizes are shown in Table 8.6.
For questions 1–3, power is estimated by the proportion of times out of 1,000
simulations that the null hypothesis is correctly rejected; for question 4, the
probability of choosing the best strategy is estimated by the proportion of
times out of 1,000 simulations that the correct strategy with the highest
• the variances relevant to the question are unequal (for question 4 only)
We also assess the power of question 4 when it is not used in sizing the trial.
For each of the types of research questions in Table 8.2, we generate a data
set that follows the working assumptions for the sample size formula for that
question (e.g., use N2 to size the study to test the effect of the second
treatment on the mean outcome) and then perform question 4 on the data
and estimate the probability of choosing the correct strategy with the highest
mean outcome.
The descriptions of the simulation designs for each of questions 1–4 as
well as the parameters for all of the different generative models can be found
at http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/.
Table 8.8a The Probabilityᵃ of Choosing the Correct Strategy for Question 4
When Sample Size Is Calculated to Reject the Null Hypothesis for Question 1
(for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)
Table 8.8b The Probabilityᵃ of Choosing the Correct Strategy for Question 4
When Sample Size Is Calculated to Reject the Null Hypothesis for Question 2
(for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)
strategy mean as the maximum when sizing for question 3 is generally very
good, as can be seen from Table 8.8c. This is because the sample
sizes required to test the differences between two strategy means (each
beginning with a different initial treatment) are much larger than those
needed to detect the maximum of four strategy means with a specified
degree of confidence. For a z-test of the difference between two strategy
means with a two-tailed Type I error rate of 0.05, power of 0.90, and
standardized effect size of 0.20, the sample size requirements range from 1,584 to 2,112.
The sample size required for a 0.90 probability of selecting the correct
strategy mean as the maximum, when the standardized effect size between it and
the next highest strategy mean is 0.2, is 608. It is therefore not surprising that
the selection rates for the correct strategy mean are generally high when
Table 8.8c The Probabilityᵃ of Choosing the Correct Strategy for Question 4
When Sample Size Is Calculated to Reject the Null Hypothesis for Question 3
(for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)
Summary
Overall, the sample size formulae perform well even when the working
assumptions are violated. Additionally, the performance of question 4 is
consistently good when sizing for all other research questions; this is most
likely because question 4 requires smaller sample sizes than the other
research questions to achieve good results.
When planning a SMART similar to the one considered here, if one is
primarily concerned with testing differences between prespecified strategy
means, we would recommend using the less conservative formula N3a if
one is confident about the initial response rates. We recommend this
in light of the considerable cost savings that can be accrued by
using this approach, in comparison to the more conservative formula N3b.
We comment further on this topic in the Discussion.
Discussion
Appendix
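$$H_0: \mu_{(1,a_2)} - \mu_{(0,b_2)} = 0$$
$$H_1: \mu_{(1,a_2)} - \mu_{(0,b_2)} = \delta\sqrt{\frac{\sigma_1^2 + \sigma_0^2}{2}}$$
where $\sigma_1^2 = \mathrm{Var}[Y \mid A_1 = 1, A_2 = a_2]$ and $\sigma_0^2 = \mathrm{Var}[Y \mid A_1 = 0, A_2 = b_2]$. (Note that $\delta$ is the standardized effect size.)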
As presented in Statistics for Addressing the Different Research
Questions, the test statistic for this hypothesis is
$$Z = \frac{\sqrt{N}\,\bigl(\hat{\mu}_{(1,a_2)} - \hat{\mu}_{(0,b_2)}\bigr)}{\sqrt{\hat{\sigma}^2_{(1,a_2)} + \hat{\sigma}^2_{(0,b_2)}}}$$
where $\hat{\mu}_{(a_1,a_2)}$ and $\hat{\sigma}^2_{(a_1,a_2)}$ are as defined in Table 8.4; in large samples, this
test statistic has a standard normal distribution under the null hypothesis
(Murphy, Van Der Laan, Robins, & Conduct Problems Prevention Group,
2001). Recall that N is the total sample size for the trial. To find the
required sample size N for a two-sided test with power $1 - \beta$ and size $\alpha$, we
solve
$$\Pr_{H_1}\bigl[Z > z_{\alpha/2}\bigr] = 1 - \beta$$
for N, where $z_{\alpha/2}$ is the standard normal $(1 - \alpha/2)$ percentile. In large samples,
$\sqrt{N}\bigl(\hat{\mu}_{(1,a_2)} - \hat{\mu}_{(0,b_2)} - (\mu_{(1,a_2)} - \mu_{(0,b_2)})\bigr)/\sqrt{\sigma^2_{(1,a_2)} + \sigma^2_{(0,b_2)}}$ is approximately
standard normal, where $\sigma^2_{(a_1,a_2)}$ is the large-sample variance of $\sqrt{N}\,\hat{\mu}_{(a_1,a_2)}$, and under $H_1$,
$\mu_{(1,a_2)} - \mu_{(0,b_2)} = \delta\sqrt{(\sigma_1^2 + \sigma_0^2)/2}$, so we have
$$\Pr\left[\frac{\sqrt{N}\bigl(\hat{\mu}_{(1,a_2)} - \hat{\mu}_{(0,b_2)} - (\mu_{(1,a_2)} - \mu_{(0,b_2)})\bigr)}{\sqrt{\sigma^2_{(1,a_2)} + \sigma^2_{(0,b_2)}}} > z_{\alpha/2} - \sqrt{\frac{N\,\delta^2(\sigma_1^2 + \sigma_0^2)/2}{\sigma^2_{(1,a_2)} + \sigma^2_{(0,b_2)}}}\,\right] = 1 - \beta.$$
Solving gives
$$N = (z_{\alpha/2} + z_\beta)^2\,\frac{2\bigl(\sigma^2_{(1,a_2)} + \sigma^2_{(0,b_2)}\bigr)}{\delta^2(\sigma_1^2 + \sigma_0^2)}.$$
Now, using equation 10 in Murphy (2005) for k = 2 steps¹ (initial and
secondary) of treatment,
$$\sigma^2_{(a_1,a_2)} = E_{a_1,a_2}\left[\frac{(Y - \mu_{(a_1,a_2)})^2}{\Pr(a_1)\Pr(a_2 \mid R, a_1)}\right]$$
$$= E_{a_1,a_2}\left[\frac{(Y - \mu_{(a_1,a_2)})^2}{\Pr(a_1)\Pr(a_2 \mid 1, a_1)} \,\middle|\, R = 1\right]\Pr_{a_1}[R = 1] + E_{a_1,a_2}\left[\frac{(Y - \mu_{(a_1,a_2)})^2}{\Pr(a_1)\Pr(a_2 \mid 0, a_1)} \,\middle|\, R = 0\right]\Pr_{a_1}[R = 0]$$
for all values of a1, a2; the subscripts on E and Pr (namely, $E_{a_1,a_2}$ and $\Pr_{a_1}$)
indicate expectations and probabilities calculated as if all subjects were
assigned a1 as the initial treatment and then, if nonresponse, assigned
treatment a2. If we are willing to make the assumption (*) that
$$E_{a_1,a_2}\bigl[(Y - \mu_{(a_1,a_2)})^2 \mid R\bigr] \le E_{a_1,a_2}\bigl[(Y - \mu_{(a_1,a_2)})^2\bigr]$$
for both R = 1 and R = 0 (i.e., the variability of the outcome around the
strategy mean among either responders or nonresponders is no more than the
variance of the strategy mean), then
$$\sigma^2_{(a_1,a_2)} \le \frac{\Pr_{a_1}[R = 1]}{\Pr(a_1)\Pr(a_2 \mid 1, a_1)}\,E_{a_1,a_2}\bigl[(Y - \mu_{(a_1,a_2)})^2\bigr] + \frac{\Pr_{a_1}[R = 0]}{\Pr(a_1)\Pr(a_2 \mid 0, a_1)}\,E_{a_1,a_2}\bigl[(Y - \mu_{(a_1,a_2)})^2\bigr].$$
Thus, with $\mathrm{Var}_{a_1,a_2}[Y] = E_{a_1,a_2}\bigl[(Y - \mu_{(a_1,a_2)})^2\bigr]$, we have
$$\sigma^2_{(a_1,a_2)} \le \mathrm{Var}_{a_1,a_2}[Y]\left(\frac{\Pr_{a_1}[R = 1]}{\Pr(a_1)\Pr(a_2 \mid 1, a_1)} + \frac{\Pr_{a_1}[R = 0]}{\Pr(a_1)\Pr(a_2 \mid 0, a_1)}\right) \qquad (2)$$
which is the sample size formula given in Sample Size Calculations that
depends on the response rate p.
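To make equation (2) concrete for the design in Figure 8.2, the following is a sketch of the intermediate steps. It is our reconstruction, not text from the original appendix. It assumes Pr(a1) = 1/2, Pr(a2 | 0, a1) = 1/2 (nonresponders are re-randomized), Pr(a2 | 1, a1) = 1 (responders are not), a common response rate p, and a common marginal variance σ², and it combines the bound with the large-sample power condition for N.

```latex
\sigma^2_{(a_1,a_2)} \le \sigma^2\left(\frac{p}{\tfrac12 \cdot 1} + \frac{1-p}{\tfrac12 \cdot \tfrac12}\right)
  = 2\sigma^2(2-p),
\qquad
N = (z_{\alpha/2}+z_\beta)^2\,
    \frac{2\bigl(\sigma^2_{(1,a_2)}+\sigma^2_{(0,b_2)}\bigr)}{\delta^2(\sigma_1^2+\sigma_0^2)}
  \le (z_{\alpha/2}+z_\beta)^2\,\frac{2 \cdot 4\sigma^2(2-p)}{\delta^2 \cdot 2\sigma^2}
  = \frac{4(2-p)\,(z_{\alpha/2}+z_\beta)^2}{\delta^2}.
```

Without assumption (*), the identity above still gives σ²(a1, a2) = 2p E[(Y − μ)² | R = 1] + 4(1 − p) E[(Y − μ)² | R = 0] ≤ 4σ², so the same steps yield N ≤ 8(z_{α/2} + z_β)²/δ², the p = 0 case.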
Going through the arguments once again, we see that we do not need
either of the two working assumptions (*) or (**) to obtain the conservative
sample size formula, N3b:
• The marginal variances of the final outcome given the strategy are
all equal, and we denote this variance by $\sigma^2$. This means that
$\sigma^2 = \mathrm{Var}[Y \mid A_1 = a_1, A_2 = a_2]$ for all (a1, a2) in {(1,1), (1,0), (0,1), (0,0)}.
• The correlation between the estimated mean outcome for strategy (1, 1)
and the estimated mean outcome for strategy (1, 0) is the same as the
correlation between the estimated mean outcome for strategy (0, 1) and
the estimated mean outcome for strategy (0, 0); we denote
this identical correlation by $\rho$.
• the desired probability that the strategy estimated to have the largest
mean outcome does in fact have the largest mean,
We assume that three of the strategies have the same mean and the one
remaining strategy produces the largest mean; this is an extreme scenario in
which it is most difficult to detect the presence of an effect. Without loss of
generality, we choose strategy (1, 1) to have the largest mean.
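Consider the following algorithm as a function of N:
1. For every value of $\rho$ in {0, 0.01, 0.02, . . . , 0.99, 1} perform the following
simulation:
Generate K = 20,000 samples of $\bigl[\hat{\mu}_{(1,1)}, \hat{\mu}_{(1,0)}, \hat{\mu}_{(0,1)}, \hat{\mu}_{(0,0)}\bigr]^T$ from
a multivariate normal with
$$\text{mean } M = \begin{bmatrix} \mu_{(1,1)} \\ \mu_{(1,0)} \\ \mu_{(0,1)} \\ \mu_{(0,0)} \end{bmatrix} = \begin{bmatrix} \delta/2 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{and covariance matrix} \quad \Sigma = \frac{1}{N}\begin{bmatrix} 1 & \rho & 0 & 0 \\ \rho & 1 & 0 & 0 \\ 0 & 0 & 1 & \rho \\ 0 & 0 & \rho & 1 \end{bmatrix}$$
The kth sample is $\bigl[\hat{\mu}_{(1,1),k}, \hat{\mu}_{(1,0),k}, \hat{\mu}_{(0,1),k}, \hat{\mu}_{(0,0),k}\bigr]$, k = 1, . . . , K.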
The online calculator for the sample size for question 4 can be found at
http://methodologymedia.psu.edu/smart/samplesize.
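The question 4 simulation can be sketched with the standard library alone. This sketch rests on several assumptions. It uses our reading of the garbled matrices above: mean (δ/2, 0, 0, 0) and unit-diagonal covariance scaled by 1/N. It also assumes that the lost final steps pick, for each ρ, the smallest N whose probability of selecting (1, 1) reaches the target, and then take the largest of those N, as the main text describes. Function names are hypothetical, and this is not the online calculator's code. With δ = 0.2 and ρ = 0, it gives a selection probability of roughly 0.9 at N = 608.

```python
import math
import random


def prob_select_best(n, delta, rho, k=20_000, seed=0):
    """Monte Carlo Pr[strategy (1, 1) has the largest estimated mean] at total size n."""
    rng = random.Random(seed)        # fixed seed: common random numbers across n
    sd = 1.0 / math.sqrt(n)
    s = math.sqrt(1.0 - rho * rho)
    wins = 0
    for _ in range(k):
        z1, z2, z3, z4 = (rng.gauss(0.0, 1.0) for _ in range(4))
        mu11 = delta / 2 + sd * z1
        mu10 = sd * (rho * z1 + s * z2)   # correlated with mu11 (shared responders)
        mu01 = sd * z3
        mu00 = sd * (rho * z3 + s * z4)   # correlated with mu01
        if mu11 > max(mu10, mu01, mu00):
            wins += 1
    return wins / k


def sample_size_q4(delta, target, rhos, k=20_000, seed=0, n_lo=2, n_hi=20_000):
    """Smallest n whose worst-case (over rho) selection probability reaches target.

    Assumes target is reached by n_hi; with a fixed seed the estimated
    probability is nondecreasing in n, so bisection is valid.
    """
    def worst(n):
        return min(prob_select_best(n, delta, r, k, seed) for r in rhos)

    while n_lo < n_hi:
        mid = (n_lo + n_hi) // 2
        if worst(mid) >= target:
            n_hi = mid
        else:
            n_lo = mid + 1
    return n_lo


print(prob_select_best(608, 0.2, 0.0), prob_select_best(608, 0.2, 1.0))
print(sample_size_q4(0.2, 0.90, rhos=[0.0, 1.0], k=5_000))
```

Taking the minimum probability over ρ and then the smallest adequate N is equivalent to taking the largest of the per-ρ sample sizes. Running the full grid of 101 ρ values with K = 20,000 is slow in pure Python, and a vectorized version would be preferable in practice.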
References
Lavori, P. W., & Dawson, R. (2000). A design for testing clinical strategies: Biased
adaptive within-subject randomization. Journal of the Royal Statistical Society, Series A,
163, 29–38.
Lavori, P. W., Dawson, R., & Rush, A. J. (2000). Flexible treatment strategies in chronic
disease: Clinical and research implications. Biological Psychiatry, 48, 605–614.
Ling, W., Amass, L., Shoptaw, S., Annon, J. J., Hillhouse, M., Babcock, D., et al.
(2005). A multi-center randomized trial of buprenorphine-naloxone versus clonidine
for opioid detoxification: Findings from the National Institute on Drug Abuse
Clinical Trials Network. Addiction, 100, 1090–1100.
Ling, W., & Smith, D. (2002). Buprenorphine: Blending practice and research. Journal
of Substance Abuse Treatment, 23, 87–92.
McLellan, A. T. (2002). Have we evaluated addiction treatment correctly? Implications
from a chronic care perspective. Addiction, 97, 249–252.
McLellan, A. T., Lewis, D. C., O’Brien, C. P., & Kleber, H. D. (2000). Drug dependence,
a chronic medical illness: Implications for treatment, insurance, and outcomes
evaluation. Journal of the American Medical Association, 284(13), 1689–1695.
Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal
Statistical Society, Series B, 65, 331–366.
Murphy, S. A. (2005). An experimental design for the development of adaptive treat-
ment strategies. Statistics in Medicine, 24, 1455–1481.
Murphy, S. A., Lynch, K. G., Oslin, D. A., McKay, J. R., & Ten Have, T. (2006).
Developing adaptive treatment strategies in substance abuse research. Drug and
Alcohol Dependence. doi:10.1016/j.drugalcdep.2006.09.008.
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges
in constructing effective treatment sequences for chronic psychiatric disorders.
Neuropsychopharmacology, 32, 257–262.
Murphy, S. A., Van Der Laan, M. J., Robins, J. M., & Conduct Problems Prevention
Group (2001). Marginal mean models for dynamic regimes. Journal of the American
Statistical Association, 96(456), 1410–1423.
Rush, A. J., Crismon, M. L., Kashner, T. M., Toprac, M. G., Carmody, T. J., Trivedi, M. H.,
et al. (2003). Texas Medication Algorithm Project, Phase 3 (TMAP-3): Rationale
and study design. Journal of Clinical Psychiatry, 64(4), 357–369.
Stroup, T. S., McEvoy, J. P., Swartz, M. S., Byerly, M. J., Glick, I. D., Canive, J. M., et al.
(2003). The National Institute of Mental Health Clinical Antipsychotic Trials of
Intervention Effectiveness (CATIE) project: Schizophrenia trial design and protocol
development. Schizophrenia Bulletin, 29(1), 15–31.
Weiss, R., Sharpe, J. P., & Ling, W. A. (2010). Two-phase randomized controlled
clinical trial of buprenorphine/naloxone treatment plus individual drug counseling
for opioid analgesic dependence. National Institute on Drug Abuse Clinical Trials
Network. Retrieved June 14, 2020 from http://www.clinicaltrials.gov/ct/show/NCT00316277?order=1
9 Obtaining Robust Causal Evidence From Observational Studies 207
Figure 9.1 Observed effect of duration of vitamin E use compared to no use on coronary heart disease events in the Health Professionals Follow-Up Study (relative risk, RR, by duration of use: 0–1 year, 2–4 years, 5–9 years, >10 years). From ‘‘Vitamin E consumption and the risk of coronary heart disease in men,’’ by E. B. Rimm, M. J. Stampfer, A. Ascherio, E. Giovannucci, G. A. Colditz, & W. C. Willett, 1993, New England Journal of Medicine, 328, 1450–1456.
Figure 9.2 Use of vitamin supplements in the past month among U.S. adults, 1999–2000 (percentage of men and women using multivitamins/multiminerals, vitamin E, and vitamin C). From ‘‘Dietary supplement use by US adults: Data from the National Health and Nutrition Examination Survey, 1999–2000,’’ by K. Radimer, B. Bindewald, J. Hughes, B. Ervin, C. Swanson, & M. F. Picciano, 2004, American Journal of Epidemiology, 160, 339–349.
Figure 9.3 Use of vitamin supplements (multivitamins, vitamin E, and vitamin C) in U.S. adults, 1987–2000. From ‘‘Use of vitamin, mineral, nonvitamin, and nonmineral supplements in the United States: The 1987, 1992, and 2000 National Health Interview Survey results,’’ by A. E. Millen, K. W. Dodd, & A. F. Subar, 2004, Journal of the American Dietetic Association, 104, 942–950.
Figure 9.4 Vitamin E supplement use and risk of coronary heart disease in two observational studies (Rimm et al., 1993; Stampfer et al., 1993) and in a meta-analysis of randomized controlled trials (Eidelman, Hollar, Hebert, Lamas, & Hennekens, 2004). Estimates are shown for Stampfer 1993, Rimm 1993, and the RCTs.
Study: Relative risk (95% CI)
Heart Protection Study: 1.06 (0.95, 1.16)
EPIC m: 0.72 (0.61, 0.86)
EPIC m*: 0.70 (0.51, 0.95)
EPIC w: 0.63 (0.49, 0.84)
EPIC w*: 0.63 (0.45, 0.90)
Figure 9.5 Estimates of the effects of an increase of 15.7 μmol/l plasma vitamin C on 5-year coronary heart disease mortality, estimated from the observational epidemiological European Prospective Investigation Into Cancer and Nutrition (EPIC) (Khaw et al., 2001) and the randomized controlled Heart Protection Study (Heart Protection Study Collaborative Group, 2002). EPIC m, men, age-adjusted; EPIC m*, men, adjusted for systolic blood pressure, cholesterol, body mass index, smoking, diabetes, and vitamin supplement use; EPIC w, women, age-adjusted; EPIC w*, women, adjusted for systolic blood pressure, cholesterol, body mass index, smoking, diabetes, and vitamin supplement use.
cholesterol levels, rather than low cholesterol levels increasing the risk
of cancer. Similarly, studies of inflammatory markers such as C-reactive
protein and cardiovascular disease risk have shown that early stages of
atherosclerosis—which is an inflammatory process—may lead to elevation
in circulating inflammatory markers; and since people with atherosclerosis
are more likely to experience cardiovascular events, a robust, but noncausal,
association between levels of inflammatory markers and incident cardiovas-
cular disease is generated. Reverse causation can also occur through beha-
vioral processes—for example, people with early stages and symptoms of
cardiovascular disease may reduce their consumption of alcohol, which
would generate a situation in which alcohol intake appears to protect against
cardiovascular disease. A form of reverse causation can also occur through
reporting bias, with the presence of disease influencing reporting disposition.
In case–control studies people with the disease under investigation may
report on their prior exposure history in a different way from controls, per-
haps because the former will think harder about potential reasons that
account for why they have developed the disease.
Table 9.2a Means or proportions of blood pressure, pulse pressure, hypertension, and potential confounders by quarters of C-reactive protein (CRP); N = 3,529 (from Davey Smith et al., 2005)
Table 9.2b Means or proportions of CRP, systolic blood pressure, hypertension, and potential confounders by 1059G/C genotype (from Davey Smith et al., 2005)
Mendelian Randomization
Box 9.1
Phenocopy, Genocopy, and Mendelian Randomization
Box 9.1 (continued)
Finally, a genetic variant will indicate long-term levels of exposure, and if the
variant is taken as a proxy for such exposure, it will not suffer from the mea-
surement error inherent in phenotypes that have high levels of variability.
For example, groups defined by cholesterol level–related genotype will, over
a long period, experience the cholesterol difference seen between the groups.
For individuals, blood cholesterol is variable over time, and the use of single
measures of cholesterol will underestimate the true strength of association
between cholesterol and, say, CHD. Indeed, use of the Mendelian randomi-
zation approach predicts a strength of association that is in line with RCT
findings of the effects of cholesterol lowering when the increasing benefits
seen over the relatively short trial period are projected to the expectation for
differences over a lifetime (Davey Smith & Ebrahim, 2004), a point discussed
further below.
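The claim that single measures of a variable phenotype underestimate the true strength of association (regression dilution) can be illustrated with a small simulation. This is a minimal sketch. The linear outcome model, the normal distributions, and the function name `regression_dilution_demo` are illustrative assumptions, not the chapter's cholesterol/CHD analysis.

```python
import numpy as np


def regression_dilution_demo(n=100_000, true_slope=0.5, between_sd=1.0, within_sd=1.0, seed=1):
    """Compare slopes estimated from long-term vs. single-occasion exposure values.

    Hypothetical linear model: outcome = true_slope * usual_level + noise.
    """
    rng = np.random.default_rng(seed)
    usual = rng.normal(0.0, between_sd, n)          # long-term ("usual") exposure level
    single = usual + rng.normal(0.0, within_sd, n)  # one measurement, with occasion-to-occasion variation
    outcome = true_slope * usual + rng.normal(0.0, 1.0, n)
    slope_usual = np.polyfit(usual, outcome, 1)[0]    # polyfit returns [slope, intercept]
    slope_single = np.polyfit(single, outcome, 1)[0]
    # Expected attenuation factor for the single-measure slope.
    reliability = between_sd**2 / (between_sd**2 + within_sd**2)
    return slope_usual, slope_single, reliability


print(regression_dilution_demo())
```

With equal between- and within-person variance, the single-measure slope is roughly halved. A proxy for the long-term level, which is what a genotype is argued to provide, avoids this attenuation.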
The term Mendelian randomization has now become widely used (see
Box 9.2), with a variety of meanings. This partly reflects the fact that there
are several categories of inference that can be drawn from studies utilizing
the Mendelian randomization approach. In the most direct forms, genetic
variants can be related to the probability or level of exposure (‘‘exposure
propensity’’) or to intermediate phenotypes believed to influence disease
risk. Less direct evidence can come from genetic variant–disease associations
that indicate that a particular biological pathway may be of importance,
perhaps because the variants modify the effects of environmental exposures.
Several examples of these categories have been given elsewhere (Davey Smith
& Ebrahim, 2003, 2004; Davey Smith, 2006; Ebrahim & Davey Smith, 2008);
here, a few illustrative cases are briefly outlined.
Exposure Propensity
Alcohol Intake and Health
Box 9.2
Why ‘‘Mendelian Randomization’’?
with osteoporosis, low bone mineral density, or fracture risk thus provide
evidence that milk drinking reduces the risk of these conditions (Birge,
Keutmann, Cuatrecasas, & Whedon, 1967; Newcomer, Hodgson, Douglas,
& Thomas, 1978). In a related vein, it was proposed in 1979 that as N-
acetyltransferase pathways are involved in the detoxification of arylamine,
a potential bladder carcinogen, the observation of increased bladder-
cancer risk among people with genetically determined slow-acetylator phe-
notype provided evidence that arylamines are involved in the etiology of
the disease (Lower et al., 1979).
Since these early studies, various commentators have pointed out that
the association of genetic variants of known function with disease out-
comes provides evidence about etiological factors (McGrath, 1999; Ames,
1999; Rothman et al., 2001; Brennan, 2002; Kelada, Eaton, Wang, Rothman,
& Khoury, 2003). However, these commentators have not emphasized the
key strengths of Mendelian randomization: the avoidance of confounding,
the avoidance of bias due to reverse causation and reporting tendency,
and correction for the underestimation of risk associations due to variability
in behaviors and phenotypes (Davey Smith & Ebrahim, 2004).
These key concepts were present in Martijn Katan’s 1986 Lancet letter,
in which he suggested that genetic variants related to cholesterol level
could be used to investigate whether the observed association between
low cholesterol and increased cancer risk was real, and in Honkanen and
colleagues’ (1996) understanding of how lactase persistence could better
characterize the difficult-to-measure environmental influence of calcium
intake than could direct dietary reports. Since 2000 there have been several
reports using the term Mendelian randomization in the way it is used here
(Youngman et al., 2000; Fallon, Ben-Shlomo, & Davey Smith, 2001;
Clayton & McKeigue, 2001; Keavney, 2002; Davey Smith & Ebrahim,
2003), and its use is becoming widespread.
Figure 9.6 (a) Relationship between alcohol intake and ALDH2 genotype (2*2/2*2, 2*2/2*1, 1*1/1*1). (b) Relationship between other characteristics (age, in years, and percentage of smokers) and ALDH2 genotype. (c) Relationship between HDL cholesterol (mg/dl) and ALDH2 genotype. From ‘‘Aldehyde dehydrogenase 2 gene is a risk factor for myocardial infarction in Japanese men,’’ by S. Takagi, N. Iwai, R. Yamauchi, S. Kojima, S. Yasuno, T. Baba, et al., 2002, Hypertension Research, 25, 677–681.
Study ID: Odds ratio for hypertension (95% CI)
12 vs. 22 (male)
Amamoto et al., 2002 [18]: 1.67 (0.92, 3.03)
Iwai et al., 2004 [31]: 1.57 (0.90, 2.72)
Saito et al., 2003 [28]: 2.84 (0.79, 10.15)
Subtotal (I² = 0.0%, p = 0.701): 1.72 (1.17, 2.52)
11 vs. 22 (male)
Amamoto et al., 2002 [18]: 2.50 (1.38, 4.54)
Iwai et al., 2004 [31]: 2.02 (1.17, 3.47)
Saito et al., 2003 [28]: 4.62 (1.31, 16.25)
Subtotal (I² = 0.0%, p = 0.482): 2.42 (1.66, 3.55)
Figure 9.7 Forest plot of studies of ALDH2 genotype and hypertension. From L. Chen et al., 2008.
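Subtotals like those in Figure 9.7 are usually obtained by inverse-variance pooling on the log odds-ratio scale. This is a minimal sketch of a fixed-effect version. It backs out each study's standard error from its 95% CI, and the function name `pool_log_or` is hypothetical. The authors' exact pooling method is not stated here. Applied to the three 12 vs. 22 studies, it gives roughly 1.70 (1.16, 2.50), close to the published 1.72 (1.17, 2.52). The small gap presumably reflects rounding of the published intervals or a different weighting.

```python
import math


def pool_log_or(studies):
    """Fixed-effect (inverse-variance) pooled odds ratio from (OR, lower, upper) triples.

    Each study's SE is backed out from its 95% CI on the log scale.
    """
    weights, log_ors = [], []
    for or_, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        weights.append(1 / se**2)
        log_ors.append(math.log(or_))
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se_pooled = 1 / math.sqrt(sum(weights))
    bounds = (pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return tuple(math.exp(v) for v in bounds)


# 12 vs. 22 (male): Amamoto 2002, Iwai 2004, Saito 2003, as listed in Figure 9.7.
hypertension_12_vs_22 = [(1.67, 0.92, 3.03), (1.57, 0.90, 2.72), (2.84, 0.79, 10.15)]
print(pool_log_or(hypertension_12_vs_22))
```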
Intermediate Phenotypes
Genetic variants can influence circulating biochemical factors such as cho-
lesterol, homocysteine, and fibrinogen levels. This provides a method for
assessing causality in associations between these measures (intermediate phe-
notypes) and disease and, thus, whether interventions to modify the inter-
mediate phenotype could be expected to influence disease risk.
Study ID: Mean difference in DBP in mmHg (95% CI)
12 vs. 22 (male)
Amamoto et al., 2002 [18]: 2.70 (−0.30, 5.70)
Saito et al., 2003 [28]: 3.90 (−0.95, 8.75)
Takagi et al., 2001 [19]: 1.00 (−0.75, 2.75)
Tsuritani et al., 1995 [20]: 2.10 (−2.29, 6.49)
Yamada et al., 2002 [29]: 0.70 (−3.17, 4.57)
Subtotal (I² = 0.0%, p = 0.720): 1.58 (0.29, 2.87)
11 vs. 22 (male)
Amamoto et al., 2002 [18]: 4.40 (1.36, 7.44)
Saito et al., 2003 [28]: 7.10 (2.36, 11.84)
Takagi et al., 2001 [19]: 3.10 (1.35, 4.85)
Tsuritani et al., 1995 [20]: 5.80 (1.50, 10.10)
Yamada et al., 2002 [29]: 3.80 (0.01, 7.59)
Subtotal (I² = 0.0%, p = 0.492): 3.95 (2.66, 5.24)
Study ID: Mean difference in SBP in mmHg (95% CI)
12 vs. 22 (male)
Amamoto et al., 2002 [18]: 6.00 (1.30, 10.70)
Saito et al., 2003 [28]: 9.40 (2.75, 16.05)
Takagi et al., 2001 [19]: 2.20 (−1.05, 5.45)
Tsuritani et al., 1995 [20]: 3.10 (−3.18, 9.38)
Yamada et al., 2002 [29]: 4.80 (0.21, 9.39)
Subtotal (I² = 12.1%, p = 0.336): 4.24 (2.18, 6.31)
11 vs. 22 (male)
Amamoto et al., 2002 [18]: 8.40 (3.67, 13.13)
Saito et al., 2003 [28]: 13.90 (7.35, 20.45)
Takagi et al., 2001 [19]: 5.90 (2.65, 9.15)
Tsuritani et al., 1995 [20]: 6.80 (0.54, 13.06)
Yamada et al., 2002 [29]: 6.80 (2.32, 11.28)
Subtotal (I² = 18.0%, p = 0.300): 7.44 (5.39, 9.49)
Figure 9.8 Forest plot of studies of ALDH2 genotype and blood pressure (DBP, diastolic blood pressure; SBP, systolic blood pressure). From ‘‘Alcohol intake and blood pressure: A systematic review implementing a Mendelian randomization approach,’’ by L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
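The subtotal rows in Figure 9.8 report a pooled estimate together with I² and a heterogeneity p value. This is a minimal sketch of fixed-effect inverse-variance pooling with Cochran's Q and I², assuming symmetric 95% CIs. Applied to the five 12 vs. 22 DBP studies, it gives about 1.58 (0.29, 2.88), I² = 0, and p ≈ 0.72, in line with the published subtotal. The chi-square tail uses a closed form that is exact only for even degrees of freedom. The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df % 2:
        raise ValueError("closed form requires even df")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


def pool_mean_differences(studies):
    """Fixed-effect pooled mean difference, 95% CI, Cochran's Q, and I^2.

    studies: list of (estimate, lower, upper) with symmetric 95% CIs.
    """
    est = [s[0] for s in studies]
    w = [1 / ((hi - lo) / (2 * 1.96)) ** 2 for _, lo, hi in studies]  # 1 / SE^2
    pooled = sum(wi * e for wi, e in zip(w, est)) / sum(w)
    se = 1 / math.sqrt(sum(w))
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, est))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, i2


# DBP, 12 vs. 22 (male): Amamoto, Saito, Takagi, Tsuritani, Yamada.
dbp_12_vs_22 = [(2.70, -0.30, 5.70), (3.90, -0.95, 8.75), (1.00, -0.75, 2.75),
                (2.10, -2.29, 6.49), (0.70, -3.17, 4.57)]
pooled, ci, q, i2 = pool_mean_differences(dbp_12_vs_22)
p_het = chi2_sf_even_df(q, len(dbp_12_vs_22) - 1)
print(round(pooled, 2), [round(c, 2) for c in ci], round(i2, 3), round(p_het, 3))
```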
Figure 9.9 Risk of esophageal cancer in individuals with the ALDH2*2*2 vs. ALDH2*1*1 genotype (odds ratios with 95% CIs and percentage weights by study). From ‘‘Alcohol, ALDH2 and esophageal cancer: A meta-analysis which illustrates the potentials and limitations of a Mendelian randomization approach,’’ by S. Lewis & G. Davey Smith, 2005, Cancer Epidemiology, Biomarkers and Prevention, 14, 1967–1971.
cholesterol had high risk for CHD should have been powerful and convincing
evidence of the causal nature of elevated blood cholesterol in the general
population.
With the advent of effective means of reducing blood cholesterol through
statin treatment, there remains no serious doubt that the cholesterol–CHD
relationship is causal. Among people without CHD, reducing total cholesterol
levels with statin drugs by around 1–1.5 mmol/l reduces CHD mortality
by around 25% over 5 years. Assuming a linear relationship between
blood cholesterol and CHD risk and given the difference in cholesterol of
3.0 mmol/l between people with familial hypercholesterolemia and the gen-
eral population, the RCT evidence on lowering total cholesterol and reducing
CHD mortality would predict a relative risk for CHD of around 2, as opposed
to 3.9, for people with familial hypercholesterolemia. However, the trials also
demonstrate that the relative reduction in CHD mortality increases over time
from randomization—and thus time with lowered cholesterol—as would be
expected if elevated levels of cholesterol operate over decades to influence the
development of atherosclerosis. People with familial hypercholesterolemia
will have had high total cholesterol levels throughout their lives, and this
would be expected to generate a greater risk than that predicted by the results
of lowering cholesterol levels for only 5 years. Furthermore, ecological studies
relating cholesterol levels to CHD demonstrate that the strength of
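The RCT-based projection for familial hypercholesterolemia described above can be checked with a few lines of arithmetic. This is a minimal sketch. It reads "a linear relationship" as linear on the log relative-risk scale, which is an assumption. It also takes 1.25 mmol/l, the midpoint of the quoted 1–1.5 mmol/l range, as the trial reduction. The function name is hypothetical.

```python
import math


def project_relative_risk(trial_rr, trial_reduction, exposure_difference):
    """Scale a trial relative risk to a different exposure difference.

    Assumes log(RR) is linear in the exposure: a reduction of `trial_reduction`
    units gave `trial_rr`, so a higher level by `exposure_difference` units gives
    exp(-log(trial_rr) / trial_reduction * exposure_difference).
    """
    log_rr_per_unit_increase = -math.log(trial_rr) / trial_reduction
    return math.exp(log_rr_per_unit_increase * exposure_difference)


# A 25% reduction (RR 0.75) per ~1.25 mmol/l, projected to a 3.0 mmol/l difference.
for reduction in (1.0, 1.25, 1.5):
    print(reduction, round(project_relative_risk(0.75, reduction, 3.0), 2))
```

Across the 1–1.5 mmol/l range, the projected relative risk runs from about 2.4 down to about 1.8, centred near the figure of around 2 quoted in the text.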
Mendelian randomization studies can provide unique insights into the causal
nature of intrauterine environment influences on later disease outcomes. In
such studies, maternal genotype is taken to be a proxy for environmentally
modifiable exposures mediated through the mother that influence the intrauterine
environment. For example, it is now widely accepted that neural tube
defects can in part be prevented by periconceptional maternal folate
supplementation (Scholl & Johnson, 2000). RCTs of folate supplementation have
provided the key evidence in this regard (MRC Vitamin Study Research
Group, 1991; Czeizel & Dudás, 1992). However, could we have reached the
same conclusion before the RCTs were carried out if we had access to
evidence from genetic association studies? Studies have looked at the MTHFR
677C→T polymorphism (a genetic variant that is associated with
methylenetetrahydrofolate reductase activity and circulating homocysteine levels, the TT
genotype being associated with higher homocysteine levels) in newborns with
neural tube defects compared to controls and have found an increased risk in
TT vs. CC newborns, with a relative risk of 1.75 (95% CI 1.41–2.18) in a
meta-analysis of all such studies (Botto & Yang, 2000). Studies have also
looked at the association between this MTHFR variant in parents and the
risk of neural tube defect in their offspring. Mothers who have the TT
genotype have a relative risk of 2.04 (95% CI 1.49–2.81) of having
offspring with a neural tube defect compared to mothers who have the CC
genotype (Roseboom et al., 2000). For TT fathers, the equivalent relative risk
is 1.18 (95% CI 0.65–2.12) (Scholl & Johnson, 2000). This pattern of associa-
tions suggests that it is the intrauterine environment—influenced by mater-
nal TT genotype—rather than the genotype of offspring that is related to
disease risk (Figure 9.10). This is consistent with the hypothesis that mater-
nal folate intake is the exposure of importance.
In this case, the findings from observational studies, genetic association
studies, and an RCT are closely similar. Had the technology been available,
the genetic association studies, with the particular influence of maternal
versus paternal genotype on neural tube defect risk, would have provided
strong evidence of the beneficial effect of folate supplementation before the
results of any RCT had been completed, although trials would still have been
necessary to confirm that the effect was causal for folate supplementation.
Certainly, the genetic association studies would have provided better evidence
than that given by conventional epidemiological studies, which would have
had to cope with the problems of accurately assessing diet and the consider-
able confounding of maternal folate intake with a wide variety of lifestyle
and socioeconomic factors that may also influence neural tube defect risk.
[Figure 9.10: a TT fetus inherits 50% of its genes from the mother and 50% from the father, hence intermediate risk: RR 1.75.]
The association of genotype with neural tube defect risk does not suggest that
genetic screening is indicated; rather, it demonstrates that an environmental
intervention may benefit the whole population, independent of the genotype
of individuals receiving the intervention.
Studies utilizing maternal genotype as a proxy for environmentally mod-
ifiable influences on the intrauterine environment can be analyzed in a
variety of ways. First, the mothers of offspring with a particular outcome
can be compared to a control group of mothers who have offspring without
the outcome in a conventional case–control design but with the mother as
the exposed individual (or control) rather than the offspring with the parti-
cular health outcome (or the control offspring). Fathers could serve as a
control group when autosomal genetic variants are being studied. If the
exposure is mediated by the mother, maternal genotype, rather than offspring
genotype, will be the appropriate exposure indicator. Clearly, maternal and
offspring genotypes are associated, but when each is conditioned on the other,
it should be the maternal genotype that shows the association with the health outcome
among the offspring. Indeed, in theory it would be possible to simply com-
pare genotype distributions of mothers and offspring, with a higher preva-
lence among mothers providing evidence that maternal genotype, through an
intrauterine pathway, is of importance. However, the statistical power of such
an approach is low, and an external control group, whether fathers or women
who have offspring without the health outcome, is generally preferable.
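The logic of using maternal genotype, with fathers as a control group, can be illustrated by simulation. This is a minimal sketch under a purely hypothetical model:
- Hardy–Weinberg genotypes and random mating;
- a T-allele frequency of 0.35 and a baseline risk of 0.05;
- offspring risk that is doubled only when the mother is TT.

The parameter values and the function name are illustrative assumptions, not estimates from the studies cited. Among case families, the T-allele frequency comes out highest in mothers, intermediate in offspring (who inherit half their alleles from the father), and close to the population value in fathers.

```python
import numpy as np


def maternal_effect_allele_freqs(n_families=200_000, t_freq=0.35, base_risk=0.05,
                                 rr_maternal_tt=2.0, seed=2):
    """T-allele frequency in mothers, offspring, and fathers of affected offspring.

    Hypothetical model: offspring risk depends only on whether the mother is TT.
    """
    rng = np.random.default_rng(seed)
    mother = rng.binomial(2, t_freq, n_families)  # number of T alleles (0, 1, 2)
    father = rng.binomial(2, t_freq, n_families)
    # Each parent transmits T with probability (number of T alleles) / 2.
    child = rng.binomial(1, mother / 2) + rng.binomial(1, father / 2)
    risk = base_risk * np.where(mother == 2, rr_maternal_tt, 1.0)
    case = rng.random(n_families) < risk

    def freq(genotypes):
        return genotypes[case].mean() / 2  # allele frequency among case families

    return {"mother": freq(mother), "child": freq(child), "father": freq(father)}


print(maternal_effect_allele_freqs())
```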
The influence of high levels of alcohol intake by pregnant women on the
health and development of their offspring is well recognized for very high
levels of intake, in the form of fetal alcohol syndrome (Burd, 2006). However,
the influence outside of this extreme situation is less easy to assess, particu-
larly as higher levels of alcohol intake will be related to a wide array of
potential sociocultural, behavioral, and environmental confounding factors.
Furthermore, there may be systematic bias in how mothers report alcohol
intake during pregnancy, which could distort associations with health out-
comes. Therefore, outside of the case of very high alcohol intake by mothers,
RCTs are clearly the definitive means of obtaining evidence on the effects of
modifying disease risk processes. There are similarities in the logical struc-
ture of RCTs and Mendelian randomization, however. Figure 9.11 illustrates
this, drawing attention to the unconfounded nature of exposures proxied for
by genetic variants (analogous to the unconfounded nature of a randomized
intervention), the impossibility of reverse causation as an influence on
exposure–outcome associations in both Mendelian randomization and RCT
settings, and the importance of intention-to-treat analyses, that is, analysis
by group defined by genetic variant, irrespective of associations between the
genetic variant and the proxied-for exposure within any particular individual.
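The contrast between a confounded exposure–outcome association and a genotype-based estimate can be shown with simulated data. This is a minimal sketch under a hypothetical data-generating model. An unmeasured confounder `u` raises both exposure and outcome, and the genotype affects the outcome only through the exposure. The naive regression slope is biased upward. The ratio of the genotype–outcome to the genotype–exposure association (a Wald-type instrumental variable estimate) recovers the true effect of 0.5. All names and coefficients are illustrative assumptions.

```python
import numpy as np


def simulate_confounded_exposure(n=200_000, beta=0.5, seed=3):
    """Return (naive slope of y on x, genotype-based ratio estimate).

    Hypothetical model: u confounds x and y; g affects y only through x.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, n)            # allele count, fixed at conception
    u = rng.normal(size=n)                 # unmeasured confounder
    x = 0.5 * g + u + rng.normal(size=n)   # modifiable exposure
    y = beta * x + u + rng.normal(size=n)  # outcome
    naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # Ratio of genotype-outcome to genotype-exposure covariances.
    ratio = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]
    return naive, ratio


print(simulate_confounded_exposure())
```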
The analogy with RCTs is also useful with respect to one objection that
has been raised for Mendelian randomization studies. This is that the envir-
onmentally modifiable exposure proxied for by the genetic variants (such as
alcohol intake or circulating CRP levels) is influenced by many other factors
in addition to the genetic variants (Jousilahti & Salomaa, 2004). This is, of
course, true. However, consider an RCT of blood pressure–lowering medica-
tion. Blood pressure is influenced mainly by factors other than taking blood
pressure–lowering medication: obesity, alcohol intake, salt consumption and
other dietary factors, smoking, exercise, physical fitness, genetic factors, and
early-life developmental influences are all of importance. However, the
randomization that occurs in trials ensures that these factors are balanced
between the groups that receive the blood pressure–lowering medication
and those that do not. Thus, the fact that many other factors are related to
the modifiable exposure does not vitiate the power of RCTs; neither does it
vitiate the strength of Mendelian randomization designs.
[Figure 9.11: Mendelian randomization (random segregation of alleles; confounders equal between groups) compared with a randomized controlled trial (randomization method; confounders equal between groups).]
A related objection is that the genetic variants often explain only a trivial
proportion of the variance in the environmentally modifiable risk factor that
is being proxied for (Glynn, 2006). Again, consider an RCT of blood pressure–
lowering medication where 50% of participants received the medication
and 50% received a placebo. If the antihypertensive therapy reduced blood
pressure (BP) by a quarter of a standard deviation (SD), which is approxi-
mately the situation for such pharmacotherapy, then within the whole study
group treatment assignment (i.e., antihypertensive use vs. placebo) will
explain less than 2% of the variance in blood pressure. In the example of
CRP haplotypes used as instruments for CRP levels, these haplotypes explain
1.66% of the variance in CRP levels in the population (Lawlor et al., 2008).
As can be seen, the quantitative association of genetic variants as instru-
ments can be similar to that of randomized treatments with respect to the
biological processes that such treatments modify. Both logic and quantifica-
tion fail to support criticisms of the Mendelian randomization approach
based on either the obvious fact that many factors influence most pheno-
types of interest or the fact that particular genetic variants account for only a
small proportion of variance in the phenotype.
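The "less than 2% of the variance" figure for a 50:50 trial with a quarter-SD treatment effect follows from a one-line variance decomposition. This is a minimal sketch. It assumes that the treatment shifts the group mean by a given number of within-group SDs and leaves the within-group variance unchanged. The function name is hypothetical.

```python
def variance_explained(shift_sd, p_treated=0.5):
    """Proportion of total outcome variance explained by a binary treatment assignment.

    Between-group variance is p(1 - p) * shift^2; within-group variance is fixed at 1.
    """
    between = p_treated * (1 - p_treated) * shift_sd**2
    return between / (between + 1.0)


print(f"{variance_explained(0.25):.2%}")  # quarter-SD effect, 50:50 allocation
```

This gives about 1.5%, which is the same order as the 1.66% of CRP variance explained by CRP haplotypes.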
[Figure: diagram with nodes for genotype, exposure, and outcome, and a node for confounders, reverse causation, and bias.]
Study: Alcohol–BP effect (95% CI)
Diastolic:
Amamoto et al., 2002 [18]: 0.17 (0.06, 0.28)
Takagi et al., 2001 [19]: 0.15 (0.08, 0.22)
Tsuritani et al., 1995 [20]: 0.16 (0.07, 0.26)
Subtotal (I² = 0.0%, p = 0.970): 0.16 (0.11, 0.21)
Systolic:
Amamoto et al., 2002 [18]: 0.29 (0.12, 0.47)
Takagi et al., 2001 [19]: 0.28 (0.16, 0.40)
Tsuritani et al., 1995 [20]: 0.18 (0.05, 0.31)
Subtotal (I² = 0.0%, p = 0.439): 0.24 (0.16, 0.32)
Figure 9.13 Instrumental variable estimates of the difference in systolic and diastolic blood pressure produced by a 1 g/day higher alcohol intake.
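Instrumental variable estimates of this kind are commonly formed as a ratio of the genotype–outcome difference to the genotype–exposure difference (the Wald ratio). This is a minimal sketch from summary statistics, not necessarily the authors' exact procedure. It treats the genotype–alcohol difference as known, so the confidence interval is a first-order approximation. The 7.44 (5.39, 9.49) mmHg input is the pooled 11 vs. 22 SBP difference in men from Figure 9.8. The 30 g/day alcohol difference is hypothetical and is used only for illustration.

```python
def wald_ratio(beta_gy, ci_gy, beta_gx):
    """Ratio IV estimate and first-order 95% CI from summary statistics.

    beta_gy: genotype-outcome mean difference; ci_gy: its (lower, upper) 95% CI;
    beta_gx: genotype-exposure mean difference, treated as known (no error).
    """
    se_gy = (ci_gy[1] - ci_gy[0]) / (2 * 1.96)  # back out the SE from a symmetric CI
    est = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return est, (est - 1.96 * se, est + 1.96 * se)


# 30 g/day is a HYPOTHETICAL genotype-alcohol difference, for illustration only.
est, ci = wald_ratio(7.44, (5.39, 9.49), 30.0)
print(round(est, 3), [round(c, 3) for c in ci])  # mmHg per g/day
```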
the medical literature (Colhoun, McKeigue, & Davey Smith, 2003). The pre-
sence or absence of statistical interactions depends upon the scale (e.g., linear
or logarithmic with respect to the exposure–disease outcome), and the mean-
ing of observed deviation from either an additive or a multiplicative model is
not clear. Furthermore, the biological implications of interactions (however
defined) are generally uncertain (Thompson, 1991). Mendelian randomization
is most powerful when studying modifiable exposures that are difficult to
measure and/or considerably confounded, such as dietary factors. Given mea-
surement error—particularly if this is differential with respect to other factors
influencing disease risk—interactions are both difficult to detect and often
misleading when, apparently, they are found (Clayton & McKeigue, 2001).
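The scale dependence of interaction can be made concrete. The same joint effect can show no interaction on the multiplicative scale and a clear interaction on the additive scale. This is a minimal sketch using the ratio of relative risks and the relative excess risk due to interaction (RERI). Both measures are standard, but they are not taken from this chapter. The input relative risks are illustrative.

```python
def interaction_measures(rr_g, rr_e, rr_ge):
    """Genotype (G) by exposure (E) interaction on two scales.

    All relative risks are relative to the group with neither G nor E.
    """
    return {
        # 1.0 means no departure from a multiplicative model.
        "multiplicative": rr_ge / (rr_g * rr_e),
        # 0.0 means no departure from an additive model.
        "reri": rr_ge - rr_g - rr_e + 1.0,
    }


print(interaction_measures(2.0, 3.0, 6.0))  # multiplicative: none; additive: present
print(interaction_measures(2.0, 3.0, 4.0))  # additive: none; multiplicative: present
```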
The situation is perhaps different with exposures that differ qualitatively
rather than quantitatively between individuals. Consider the issue of the
influence of smoking tobacco on bladder-cancer risk. Observational studies
suggest an association, but clearly confounding and a variety of biases could
generate such an association. The potential carcinogens in tobacco smoke of
relevance to bladder-cancer risk include aromatic and heterocyclic amines,
which are detoxified by N-acetyltransferase 2 (NAT2). Genetic variation in
NAT2 enzyme levels leads to slower or faster acetylation states. If the carci-
nogens in tobacco smoke do increase the risk of bladder cancer, then it would
be expected that slow acetylators, those who have a reduced rate of detoxifica-
tion of these carcinogens, would be at an increased risk of bladder cancer if
they were smokers, whereas if they were not exposed to these carcinogens
(and the major exposure route for those outside of particular industries is
through tobacco smoke), then an association of genotype with bladder-cancer
risk would not be anticipated. Table 9.3 tabulates findings from a large study
reported in a way that allows analysis of this simple hypothesis (Gu, Liang,
Wang, Lu, & Wu, 2005). As can be seen, the influence of the NAT2 slow-
acetylation genotype is appreciable only among those also exposed to heavy
smoking. Since the genotype will be unrelated to confounders, it is difficult
to see why this situation should arise unless smoking is a causal factor
with respect to bladder cancer. Thus, the presence of a sizable effect of
genotype in the exposed group but not in the unexposed group provides
evidence as to the causal nature of the environmentally modifiable risk
factor—in this example, smoking. It must be recognized, however, that
gene-by-environment interactions interpreted within the Mendelian randomization
framework as evidence regarding the causal nature of environmentally
modifiable exposures are not protected from confounding to the extent that
main genetic effects are. In the NAT2/smoking/bladder cancer example any
factor related to smoking—such as social class—will tend to show a greater
association with bladder cancer within NAT2 slow acetylators than within
NAT2 rapid acetylators. Because there is not a one-to-one association of
social class with smoking, this will not produce the qualitative interaction
of essentially no effect of the genotype in one exposure stratum and an effect
in the other, as in the NAT2/smoking interaction, but rather a quantitative
interaction of a greater effect of NAT2 in the poorer social classes (among
whom smoking is more prevalent) and a smaller (but still evident) effect
in the better-off social classes, among whom smoking is less prevalent.
Thus, situations in which both the biological basis of an expected interaction
is well understood and a qualitative (effect vs. no effect) interaction may be
anticipated are the ones that are most amenable to interpretations related to
the general causal nature of the environmentally modifiable risk factor.
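A stratified analysis of the kind summarized in Table 9.3 compares the genotype odds ratio within each exposure stratum. This is a minimal sketch with a Woolf (log-scale) confidence interval. The counts are hypothetical, chosen to show a qualitative pattern (no genotype effect in never smokers, an effect in heavy smokers). They are NOT the data of Gu et al. (2005). The function name is also hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d):
    """Odds ratio with Woolf 95% CI for a 2x2 table.

    a: slow-acetylator cases, b: slow-acetylator controls,
    c: fast-acetylator cases, d: fast-acetylator controls.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - 1.96 * se), math.exp(math.log(or_) + 1.96 * se)


# Hypothetical counts only: (slow cases, slow controls, fast cases, fast controls).
strata = {
    "never smokers": (50, 100, 50, 100),
    "heavy smokers": (120, 60, 60, 60),
}
for name, counts in strata.items():
    or_, lo, hi = odds_ratio_woolf(*counts)
    print(f"{name}: OR {or_:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```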
Table 9.3 Bladder-cancer risk for NAT2 slow vs. fast acetylators, stratified by smoking status
Linkage Disequilibrium
and this may mean they proxy for more than one environmentally modifi-
able risk factor. This can be the case through multiple effects mediated by
their RNA expression or protein coding, through alternative splicing, where
one polymorphic region contributes to alternative forms of more than one
protein (Gelbart, 1998), or through other mechanisms. The most robust
interpretations will be possible when the functional polymorphism appears
to directly influence the level of the intermediate phenotype of interest (as in
the CRP example), but such examples are probably going to be less common
in Mendelian randomization than cases where the polymorphism can influ-
ence several systems, with different potential interpretations of how the
effect on outcome is generated.
Figure 9.14 Alcohol consumption (g/day) by ALDH2 genotype (*1*1, *1*2, *2*2) in women and men: five studies, n = 6,815. From ‘‘Alcohol intake and blood pressure: A systematic review implementing a Mendelian randomization approach,’’ by L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
Study: Mean difference in SBP in mmHg (95% CI)
22 vs. 11 (male)
Saito et al., 2003: −13.90 (−20.45, −7.35)
Tsuritani et al., 1995: −6.80 (−13.06, −0.54)
Amamoto et al., 2002: −8.40 (−13.13, −3.67)
Yamada et al., 2002: −6.80 (−11.28, −2.32)
Takagi et al., 2001: −5.90 (−9.15, −2.65)
Subtotal (I² = 18.0%, p = 0.300): −7.44 (−9.49, −5.39)
22 vs. 11 (female)
Amamoto et al., 2002: 0.90 (−3.33, 5.13)
Takagi et al., 2001: 0.10 (−3.07, 3.27)
Subtotal (I² = 0.0%, p = 0.767): 0.39 (−2.15, 2.93)
Figure 9.15 ALDH2 genotype and systolic blood pressure. From ‘‘Alcohol intake and blood pressure: A systematic review implementing a Mendelian randomization approach,’’ by L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
The interpretation of findings from studies that appear to fall within the
Mendelian randomization remit can often be complex, as has been previously
discussed with respect to MTHFR and folate intake (Davey Smith & Ebrahim,
2003). As a second example, consider the association of extracellular super-
oxide dismutase (EC-SOD) and CHD. EC-SOD is an extracellular scavenger
of superoxide anions, and thus genetic variants associated with higher
circulating EC-SOD levels might be considered to mimic higher levels of
antioxidants. However, the findings are the opposite: carriers of such
variants have an increased risk of CHD (Juul et al., 2004). The explanation of
this apparent paradox may be that the higher circulating EC-SOD levels
associated with the variant arise from movement of EC-SOD from arterial
walls; thus, the in situ antioxidative properties of these arterial walls are lower
in individuals with the variant associated with higher circulating EC-SOD.
The complexity of these interpretations—together with their sometimes spec-
ulative nature—detracts from the transparency that otherwise makes
Mendelian randomization attractive.
box 9.3
Meiotic Randomization in Animal Studies
There are many cogent critiques of genetic reductionism and the overselling
of ‘‘discoveries’’ in genetics that reiterate obvious truths so clearly (albeit
somewhat repetitively) that there is no need to repeat them here (e.g.,
Berkowitz, 1996; Baird, 2000; Holtzman, 2001; Strohman, 1993; Rose,
1995). Mendelian randomization does not depend upon there being ‘‘genes
for’’ particular traits and certainly not in the strict sense of a gene for a trait
being one that is maintained by selection because of its causal association
with that trait (Kaplan & Pigliucci, 2001). Like most genotype–phenotype
associations, the association between a genotype and the environmentally
modifiable factor it proxies for is contingent: it cannot be reduced to
individual-level prediction, but within environmental limits it will hold at
the group level (Wolf, 1995). This is analogous to an RCT of antihypertensive
agents, where at the collective level the group randomized to
active medication will have lower mean blood pressure than the group ran-
domized to placebo but at the individual level many participants randomized
to active treatment will have higher blood pressure than many individuals
randomized to placebo. Indeed, in the phenocopy/genocopy example of pel-
lagra and Hartnup disease discussed in Box 9.1, only a minority of the
Hartnup gene carriers develop symptoms but at the group level they have
a much greater tendency for such symptoms and a shift in amino acid levels
that reflects this (Scriver, Mahon, & Levy, 1987; Scriver, 1988). These
group-level differences are what create the analogy between Mendelian
randomization and RCTs outlined in Figure 9.11.
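The distinction in the RCT analogy, between a clear group-level effect and widely overlapping individual outcomes, can be made concrete with a small simulation. The Python sketch below is purely illustrative: the blood pressure means, the 10 mmHg treatment effect, the 15 mmHg between-person standard deviation, and the arm size are all assumptions, not figures from the chapter or from any trial.

```python
import random
import statistics

# All values below are hypothetical assumptions chosen for illustration.
PLACEBO_MEAN = 150.0  # mmHg
TREATED_MEAN = 140.0  # mmHg, i.e. an assumed 10 mmHg group-level effect
SD = 15.0             # mmHg, assumed between-person spread in each arm
N = 10_000            # participants per arm

rng = random.Random(2011)  # fixed seed so the run is reproducible
placebo = [rng.gauss(PLACEBO_MEAN, SD) for _ in range(N)]
treated = [rng.gauss(TREATED_MEAN, SD) for _ in range(N)]

# Group level: the treated arm has the lower mean blood pressure.
mean_diff = statistics.fmean(treated) - statistics.fmean(placebo)

# Individual level: pair each treated participant with a randomly chosen
# placebo participant and count how often the treated person is HIGHER.
shuffled = placebo[:]
rng.shuffle(shuffled)
frac_treated_higher = sum(t > p for t, p in zip(treated, shuffled)) / N

print(f"mean difference (treated - placebo): {mean_diff:.1f} mmHg")
print(f"treated participant higher in {100 * frac_treated_higher:.0f}% of pairs")
```

Under these assumed values the treated arm's mean is about 10 mmHg lower, yet in roughly a third of random treated/placebo pairs the treated participant has the higher pressure (the theoretical proportion is Φ(−10/(15√2)) ≈ 0.32).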
Finally, the associations that Mendelian randomization depends upon do need
to pertain to a definable group at a particular time, but they do not need to be
immutable. Thus, ALDH2 variation will not be related to alcohol consumption
in a society where alcohol is not consumed, and the association will vary
by gender and by cultural group and may change over time (Higuchi et al.,
1994; Hasin et al., 2002). Within the setting of a study of a well-defined
group, however, the genotype will be associated with group-level differences
in alcohol consumption and group assignment will not be associated with
confounding variables.
involved in only a small subset of all cases of such diseases’’ and (2) that in
any case ‘‘while the concept of attributable risk is an important one for
evaluating the impact of removable environmental factors, for non-removable
genetic risk factors, it is a moot point.’’ These evaluations of the role of
genetic epidemiology are not relevant when considering the potential
contributions of Mendelian randomization. This approach is concerned not with
the population attributable risk of any particular genetic variant but with the
degree to which associations between the genetic variant and disease
outcomes can demonstrate the importance of environmentally modifiable factors
as causes of disease, for which the population attributable risk is of relevance
to public-health prioritization. Consider, for example, the case of familial
hypercholesterolemia or familial defective apo B. The genetic mutations asso-
ciated with these conditions will account for only a trivial percentage of cases
of CHD within the population (i.e., the population attributable risk will be
low). For example, in a Danish population, the frequency of familial defective
apo B is 0.08% and, despite its sevenfold increased risk of CHD, will generate
a population attributable risk of only 0.5% (Tybjaerg-Hansen, Steffensen,
Meinertz, Schnohr, & Nordestgaard, 1998). However, by identifying blood
cholesterol levels as a causal factor for CHD, the triangular association
between genotype, blood cholesterol, and CHD risk identifies an environmentally
modifiable factor with a very high population attributable risk. Assuming that
50% of the population have blood cholesterol raised above 6.0 mmol/l and that
this carries a twofold relative risk, a population attributable risk of 33% is
obtained. The same logic applies to the other examples: the attributable risk
of the genotype is low, but the population attributable risk of the modifiable
environmental factor identified as causal through the genotype–disease
associations is large. Similar reasoning answers the suggestion that, since
genotype cannot be modified, genotype–disease associations are not of
public-health importance (Terwilliger & Weiss, 2003). The point of Mendelian
randomization approaches is not to attempt to modify genotype but to use
genotype–disease associations to strengthen inferences regarding modifiable
environmental risks for disease, and then to reduce disease risk in the
population by applying this knowledge.
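The attributable-risk figures above are consistent with Levin's formula for a single binary exposure with prevalence p and relative risk RR: PAR = p(RR − 1) / [1 + p(RR − 1)]. The text does not name the formula it used, so treating it as this one is an assumption. The Python sketch below (the function name `levin_paf` is illustrative) applies it to the two examples quoted in the paragraph.

```python
def levin_paf(prevalence: float, relative_risk: float) -> float:
    """Population attributable fraction for one binary exposure (Levin's formula).

    prevalence    -- proportion of the population exposed, between 0 and 1
    relative_risk -- risk in the exposed divided by risk in the unexposed (> 0)
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Familial defective apo B in the Danish example: 0.08% prevalence, sevenfold risk.
fdb = levin_paf(0.0008, 7.0)
# Raised cholesterol (above 6.0 mmol/l) in 50% of the population, twofold risk.
chol = levin_paf(0.5, 2.0)

print(f"familial defective apo B: {100 * fdb:.2f}%")
print(f"raised blood cholesterol: {100 * chol:.1f}%")
```

With p = 0.0008 and RR = 7 the formula gives about 0.48%, which rounds to the 0.5% cited for familial defective apo B; with p = 0.5 and RR = 2 it gives one-third, the 33% cited for raised cholesterol. The contrast shows why a rare, high-risk genotype can point to a common, modifiable exposure with a far larger population impact.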
Mendelian randomization differs from other contemporary approaches to
genetic epidemiology in that its central concern is not with the magnitude of
genetic variant influences on disease but, rather, with what the genetic asso-
ciations tell us about environmentally modifiable causes of disease. Many
years ago, in his Nobel Prize acceptance speech, the pioneering geneticist
Thomas Hunt Morgan contrasted his views with the then popular genetic
approach to disease, eugenics. He thought that ‘‘through public hygiene and
protective measures of various kinds we can more successfully cope with
some of the evils that human flesh is heir to. Medical science will here take
the lead—but I hope that genetics can at times offer a helping hand’’
(Morgan, 1935). More than seven decades later, it may be time for genetic
research to strengthen the knowledge base of public health directly.
References
Ames, B. N. (1999). Cancer prevention and diet: Help from single nucleotide poly-
morphisms. Proceedings of the National Academy of Sciences USA, 96, 12216–12218.
Baird, P. (2000). Genetic technologies and achieving health for populations.
International Journal of Health Services, 30, 407–424.
Baron, D. N., Dent, C. E., Harris, H., Hart, E. W., & Jepson, J. B. (1956). Hereditary
pellagra-like skin rash with temporary cerebellar ataxia, constant renal amino-
aciduria, and other bizarre biochemical features. Lancet, 268, 421–429.
Berkowitz, A. (1996). Our genes, ourselves? Bioscience, 46, 42–51.
Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital
data. Biometric Bulletin, 2, 47–53.
Bhatti, P., Sigurdson, A. J., Wang, S. S., Chen, J., Rothman, N., Hartge, P., et al.
(2005). Genetic variation and willingness to participate in epidemiological research:
Data from three studies. Cancer Epidemiology, Biomarkers and Prevention, 14,
2449–2453.
Birge, S. J., Keutmann, H. T., Cuatrecasas, P., & Whedon, G. D. (1967). Osteoporosis,
intestinal lactase deficiency and low dietary calcium intake. New England Journal of
Medicine, 276, 445–448.
Bolon, B., & Galbreath, E. (2002). Use of genetically engineered mice in drug discovery
and development: Wielding Occam’s razor to prune the product portfolio.
International Journal of Toxicology, 21, 55–64.
Botto, L. D., & Yang, Q. (2000). 5,10-Methylenetetrahydrofolate reductase gene variants
and congenital anomalies: A HuGE review. American Journal of Epidemiology, 151,
862–877.
Bovet, P., & Paccaud, F. (2001). Alcohol, coronary heart disease and public health:
Which evidence-based policy? International Journal of Epidemiology, 30, 734–737.
Brennan, P. (2002). Gene environment interaction and aetiology of cancer: What does
it mean and how can we measure it? Carcinogenesis, 23(3), 381–387.
Broer, S., Cavanaugh, J. A., & Rasko, J. E. J. (2004). Neutral amino acid transport in
epithelial cells and its malfunction in Hartnup disorder. Transporters, 33, 233–236.
Burd, L. J. (2006). Interventions in FASD: We must do better. Child: Care, Health, and
Development, 33, 398–400.
Burr, M. L., Fehily, A. M., Butland, B. K., Bolton, C. H., & Eastham, R. D. (1986).
Alcohol and high-density-lipoprotein cholesterol: A randomized controlled trial.
British Journal of Nutrition, 56, 81–86.
Cardon, L. R., & Bell, J. I. (2001). Association study designs for complex diseases.
Nature Reviews: Genetics, 2, 91–99.
Casas, J. P., Shah, T., Cooper, J., Hawe, E., McMahon, A. D., Gaffney, D., et al. (2006).
Insight into the nature of the CRP–coronary event association using Mendelian
randomization. International Journal of Epidemiology, 35, 922–931.
Chao, Y.-C., Liou, S.-R., Chung, Y.-Y., Tang, H.-S., Hsu, C.-T., Li, T.-K., et al. (1994).
Polymorphism of alcohol and aldehyde dehydrogenase genes and alcoholic cirrhosis
in Chinese patients. Hepatology, 19, 360–366.
Chen, L., Davey Smith, G., Harbord, R., & Lewis, S. (2008). Alcohol intake and blood
pressure: A systematic review implementing Mendelian randomization approach.
PLoS Medicine, 5, e52.
Cheverud, J. M. (1988). A comparison of genetic and phenotypic correlations.
Evolution, 42, 958–968.
Clayton, D., & McKeigue, P. M. (2001). Epidemiological methods for studying genes
and environmental factors in complex diseases. Lancet, 358, 1356–1360.
Colhoun, H., McKeigue, P. M., & Davey Smith, G. (2003). Problems of reporting
genetic associations with complex outcomes. Lancet, 361, 865–872.
Correns, C. (1900). G. Mendel’s Regel über das Verhalten der Nachkommenschaft
der Bastarde. Berichte der Deutschen Botanischen Gesellschaft, 18, 158–168. (English
translation, Correns, C. [1966]. G. Mendel’s law concerning the behavior of
progeny of varietal hybrids. In Stern and Sherwood [pp. 119–132]. New York:
W. H. Freeman.)
Czeizel, A. E., & Dudás, I. (1992). Prevention of the first occurrence of neural-tube
defects by periconceptional vitamin supplementation. New England Journal of
Medicine, 327, 1832–1835.
Danesh, J., Wheeler, J. G., Hirschfield, G. M., Eda, S., Eiriksdottir, G., Rumley, A., et al.
(2004). C-reactive protein and other circulating markers of inflammation in the predic-
tion of coronary heart disease. New England Journal of Medicine, 350, 1387–1397.
Davey Smith, G. (2006). Cochrane Lecture. Randomised by (your) god: Robust infer-
ence from an observational study design. Journal of Epidemiology and Community
Health, 60, 382–388.
Davey Smith, G., & Ebrahim, S. (2002). Data dredging, bias, or confounding
[Editorial]. British Medical Journal, 325, 1437–1438.
Davey Smith, G., & Ebrahim, S. (2003). ‘‘Mendelian randomization’’: Can genetic
epidemiology contribute to understanding environmental determinants of disease?
International Journal of Epidemiology, 32, 1–22.
Davey Smith, G., & Ebrahim, S. (2004). Mendelian randomization: Prospects, poten-
tials, and limitations. International Journal of Epidemiology, 33, 30–42.
Davey Smith, G., & Ebrahim, S. (2005). What can Mendelian randomization tell us
about modifiable behavioural and environmental exposures. British Medical Journal,
330, 1076–1079.
Davey Smith, G., Harbord, R., Milton, J., Ebrahim, S., & Sterne, J. A. C. (2005a). Does
elevated plasma fibrinogen increase the risk of coronary heart disease? Evidence
from a meta-analysis of genetic association studies. Arteriosclerosis, Thrombosis, and
Vascular Biology, 25, 2228–2233.
Davey Smith, G., & Hart, C. (2002). Lifecourse socioeconomic and behavioural influ-
ences on cardiovascular disease mortality: The Collaborative study. American Journal
of Public Health, 92, 1295–1298.
Davey Smith, G., Lawlor, D. A., Harbord, R., Timpson, N. J., Day, I., & Ebrahim, S.
(2008). Clustered environments and randomized genes: A fundamental distinction
between conventional and genetic epidemiology. PLoS Medicine, 4, 1985–1992.
Davey Smith, G., Lawlor, D., Harbord, R., Timpson, N., Rumley, A., Lowe, G., et al.
(2005b). Association of C-reactive protein with blood pressure and hypertension:
Lifecourse confounding and Mendelian randomization tests of causality.
Arteriosclerosis, Thrombosis, and Vascular Biology, 25, 1051–1056.
Davey Smith, G., & Phillips, A. N. (1996). Inflation in epidemiology: ‘‘The proof and
measurement of association between two things’’ revisited. British Medical Journal,
312, 1659–1661.
Davey Smith, G., Timpson, N., & Ebrahim, S. (2008). Strengthening causal inference
in cardiovascular epidemiology through Mendelian randomization. Annals of
Medicine, 40, 524–541.
Debat, V., & David, P. (2001). Mapping phenotypes: Canalization, plasticity and devel-
opmental stability. Trends in Ecology and Evolution, 16, 555–561.
Delanghe, J., Langlois, M., Duprez, D., De Buyzere, M., & Clement, D. (1999).
Haptoglobin polymorphism and peripheral arterial occlusive disease. Atherosclerosis,
145, 287–292.
Ebrahim, S., & Davey Smith, G. (2008). Mendelian randomization: Can genetic epi-
demiology help redress the failures of observational epidemiology? Human Genetics,
123, 15–33.
Eidelman, R. S., Hollar, D., Hebert, P. R., Lamas, G. A., & Hennekens, C. H. (2004).
Randomized trials of vitamin E in the treatment and prevention of cardiovascular
disease. Archives of Internal Medicine, 164, 1552–1556.
Enomoto, N., Takase, S., Yasuhara, M., & Takada, A. (1991). Acetaldehyde metabolism
in different aldehyde dehydrogenase-2 genotypes. Alcoholism, Clinical and
Experimental Research, 15, 141–144.
Erichsen, H. C., Eck, P., Levine, M., & Chanock, S. (2001). Characterization of the
genomic structure of the human vitamin C transporter SVCT1 (SLC23A2). Journal of
Nutrition, 131, 2623–2627.
Færgeman, O. (2003). Coronary artery disease: Genes, drugs and the agricultural
connection. Amsterdam: Elsevier.
Fallon, U. B., Ben-Shlomo, Y., & Davey Smith, G. (2001, March 14). Homocysteine and
coronary heart disease. Heart. http://heart.bmjjournals.com/cgi/eletters/85/2/153
Garry, D. J., Ordway, G. A., Lorenz, J. N., Radford, E. R., Chin, R. W., Grange, R., et al.
(1998). Mice without myoglobin. Nature, 395, 905–908.
Gause, G. F. (1942). The relation of adaptability to adaptation. Quarterly Review of
Biology, 17, 99–114.
Gemma, S., Vichi, S., & Testai, E. (2007). Metabolic and genetic factors contributing to
alcohol induced effects and fetal alcohol syndrome. Neuroscience and Biobehavioral
Reviews, 31, 221–229.
Gerlai, R. (2001). Gene targeting: Technical confounds and potential solutions in
behavioural and brain research. Behavioural Brain Research, 125, 13–21.
Gibson, G., & Wagner, G. (2000). Canalization in evolutionary genetics: A stabilizing
theory? BioEssays, 22, 372–380.
Gelbart, W. M. (1998). Databases in genomic research. Science, 282, 659–661.
Glynn, R. K. (2006). Genes as instruments for evaluation of markers and causes
[Commentary]. International Journal of Epidemiology, 35, 932–934.
Goldschmidt, R. B. (1938). Physiological genetics. New York: McGraw-Hill.
Gray, R., & Wheatley, K. (1991). How to avoid bias when comparing bone
marrow transplantation with chemotherapy. Bone Marrow Transplantation, 7(Suppl.
3), 9–12.
Gu, J., Liang, D., Wang, Y., Lu, C., & Wu, X. (2005). Effects of N-acetyl transferase 1
and 2 polymorphisms on bladder cancer risk in Caucasians. Mutation Research, 581,
97–104.
Gu, Z., Steinmetz, L. M., Gu, X., Scharfe, C., Davis, R. W., & Li, W.-H. (2003). Role of
duplicate genes in genetic robustness against null mutations. Nature, 421, 63–66.
Gutjahr, E., Gmel, G., & Rehm, J. (2001). Relation between average alcohol consump-
tion and disease: An overview. European Addiction Research, 7, 117–127.
Guy, J. T. (1993). Oral manifestations of systemic disease. In C. W. Cummings, J.
Frederick, L. Harker, C. Krause, & D. Schuller (Eds.), Otolaryngology—head and neck
surgery (Vol. 2). St. Louis: Mosby Year Book.
Han, T. S., Sattar, N., Williams, K., Gonzalez-Villalpando, C., Lean, M. E., & Haffner,
S. M. (2002). Prospective study of C-reactive protein in relation to the development
of diabetes and metabolic syndrome in the Mexico City Diabetes Study. Diabetes
Care, 25, 2016–2021.
Hart, C., Davey Smith, G., Hole, D., & Hawthorne, V. (1999). Alcohol consumption
and mortality from all causes, coronary heart disease, and stroke: Results from a
prospective cohort study of Scottish men with 21 years of follow up. British Medical
Journal, 318, 1725–1729.
Hartman, J. L., Garvik, B., & Hartwell, L. (2001). Principles for the buffering of genetic
variation. Science, 291, 1001–1004.
Hasin, D., Aharonovich, E., Liu, X., Mamman, Z., Matseoane, K., Carr, L., et al. (2002).
Alcohol and ADH2 in Israel: Ashkenazis, Sephardics, and recent Russian immi-
grants. American Journal of Psychiatry, 159(8), 1432–1434.
Haskell, W. L., Camargo, C., Williams, P. T., Vranizan, K. M., Krauss, R. M., Lindgren,
F. T., et al. (1984). The effect of cessation and resumption of moderate alcohol
intake on serum high-density-lipoprotein subfractions. New England Journal of
Medicine, 310, 805–810.
Heart Protection Study Collaborative Group. (2002). MRC/BHF Heart Protection Study
of antioxidant vitamin supplementation in 20536 high-risk individuals: A rando-
mised placebo-controlled trial. Lancet, 360, 23–33.
Higuchi, S., Matsushita, S., Imazeki, H., Kinoshita, T., Takagi, S., & Kono, H. (1994).
Aldehyde dehydrogenase genotypes in Japanese alcoholics. Lancet, 343, 741–742.
Hirschfield, G. M., & Pepys, M. B. (2003). C-reactive protein and cardiovascular
disease: New insights from an old molecule. Quarterly Journal of Medicine, 96, 793–807.
Holtzman, N. A. (2001). Putting the search for genes in perspective. International
Journal of Health Services, 31, 445.
Honkanen, R., Pulkkinen, P., Järvinen, R., Kröger, H., Lindstedt, K., Tuppurainen, M.,
et al. (1996). Does lactose intolerance predispose to low bone density? A population-
based study of perimenopausal Finnish women. Bone, 19, 23–28.
Hornstein, E., & Shomron, N. (2006). Canalization of development by microRNAs.
Nature Genetics, 38, S20–S24.
Hu, F. B., Meigs, J. B., Li, T. Y., Rifai, N., & Manson, J. E. (2004). Inflammatory
markers and risk of developing type 2 diabetes in women. Diabetes, 53, 693–700.
Jablonka-Tavory, E. (1982). Genocopies and the evolution of interdependence.
Evolutionary Theory, 6, 167–170.
Jacobson, S. W., Carr, L. G., Croxford, J., Sokol, R. J., Li, T. K., & Jacobson, J. L. (2006).
Protective effects of the alcohol dehydrogenase-ADH1B allele in children exposed to
alcohol during pregnancy. Journal of Pediatrics, 148, 30–37.
Jousilahti, P., & Salomaa, V. (2004). Fibrinogen, social position, and Mendelian rando-
misation. Journal of Epidemiology and Community Health, 58, 883.
Juul, K., Tybjaerg-Hansen, A., Marklund, S., Heegaard, N. H. H., Steffensen, R.,
Sillesen, H., et al. (2004). Genetically reduced antioxidative protection and increased
ischaemic heart disease risk: The Copenhagen City Heart Study. Circulation,
109, 59–65.
Kaplan, J. M., & Pigliucci, M. (2001). Genes ‘‘for’’ phenotypes: A modern history view.
Biology and Philosophy, 16, 189–213.
Katan, M. B. (1986). Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet,
I, 507–508 (reprinted in International Journal of Epidemiology, 2004, 33, 9).
Kathiresan, S., Melander, O., Anevski, D., Guiducci, C., Burtt, N. P., Roos, C., et al.
(2008). Polymorphisms associated with cholesterol and risk of cardiovascular events.
New England Journal of Medicine, 358, 1240–1249.
Keavney, B. (2002). Genetic epidemiological studies of coronary heart disease.
International Journal of Epidemiology, 31, 730–736.
Keavney, B., Danesh, J., Parish, S., Palmer, A., Clark, S., Youngman, L., et al.;
International Studies of Infarct Survival (ISIS) Collaborators. (2006). Fibrinogen
and coronary heart disease: Test of causality by ‘‘Mendelian randomization.’’
International Journal of Epidemiology, 35, 935–943.
Kelada, S. N., Eaton, D. L., Wang, S. S., Rothman, N. R., & Khoury, M. J. (2003). The
role of genetic polymorphisms in environmental health. Environmental Health
Perspectives, 111, 1055–1064.
Khaw, K.-T., Bingham, S., Welch, A., Luben, R., Wareham, N., Oakes, S., et al. (2001).
Relation between plasma ascorbic acid and mortality in men and women in EPIC-
Norfolk prospective study: A prospective population study. Lancet, 357, 657–663.
Kitami, T., & Nadeau, J. H. (2002). Biochemical networking contributes more to
genetic buffering in human and mouse metabolic pathways than does gene duplica-
tion. Nature Genetics, 32, 191–194.
Klatsky, A. L. (2001). Could abstinence from alcohol be hazardous to your health
[Commentary]? International Journal of Epidemiology, 30, 739–742.
Kraut, J. A., & Sachs, G. (2005). Hartnup disorder: Unravelling the mystery. Trends in
Pharmacological Sciences, 26, 53–55.
Langlois, M. R., Delanghe, J. R., De Buyzere, M. L., Bernard, D. R., & Ouyang, J.
(1997). Effect of haptoglobin on the metabolism of vitamin C. American Journal of
Clinical Nutrition, 66, 606–610.
Lawlor, D. A., Davey Smith, G., Kundu, D., Bruckdorfer, K. R., & Ebrahim, S. (2004).
Those confounded vitamins: What can we learn from the differences between
observational versus randomised trial evidence? Lancet, 363, 1724–1727.
Lawlor, D. A., Ebrahim, S., Kundu, D., Bruckdorfer, K. R., Whincup, P. H., & Davey
Smith, G. (2005). Vitamin C is not associated with coronary heart disease risk once
life course socioeconomic position is taken into account: Prospective findings from
the British Women’s Heart and Health Study. Heart, 91, 1086–1087.
Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N., & Davey Smith, G.
(2008). Mendelian randomization: Using genes as instruments for making causal
inferences in epidemiology. Statistics in Medicine, 27, 1133–1163.
Leimar, O., Hammerstein, P., & Van Dooren, T. J. M. (2006). A new perspective on
developmental plasticity and the principles of adaptive morph determination.
American Naturalist, 167, 367–376.
Lenz, W. (1973). Phenocopies. Journal of Medical Genetics, 10, 34–48.
Lewis, S., & Davey Smith, G. (2005). Alcohol, ALDH2 and esophageal cancer:
A meta-analysis which illustrates the potentials and limitations of a Mendelian
randomization approach. Cancer Epidemiology, Biomarkers and Prevention, 14,
1967–1971.
Li, R., Tsaih, S. W., Shockley, K., Stylianou, I. M., Wergedal, J., Paigen, B., et al. (2006).
Structural model analysis of multiple quantitative traits. PLoS Genetics, 2, 1046–1057.
Lipp, H. P., Schwegler, H., Crusio, W. E., Wolfer, D. P., Leisinger-Trigona, M. C.,
Heimrich, B., et al. (1989). Using genetically-defined rodent strains for the identi-
fication of hippocampal traits relevant for two-way avoidance behaviour: A non-
invasive approach. Experientia, 45, 845–859.
Little, J., & Khoury, M. J. (2003). Mendelian randomization: A new spin or real pro-
gress? Lancet, 362, 930–931.
Lower, G. M., Nilsson, T., Nelson, C. E., Wolf, H., Gamsky, T. E., & Bryan, G. T. (1979).
N-Acetyltransferase phenotype and risk in urinary bladder cancer: Approaches in
molecular epidemiology. Environmental Health Perspectives, 29, 71–79.
MacMahon, S., Peto, R., Collins, R., Godwin, J., Cutler, J., et al. (1990). Blood
pressure, stroke, and coronary heart disease. Lancet, 335, 765–774.
Marks, D., Thorogood, M., Neil, H. A. W., & Humphries, S. E. (2003). A review on
diagnosis, natural history and treatment of familial hypercholesterolaemia.
Atherosclerosis, 168, 1–14.
Marmot, M. (2001). Reflections on alcohol and coronary heart disease. International
Journal of Epidemiology, 30, 729–734.
McGrath, J. (1999). Hypothesis: Is low prenatal vitamin D a risk-modifying factor for
schizophrenia? Schizophrenia Research, 40, 173–177.
Memik, F. (2003). Alcohol and esophageal cancer, is there an exaggerated accusation?
Hepatogastroenterology, 50, 1953–1955.
Mendel, G. (1866). Experiments in plant hybridization. Retrieved from
http://www.mendelweb.org/archive/Mendel.Experiments.txt
Millen, A. E., Dodd, K. W., & Subar, A. F. (2004). Use of vitamin, mineral, nonvitamin,
and nonmineral supplements in the United States: The 1987, 1992, and 2000
National Health Interview Survey results. Journal of the American Dietetic
Association, 104, 942–950.
Morange, M. (2001). The misunderstood gene. Cambridge, MA: Harvard University
Press.
Morgan, T. H. (1913). Heredity and sex. New York: Columbia University Press.
Morgan, T. H. (1919). Physical basis of heredity. Philadelphia: J. B. Lippincott.
Morgan, T. H. (1935). The relation of genetics to physiology and medicine. Scientific
Monthly, 41, 5–18.
MRC Vitamin Study Research Group. (1991). Prevention of neural tube defects:
Results of the Medical Research Council vitamin study. Lancet, 338, 131–137.
Mucci, L. A., Wedren, S., Tamimi, R. M., Trichopoulos, D., & Adami, H. O. (2001).
The role of gene–environment interaction in the aetiology of human cancer:
Examples from cancers of the large bowel, lung and breast. Journal of Internal
Medicine, 249, 477–493.
Newcomer, A. D., Hodgson, S. F., Douglas, M. D., & Thomas, P. J. (1978). Lactase
deficiency: Prevalence in osteoporosis. Annals of Internal Medicine, 89, 218–220.
Olby, R. C. (1966). Origins of Mendelism. London: Constable.
Osier, M. V., Pakstis, A. J., Soodyall, H., Comas, D., Goldman, D., Odunsi, A., et al.
(2002). A global perspective on genetic variation at the ADH genes reveals unusual
patterns of linkage disequilibrium and diversity. American Journal of Human
Genetics, 71, 84–99.
Palmer, L. J., & Cardon, L. R. (2005). Shaking the tree: Mapping complex disease genes
with linkage disequilibrium. Lancet, 366, 1223–1234.
Perera, F. P. (1997). Environment and cancer: Who are susceptible? Science, 278,
1068–1073.
Pradhan, A. D., Manson, J. E., Rifai, N., Buring, J. E., & Ridker, P. M. (2001).
C-reactive protein, interleukin 6, and risk of developing type 2 diabetes mellitus.
Journal of the American Medical Association, 286, 327–334.
Radimer, K., Bindewald, B., Hughes, J., Ervin, B., Swanson, C., & Picciano, M. F. (2004).
Dietary supplement use by US adults: Data from the National Health and
Nutrition Examination Survey, 1999–2000. American Journal of Epidemiology, 160,
339–349.
Reynolds, K., Lewis, L. B., Nolen, J. D. L., Kinney, G. L., Sathya, B., & He, J. (2003).
Alcohol consumption and risk of stroke: A meta-analysis. Journal of the American
Medical Association, 289, 579–588.
Ridker, P. M., Cannon, C. P., Morrow, D., Rifai, N., Rose, L. M., McCabe, C. H., et al.
(2005). C-reactive protein levels and outcomes after statin therapy. New England
Journal of Medicine, 352, 20–28.
Rimm, E. (2001). Alcohol and coronary heart disease—laying the foundation for future
work [Commentary]. International Journal of Epidemiology, 30, 738–739.
Rimm, E. B., Stampfer, M. J., Ascherio, A., Giovannucci, E., Colditz, G. A., & Willett,
W. C. (1993). Vitamin E consumption and the risk of coronary heart disease in men.
New England Journal of Medicine, 328, 1450–1456.
Roderick, T. H., Wimer, R. E., & Wimer, C. C. (1976). Genetic manipulation of
neuroanatomical traits. In L. Petrinovich & J. L. McGaugh (Eds.), Knowing, thinking, and
believing. New York: Plenum Press.
Rose, G. (1982). Incubation period of coronary heart disease. British Medical Journal,
284, 1600–1601.
Rose, S. (1995). The rise of neurogenetic determinism. Nature, 373, 380–382.
Roseboom, T. J., van der Meulen, J. H., Osmond, C., Barker, D. J. P., Ravelli, A. C. J.,
Schroeder-Tanka, J. M., et al. (2000). Coronary heart disease after prenatal exposure
to the Dutch famine, 1944–45. Heart, 84, 595–598.
Rothman, N., Wacholder, S., Caporaso, N. E., Garcia-Closas, M., Buetow, K., & Fraumeni,
J. F. (2001). The use of common genetic polymorphisms to enhance the epidemiologic
study of environmental carcinogens. Biochimica et Biophysica Acta, 1471, C1–C10.
Rutherford, S. L. (2000). From genotype to phenotype: Buffering mechanisms and the
storage of genetic information. BioEssays, 22, 1095–1105.
Scholl, T. O., & Johnson, W. G. (2000). Folic acid: Influence on the outcome of
pregnancy. American Journal of Clinical Nutrition, 71(Suppl.), 1295S–1303S.
Scientific Steering Committee on Behalf of the Simon Broome Register Group. (1991).
Risk of fatal coronary heart disease in familial hypercholesterolaemia. British
Medical Journal, 303, 893–896.
Scriver, C. R. (1988). Nutrient–gene interactions: The gene is not the disease and vice
versa. American Journal of Clinical Nutrition, 48, 1505–1509.
Scriver, C. R., Mahon, B., & Levy, H. L. (1987). The Hartnup phenotype: Mendelian
transport disorder, multifactorial disease. American Journal of Human Genetics, 40,
401–412.
Sesso, H. D., Buring, J. E., Rifai, N., Blake, G. J., Gaziano, J. M., & Ridker, P. M. (2003).
C-reactive protein and the risk of developing hypertension. Journal of the American
Medical Association, 290, 2945–2951.
Shaper, A. G. (1993). Alcohol, the heart, and health [Editorial]. American Journal of
Public Health, 83, 799–801.
Shastry, B. S. (1998). Gene disruption in mice: Models of development and disease.
Molecular and Cellular Biochemistry, 181, 163–179.
Warren, K. R., & Li, T. K. (2005). Genetic polymorphisms: Impact on the risk of fetal
alcohol spectrum disorders. Birth Defects Research A: Clinical and Molecular
Teratology, 73, 195–203.
Weimer, R. E. (1973). Dissociation of phenotypic correlation: Response to posttrial ether-
ization and to temporal distribution of practice trials. Behavior Genetics, 3, 379–386.
Weiss, K., & Terwilliger, J. (2000). How many diseases does it take to map a gene with
SNPs? Nature Genetics, 26, 151–157.
West-Eberhard, M. J. (2003). Developmental plasticity and evolution. New York: Oxford
University Press.
Wheatley, K., & Gray, R. (2004). Mendelian randomization—an update on its use to
evaluate allogeneic stem cell transplantation in leukaemia [Commentary].
International Journal of Epidemiology, 33, 15–17.
Wilkins, A. S. (1997). Canalization: A molecular genetic perspective. BioEssays, 19,
257–262.
Williams, R. S., & Wagner, P. D. (2000). Transgenic animals in integrative biology:
Approaches and interpretations of outcome. Journal of Applied Physiology, 88, 1119–1126.
Wolf, U. (1995). The genetic contribution to the phenotype. Human Genetics, 95, 127–148.
Wright, A. F., Carothers, A. D., & Campbell, H. (2002). Gene–environment interac-
tions—the BioBank UK study. Pharmacogenomics Journal, 2, 75–82.
Wu, T., Dorn, J. P., Donahue, R. P., Sempos, C. T., & Trevisan, M. (2002). Associations
of serum C-reactive protein with fasting insulin, glucose, and glycosylated hemoglo-
bin: The Third National Health and Nutrition Examination Survey, 1988–1994.
American Journal of Epidemiology, 155, 65–71.
Youngman, L. D., Keavney, B. D., Palmer, A., Parish, S., Clark, S., Danesh, J., et al.
(2000). Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial
infarction and in 6002 controls: Test of causality by ‘‘Mendelian randomization.’’
Circulation, 102(Suppl. II), 31–32.
Zuckerkandl, E., & Villet, R. (1988). Concentration-affinity equivalence in gene
regulation: Convergence of genetic and environmental effects. Proceedings of the National
Academy of Sciences USA, 85, 4784–4788.
10
Rare Variant Approaches to Neuropsychiatric Disorders
Genetic Variation
The search for “disease genes” is more precisely the search for disease-related
genetic variation. Basic instructions are coded in DNA to create and
sustain life; these instructions vary somewhat between individuals, creating a
primary source of human diversity. Variation in these instructions is also
thought to be largely responsible for differences in susceptibility to diseases
influenced by genes.
Concretely, when individuals differ at the level of DNA, it is often with
regard to the sequence of its four constituent parts, called “nucleotides” or
“bases,” which make up the DNA code: adenine (A), guanine (G), cytosine
(C), and thymine (T). Indeed, within the human genome, variations at indi-
vidual nucleotides appear quite frequently (approximately 1 in every 1,000
bases) (International Human Genome Sequencing Consortium, 2004; Lander
et al., 2001; McPherson et al., 2001). The vast majority of this variation is
related to an individual’s ethnic origin and has no overt consequence for
human disease. However, it is not known at present what proportion of
the observed differences between individuals, either within or outside of
regions of the genome that specify the production of proteins (through the
process of transcription and translation), might confer subtle alterations in
function. At present, while elegant and inventive approaches are being
employed to address the question, particularly with regard to “noncoding”
DNA (Noonan, 2009; Prabhakar et al., 2008), the consequences of sequence
variations identified in these regions remain difficult to interpret.
Consequently, while only 2% of the genome is ultimately translated into
protein, it is this subset that is most readily understood with regard to its
impact on a phenotype of interest (International Human Genome
Sequencing Consortium, 2004; Lander et al., 2001; McPherson et al., 2001).
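A back-of-the-envelope calculation puts these proportions in perspective. The sketch below assumes a haploid genome of roughly 3.1 billion bases (a figure not given in the text) and applies the quoted rate of about one differing nucleotide per 1,000 bases, and the 2% coding fraction, uniformly across the genome. The uniform rate is a simplifying assumption, so the coding-sequence figure is illustrative rather than an empirical count.

```python
# Back-of-the-envelope arithmetic using the proportions quoted in the text.
# ASSUMPTION: haploid genome size of ~3.1 billion bases (not stated in the text).
GENOME_SIZE_BP = 3_100_000_000
VARIANT_RATE = 1 / 1_000   # ~1 differing nucleotide per 1,000 bases (from the text)
CODING_FRACTION = 0.02     # ~2% of the genome is translated into protein (from the text)


def expected_differences(size_bp: float, rate: float) -> float:
    """Expected number of single-nucleotide differences, assuming a uniform rate."""
    return size_bp * rate


genome_wide = expected_differences(GENOME_SIZE_BP, VARIANT_RATE)
coding_bp = GENOME_SIZE_BP * CODING_FRACTION
coding_only = expected_differences(coding_bp, VARIANT_RATE)

print(f"Differences genome-wide: ~{genome_wide:,.0f}")
print(f"Coding sequence: ~{coding_bp:,.0f} bp")
print(f"Differences in coding sequence (uniform-rate assumption): ~{coding_only:,.0f}")
```

Even under these rough assumptions, two unrelated genomes differ at millions of positions, of which only tens of thousands would fall in protein-coding sequence.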
The terminology applied to genetic variation may be somewhat confusing
due to a number of redundant or loosely defined terms. While a threshold of
5% is often used as the cutoff for rare variation, many authors also
distinguish between these and very rare (<1%) alleles. Common variations,
regardless of their impact on gene function, are often referred to as poly-
morphisms or alleles, but both terms are also at times applied to any change in
the genome, regardless of its frequency, that does not appear to be deleter-
ious to the function of the RNA or protein that it encodes. In the ensuing
discussion we use the term polymorphism to refer to common variants. Some
authors refer to rare variants and mutations synonymously. In the current
discussion, mutation will refer to the subcategory of rare variation that is
thought to cause or carry risk for disease.
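The frequency-based vocabulary above can be written down as a small labeling function. This is a sketch that uses the 5% and 1% cutoffs mentioned in the text; treating exactly 5% as common is an arbitrary boundary choice here, and frequency alone cannot identify a “mutation” in the sense used in this chapter, since that label also requires evidence of disease risk.

```python
def classify_by_frequency(minor_allele_freq: float) -> str:
    """Label a variant by minor allele frequency using the cutoffs described in the text.

    Conventions differ between authors; 5% (rare) and 1% (very rare) are the
    commonly used cutoffs mentioned in the text, not a universal standard.
    """
    if not 0.0 <= minor_allele_freq <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if minor_allele_freq >= 0.05:
        return "common (polymorphism)"
    if minor_allele_freq >= 0.01:
        return "rare"
    return "very rare"


for maf in (0.30, 0.05, 0.03, 0.004):
    print(maf, classify_by_frequency(maf))
```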
Several other terms warrant definition here: Common variations at a single
nucleotide are typically referred to as single-nucleotide polymorphisms (SNPs).
For example, if the sequence in a specific region of DNA is ACTCTCCT in
most individuals, but in more than 5% of individuals the same region reads
as ACTCTACT on at least one of a pair of chromosomes carrying this
sequence, this would represent an SNP with the major allele being C and
the minor allele being A. Moreover, the frequency of the “A” would be
referred to as the minor allele frequency. SNPs are thought to be most often
the consequence of a single error in replication of the DNA at some point in
human history that has subsequently spread through the population.
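The minor allele frequency in the example above can be computed directly by counting bases at the variable position across chromosome copies. In this sketch the sample of ten chromosomes (five hypothetical individuals) is invented for illustration; the two sequences are the ones from the text, which differ at the sixth base (0-based index 5).

```python
from collections import Counter


def minor_allele_frequency(chromosomes: list[str], position: int) -> tuple[str, str, float]:
    """Return (major allele, minor allele, minor allele frequency) at a biallelic site.

    `chromosomes` holds one sequence per chromosome copy (two per individual);
    `position` is a 0-based index into each sequence.
    """
    counts = Counter(seq[position] for seq in chromosomes)
    if len(counts) != 2:
        raise ValueError("this sketch assumes a biallelic site")
    (major, n_major), (minor, n_minor) = counts.most_common()
    return major, minor, n_minor / (n_major + n_minor)


# HYPOTHETICAL sample: 10 chromosome copies, 2 of which carry the A allele.
chromosomes = ["ACTCTCCT"] * 8 + ["ACTCTACT"] * 2
print(minor_allele_frequency(chromosomes, position=5))  # ('C', 'A', 0.2)
```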
A second form of variation often used in genetic studies involves variable
numbers of DNA repeats: These are short, repetitive sequences of nucleotides
that are prone to instability during DNA replication. This type of variation is
known as a short tandem repeat (STR). In this case, the sequence abbreviated
GTACACAGT found on one chromosome in an individual might be found to
be GTACACACACACAGT on a second chromosome or in another individual.
Interestingly, these repeats change so often that multiple forms are usually
present within a population, yet not so often that they are likely to change
length between closely related individuals. These properties, along with the
ease of assaying STRs, make them highly suitable for tracing DNA inheritance
from generation to generation, as will be discussed later.
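Because STR alleles differ in the number of repeat units, they are usually recorded as repeat counts. The sketch below counts the longest uninterrupted run of a motif; the first sequence is the short example from the text, while the second allele is a hypothetical longer form invented for illustration.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Length, in repeat units, of the longest uninterrupted run of `motif` in `seq`."""
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Extend the run one motif-length step at a time.
        while seq.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


allele_1 = "GTACACAGT"      # short example from the text
allele_2 = "GTACACACACAGT"  # HYPOTHETICAL longer allele, for illustration only
genotype = (longest_tandem_run(allele_1, "CA"), longest_tandem_run(allele_2, "CA"))
print(genotype)  # (2, 4)
```

A genotype such as (2, 4) is what gets followed through a pedigree: because many repeat lengths segregate in a population, the parental origin of each copy is often unambiguous.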
A third, much more recently appreciated type of variation is known as a
copy number variant (CNV) (Iafrate et al., 2004; Redon et al., 2006; Sebat et al.,
2004). In this case, the structure of chromosomes varies among individuals.
For example, a deletion or duplication of DNA might be present on one
chromosome in an individual but not in another. CNVs specifically refer to
these types of changes that fall below the resolution of the light microscope.
Chromosomal variations that exceed this threshold are now referred to as
“gross” cytogenetic deletions, duplications, or rearrangements. The lower size
bound of a CNV and how it is distinguished from a small insertion or
deletion of DNA sequence (called an in/del) vary from author to author,
but a common cutoff is 1,000 base pairs of DNA.
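The size categories above amount to two cutoffs. The sketch uses the 1,000-base-pair in/del versus CNV cutoff quoted in the text; the light-microscope limit is set to 5 Mb purely as an illustrative assumption, since the text gives no figure and published estimates vary with banding resolution.

```python
INDEL_CNV_CUTOFF_BP = 1_000            # common cutoff quoted in the text
MICROSCOPE_RESOLUTION_BP = 5_000_000   # HYPOTHETICAL illustrative value; estimates vary


def classify_structural_change(size_bp: int) -> str:
    """Assign a deletion/duplication to the size categories described in the text."""
    if size_bp <= 0:
        raise ValueError("size must be positive")
    if size_bp < INDEL_CNV_CUTOFF_BP:
        return "in/del"
    if size_bp < MICROSCOPE_RESOLUTION_BP:
        return "CNV (submicroscopic)"
    return "gross cytogenetic change"


for size in (50, 250_000, 10_000_000):
    print(size, classify_structural_change(size))
```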
Over the last few years it has become clear that CNVs are distributed
throughout the genome, populating even those regions that contain genes
coding for RNAs and proteins. Previously, the finding of a loss of a coding
region in an affected individual was taken as prima facie evidence for a
causal relationship between the variation and the observed phenotype.
However, as microarrays ushered in an era of much higher resolution ana-
lysis of the genome, it has become clear that this conventional wisdom
reflected an implicit, incorrect assumption regarding the “intactness” of
the genome. In fact, it is now clear that widespread copy number variation
is seen among control populations (Redon et al., 2006; Sharp, Cheng, &
Eichler, 2006), requiring more rigorous approaches to demonstrating a rela-
tionship between a structural change in the genome and a clinical outcome.
Finally, an important distinction regarding the distribution of disorder-
related variation is often made: If multiple variations in a single gene lead
to or contribute to an outcome of interest, this is referred to as allelic
heterogeneity. If variations in many different genes may lead to a single disease or
syndrome, this is referred to as locus heterogeneity (a locus is simply a given
region of the genome). Both are widely observed in human disease and have
been invoked across nearly all complex disorders to help explain the difficul-
ties that have been encountered in clarifying the genetic bases of disease
(Botstein & Risch, 2003).
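The two kinds of heterogeneity can be told apart by how disease-related findings group by gene. This sketch takes (gene, variant) pairs observed in affected individuals; the gene and variant labels are hypothetical placeholders.

```python
from collections import defaultdict


def describe_heterogeneity(findings: list[tuple[str, str]]) -> dict[str, bool]:
    """Classify (gene, variant) findings from affected individuals.

    Allelic heterogeneity: several distinct variants within one gene.
    Locus heterogeneity: disease-related variants in more than one gene.
    """
    variants_by_gene = defaultdict(set)
    for gene, variant in findings:
        variants_by_gene[gene].add(variant)
    return {
        "allelic_heterogeneity": any(len(v) > 1 for v in variants_by_gene.values()),
        "locus_heterogeneity": len(variants_by_gene) > 1,
    }


# HYPOTHETICAL labels, for illustration only.
findings = [("GENE_A", "var1"), ("GENE_A", "var2"), ("GENE_B", "var3")]
print(describe_heterogeneity(findings))
```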
Irrespective of whether a variant is introduced into the sequence or structure
of the DNA, once it is present in the human genome, natural selection plays a
defining role. Our current understanding of this process is certainly more
nuanced than when first proposed, but the basic notions continue to serve
well to understand the dynamics of variation: Changes that do not impact repro-
ductive fitness in a negative fashion may be readily passed from generation to
generation and, over time, have the potential to become common. Alternatively,
changes that result in impaired fitness are subject to negative or purifying
selection, decreasing the frequency of that allele in the population.
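The effect of purifying selection can be illustrated with a textbook deterministic model of genic selection, in which an allele has relative fitness 1 + s compared with 1 for the alternative. This is a deliberate simplification (it ignores dominance, drift, and recurrent mutation), and the starting frequency and selection coefficient are hypothetical.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of deterministic genic selection.

    The allele has relative fitness 1 + s versus 1 for the alternative allele;
    s < 0 models a deleterious allele under purifying selection.
    """
    mean_fitness = p * (1 + s) + (1 - p) * 1.0
    return p * (1 + s) / mean_fitness


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Allele frequencies from generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


# HYPOTHETICAL parameters: starting frequency 5%, 10% fitness cost.
deleterious = trajectory(p0=0.05, s=-0.1, generations=50)
neutral = trajectory(p0=0.05, s=0.0, generations=50)
print(f"deleterious after 50 generations: {deleterious[-1]:.5f}")
print(f"neutral after 50 generations:     {neutral[-1]:.5f}")
```

Under this model the odds of carrying the deleterious allele shrink by a factor of (1 + s) each generation, so even a modest fitness cost drives the allele toward rarity within a few dozen generations, while a neutral allele's frequency is unchanged.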
Of course, the impact on fitness is only one of several forces that dictate
the population frequency of a specific genetic variant. A variant newly intro-
duced into the population would most likely be rare, regardless of its func-
tional consequences, given the large number of possible positions for this
change and the lack of time for it to be distributed through the population.
Moreover, the history of specific ethnic groups, including migration patterns
and social norms, can significantly influence the dynamics of particular
genetic variants over time.
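The point that a newly introduced variant starts out rare, whatever its effect, can be made concrete with a neutral Wright-Fisher simulation: a new mutation begins as one copy among 2N chromosomes, and each generation the next 2N copies are drawn at random from the current allele frequency. The population size, number of generations, and seeds below are hypothetical, and selection is ignored.

```python
import random


def wright_fisher_new_mutation(pop_size: int, generations: int, seed: int = 1) -> list[float]:
    """Neutral Wright-Fisher drift of a single new mutation.

    Starts at one copy among 2N chromosomes (frequency 1/(2N)); each generation
    samples 2N chromosomes with replacement from the previous allele frequency.
    """
    rng = random.Random(seed)
    n_chrom = 2 * pop_size
    copies = 1
    freqs = [copies / n_chrom]
    for _ in range(generations):
        p = copies / n_chrom
        copies = sum(rng.random() < p for _ in range(n_chrom))  # binomial draw
        freqs.append(copies / n_chrom)
        if copies in (0, n_chrom):  # allele lost or fixed: no further change
            break
    return freqs


# 200 independent new mutations in a HYPOTHETICAL population of N = 100 individuals.
runs = [wright_fisher_new_mutation(100, 20, seed=s) for s in range(200)]
still_present = sum(0 < r[-1] < 1 for r in runs) / len(runs)
print(f"starting frequency: {runs[0][0]}")
print(f"fraction still segregating after 20 generations: {still_present:.2f}")
```

Standard theory gives a new neutral mutation only a 1/(2N) chance of eventually becoming fixed, so most such variants are lost by chance, and those that persist typically remain at low frequency for many generations.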
With regard to disease risk, it has nonetheless proven quite instructive to
think about the distinction between advantageous or neutral and deleterious
variants. Based on these classical notions, alleles contributing to early-onset
disorders that reduce fertility or indirectly lead to decreased reproductive
Linkage Studies
In linkage analysis, one seeks to determine if the transmission of any chro-
mosomal segment from one generation to another within a family or families
coincides with the presence of the phenotype of interest. If every chromo-
some (or, in practice, every autosome) is evaluated simultaneously, the study
is referred to as a genomewide linkage scan.
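The evidence for linkage in parametric (model-based) analyses is conventionally summarized as a LOD score: the base-10 logarithm of the likelihood of the observed meioses at a recombination fraction θ, divided by their likelihood under no linkage (θ = 0.5). The sketch below covers only the simplest case of phase-known, fully informative meioses; the family counts are hypothetical. A maximum LOD of 3 is the widely used conventional threshold for declaring linkage.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    likelihood_linked = theta ** recombinants * (1 - theta) ** nonrecombinants
    if likelihood_linked == 0.0:
        return float("-inf")  # e.g., theta = 0 with at least one recombinant
    likelihood_unlinked = 0.5 ** n
    return math.log10(likelihood_linked / likelihood_unlinked)


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.01) -> tuple[float, float]:
    """Maximize the LOD score over a grid of theta values in [0, 0.5]."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in grid)


# HYPOTHETICAL family data: 1 recombinant among 12 informative meioses.
best_lod, best_theta = max_lod(1, 11)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
print(f"LOD for 10 nonrecombinant meioses at theta = 0: {lod_score(0, 10, 0.0):.2f}")
```

In a genomewide scan the same calculation is repeated at markers across every chromosome, which is one reason the threshold is set so high.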
This process of tracing inheritance relies fundamentally on genetic varia-
tion. If every chromosome were identical, it would be impossible to observe
nearly 1:1) as well as to the fact that fine mapping is more readily accom-
plished in parametric versus nonparametric studies.
Biological studies of the implicated gene in vitro and in vivo, including
modeling the identified human mutations, are another highly desirable avenue
for developing convergent evidence to support a linkage finding. The practical
reality is that in neuropsychiatric disorders the relevant tissue is most often
not accessible for direct study in humans, rendering model systems particu-
larly attractive. Nonetheless, it is important to recall that there are critical
differences between the human and the rodent brain (or fly or worm) and
that these differences may be particularly relevant to the domains of function
that are of most interest in neuropsychiatric disorders. On the one hand, the
demonstration of a “neural” phenotype in an animal carrying a human
mutation or allele may be instructive and is often the first step toward the
identification of relevant, highly conserved molecular pathways across
species. On the other hand, there have been many instances in which knockouts of
genes recapitulating clearly causal Mendelian mutations in neurodevelop-
mental syndromes have not resulted in phenotypes resembling those found
in humans, suggesting the need for some caution in the interpretation of
model systems data.
Genetic Association
In contrast to linkage studies, association methodologies are “cross-sectional”
as they investigate variation across populations as opposed to studying
genetic transmissions within families. In essence, the methodology relies
on a classic case–control design: Genetic variants are identified as the
“exposure,” and the allele frequency is compared in affected and unaffected
individuals. It is important to mention here that while case–control analysis
has become the most widely used association strategy of late, there are
variations on this theme, called transmission tests, that rely on a combination
of linkage and association and evaluate parent–child trios. These approaches
have also been quite popular, particularly with regard to pediatric disorders.
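Both designs reduce to simple count-based tests. The sketch below shows a standard allelic case–control test (odds ratio and Pearson chi-square on a 2×2 table of allele counts, without continuity correction) and the transmission/disequilibrium test (TDT), which compares how often heterozygous parents transmit versus do not transmit an allele to affected children. All counts are hypothetical.

```python
import math


def allelic_test(case_minor: int, case_major: int, ctrl_minor: int, ctrl_major: int):
    """Allelic case-control test on a 2x2 table of allele counts.

    Returns (odds ratio, Pearson chi-square with 1 df, two-sided p-value).
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p_value


def tdt(transmitted: int, untransmitted: int):
    """Transmission/disequilibrium test for heterozygous parents in trios.

    Counts how often heterozygous parents passed the tested allele to an
    affected child versus the other allele; returns (chi-square, p-value).
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# HYPOTHETICAL counts: 500 cases and 500 controls (1,000 alleles per group).
or_, chi2, p = allelic_test(300, 700, 250, 750)
print(f"OR = {or_:.3f}, chi-square = {chi2:.2f}, p = {p:.4f}")
# HYPOTHETICAL trio counts from heterozygous parents.
tdt_chi2, tdt_p = tdt(60, 40)
print(f"TDT chi-square = {tdt_chi2:.1f}, p = {tdt_p:.4f}")
```

The TDT uses only within-family transmissions, which makes it robust to differences in ancestry between cases and controls, a practical reason for its popularity in trio-based pediatric samples.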
Until recently, genetic association studies could feasibly investigate only
one or a small number of known, common genetic polymorphisms in or
near an identified gene(s) of interest. Relative to nonparametric linkage ana-
lyses, the approach is theoretically better able to detect small increments of
risk; but, importantly, it is not able in practice to detect rare variants con-
tributing at a particular locus. Given the popularity of common variant
candidate gene association studies across all of medicine and particularly
in psychiatric genetics, this distinction is quite important: It is not uncom-
mon for either positive or negative results to be reported with respect to a
gene suggesting that it is or is not associated with a disorder. In fact, the
studies (GWASs) have become the gold standard for common variant dis-
covery in complex disorders (Hirschhorn & Daly, 2005). These investigations
take advantage of SNPs spaced across the entire genome to conduct associa-
tion studies without the requirement of an a priori hypothesis regarding a
specific gene or genes. This technological advance, in conjunction with a now
sufficiently large collection of patient samples, has led to a spate of studies
that have begun to confirm the clear contribution of common alleles to
common disease (Bilguvar et al., 2008; Hakonarson et al., 2007; Saxena
et al., 2007; Scott et al., 2007; Zeggini & McCarthy, 2007).
Several aspects of these recent findings deserve comment here. First, it is
remarkable to begin to see concrete evidence for the common allele:common
disease hypothesis after years of uncertainty. It is notable, however, that the
scale of the effect of individual alleles identified in recent studies has been
extraordinarily modest, explaining why very large sample sizes have been
required to clarify contradictory results (Altshuler & Daly, 2007). In neurop-
sychiatric genetics specifically, a great deal of effort has been expended trying
to understand inconsistent common variant association findings. These
recent investigations suggest that the simplest answer may suffice: When
the sample size is sufficiently large and genomewide association is employed,
reproducible results will emerge if common variants play a role (Psychiatric
GWAS Consortium Steering Committee, 2009; Ma et al., 2009; McMahon
et al., 2010; Weiss, Arking, Daly, & Chakravarti, 2009). Similarly, the total
amount of individual variation in disease risk accounted for by the identified
common alleles has been surprisingly modest. This raises the possibility that
rare variation might explain a larger share of risk in complex disorders
than previously anticipated.
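Why small effects demand very large samples can be seen from a standard normal-approximation sample-size formula for comparing two allele frequencies. This is a rough planning sketch, not a full power analysis; the control allele frequency, odds ratios, genomewide significance level (5 × 10⁻⁸), and 80% power are illustrative assumptions.

```python
import math
from statistics import NormalDist


def case_frequency(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a control frequency and allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def individuals_per_group(control_freq: float, odds_ratio: float,
                          alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate number of cases (and equally many controls) for a two-sided allelic test.

    Uses the normal approximation for two proportions; each person contributes two alleles.
    """
    p1 = control_freq
    p2 = case_frequency(control_freq, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    alleles = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return math.ceil(alleles / 2)


# HYPOTHETICAL scenario: risk allele at 30% frequency in controls.
for or_value in (1.1, 1.2, 1.5):
    print(or_value, individuals_per_group(0.3, or_value))
```

Under these assumptions, shrinking the odds ratio from 1.5 to 1.1 raises the required sample from roughly a thousand to tens of thousands of individuals per group.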
Whether a candidate gene study or GWAS, typically the first evidence for a
probabilistic relationship between a variation and a clinical phenomenon of
interest involves surpassing a preordained statistical threshold. In candidate
gene common variant association, there is not yet complete agreement on
this issue, including regarding how to appropriately correct for multiple
comparisons. The difficulties that have attended replication of studies using
this approach have now led to the general expectation that some type of
internal replication of association will be attempted prior to publication. Of
course, replication in an independent laboratory in a separate sample
remains the gold standard. In addition, because either common variant
methodology may detect association with an allele that lies near, but is not
itself among, the tested set of alleles, the identification of the “functional”
variant within the association interval is generally considered strong
supporting evidence (State, 2006). As noted, associated variants identified in regulatory and
other noncoding sequences mapping very far from known coding regions can
pose significant challenges in this regard. Finally, while statistical thresholds
for candidate gene studies remain a matter of debate, there has emerged
something of a consensus regarding appropriate thresholds for GWAS ana-
lyses that are quite stringent and seem to contribute to the markedly
improved reliability of this approach compared to prior generations of
common variant association analysis.
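The stringent GWAS threshold is usually explained as a Bonferroni correction: dividing a family-wise error rate of 0.05 by the number of independent tests. The sketch assumes roughly one million independent common-variant tests genomewide, the approximation usually cited for the conventional 5 × 10⁻⁸ threshold; the 10-SNP candidate gene study is hypothetical.

```python
def bonferroni_threshold(family_wise_alpha: float, n_independent_tests: int) -> float:
    """Per-test significance level keeping the family-wise error rate at or below alpha."""
    if n_independent_tests < 1:
        raise ValueError("need at least one test")
    return family_wise_alpha / n_independent_tests


# ASSUMPTION: ~1 million independent common-variant tests across the genome.
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)
# HYPOTHETICAL candidate gene study testing 10 SNPs.
candidate_threshold = bonferroni_threshold(0.05, 10)
print(f"GWAS: {gwas_threshold:.0e}, candidate gene (10 SNPs): {candidate_threshold:.3f}")
```

The contrast illustrates why candidate gene thresholds remain contested: the correction depends on how many tests one counts, including those run but not reported.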
Both GWAS and most candidate gene studies assay known alleles with a
preordained minor allele frequency, typically restricting the analysis to
common variants. In contrast, a mutation burden approach applies association
strategies to rare variants. This is critically important if one desires to test the
hypothesis that rare variants may contribute broadly in the population to the
occurrence of a common complex disorder or phenotype as opposed to explain-
ing Mendelian inheritance within one or a small number of families.
Establishing a population association of rare alleles may in practice be
quite challenging. Taken individually, rare and especially very rare alleles at
a given locus would require sample sizes that could not practically be reached
to achieve a statistically significant result. An alternative method of addres-
sing risk assesses the total amount of rare variation present within a gene or
genes of interest in cases versus controls. The identification of such rare
variations, apart from CNVs (which will be described later, see Cytogenetics
and CNV Detection), requires that individual genes be comprehensively eval-
uated using either direct sequencing or a multistep mutation-screening pro-
cess that identifies sequences containing possible variations followed by
confirmation via sequencing (Abelson et al., 2005).
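A minimal version of this “total rare variation” comparison collapses each person's rare variants into carrier status and compares carrier counts in cases and controls with a one-sided Fisher exact test. This is one simple burden design among several; the 1% frequency cutoff and the tiny data set are hypothetical, and real studies need far larger samples.

```python
from math import comb


def is_rare_carrier(variant_freqs: list[float], max_freq: float = 0.01) -> bool:
    """True if the individual carries at least one variant below the frequency cutoff."""
    return any(f < max_freq for f in variant_freqs)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for an excess of carriers among cases.

    Table:        carriers  non-carriers
        cases        a           b
        controls     c           d
    """
    n_cases, carriers, total = a + b, a + c, a + b + c + d
    upper = min(n_cases, carriers)
    tail = sum(comb(carriers, x) * comb(total - carriers, n_cases - x)
               for x in range(a, upper + 1))
    return tail / comb(total, n_cases)


def burden_test(cases: list[list[float]], controls: list[list[float]], max_freq: float = 0.01):
    """Collapse rare variants per person, then compare carrier counts."""
    a = sum(is_rare_carrier(v, max_freq) for v in cases)
    c = sum(is_rare_carrier(v, max_freq) for v in controls)
    b, d = len(cases) - a, len(controls) - c
    return (a, b, c, d), fisher_one_sided(a, b, c, d)


# HYPOTHETICAL data: population frequencies of the variants each person carries.
cases = [[0.002], [], [0.0005, 0.2], [0.004], []]
controls = [[], [0.3], [], [0.001], []]
table, p = burden_test(cases, controls)
print(table, round(p, 4))  # (3, 2, 1, 4) 0.2619
```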
While intuitively attractive, until quite recently, technological realities have
placed significant limits on mutation burden approaches; detection of pre-
viously unknown rare variants, even via mutation screening, has been many
times more expensive than genotyping of known variants. Consequently, in
practice, the method has been applied only to candidate genes. Nonetheless,
several notable studies have highlighted the value of this approach. For
example, Helen Hobbs and colleagues have convincingly demonstrated that
mutations in genes known to be involved in rare forms of
hypocholesterolemia are present in the general population and contribute
to the overall variation in high-density lipoprotein levels (Cohen et al.,
2004). Similarly, recent work from Richard Lifton’s lab has shown that rare
alleles in genes responsible for rare syndromic forms of hypotension contribute
significantly to blood pressure variation in the general population
(Ji et al., 2008). Moreover, technological advances are promising to vastly
expand the application of these types of approaches. Within the past year,
so-called next-generation sequencing has made the evaluation of all coding
segments of the genome a practical reality (Choi et al., 2009; Ng et al., 2009);
and within a relatively short time frame, whole-genome sequencing promises
to become commonplace.
Millar et al., 2000; Vorstman et al., 2006). Typically in this approach, the
mapping of a translocation, chromosomal inversion, or deletion is used as a
means of identifying a candidate gene(s), which is then further studied for
rare variants in patients without known chromosomal abnormalities
(Abelson et al., 2005; Jamain et al., 2003).
The most recent advances in cytogenetics have been particularly fascinat-
ing. As previously noted, new technologies have recently led to the discovery
that submicroscopic variations in chromosomal structure are widespread
throughout the genomes of normal individuals. Of note, such CNVs can be
detected using microarrays, including those designed for SNP genotyping,
which currently have a resolution of as small as several hundred bases.
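Array-based CNV detection works from per-probe intensity ratios between a sample and a reference, usually on a log2 scale: a heterozygous deletion (one copy instead of two) is expected near log2(1/2) = −1, and a single-copy duplication near log2(3/2) ≈ 0.58. The sketch below is a deliberately naive threshold-and-run caller, not the segmentation or hidden Markov model methods used in practice; its thresholds, minimum probe count, and data are hypothetical.

```python
import math

# Expected log2 ratios relative to the diploid (two-copy) state.
HET_DELETION = math.log2(1 / 2)  # -1.0
DUPLICATION = math.log2(3 / 2)   # ~0.585


def call_cnvs(log2_ratios: list[float], del_thr: float = -0.4, dup_thr: float = 0.3,
              min_probes: int = 3) -> list[tuple[str, int, int]]:
    """Report runs of >= min_probes consecutive probes beyond a threshold.

    Returns (state, first_probe_index, last_probe_index) tuples.
    """
    def state(r: float):
        if r <= del_thr:
            return "deletion"
        if r >= dup_thr:
            return "duplication"
        return None

    calls, i = [], 0
    while i < len(log2_ratios):
        s = state(log2_ratios[i])
        j = i + 1
        while j < len(log2_ratios) and state(log2_ratios[j]) == s:
            j += 1
        if s is not None and j - i >= min_probes:
            calls.append((s, i, j - 1))
        i = j
    return calls


# HYPOTHETICAL probe-level log2 ratios along one chromosome segment.
ratios = [0.02, -0.05, -0.98, -1.05, -0.91, 0.01, 0.61, 0.55, 0.03, -1.0]
print(call_cnvs(ratios))  # [('deletion', 2, 4)]
```

Requiring several consecutive probes trades sensitivity for specificity: the two-probe gain and the single low probe are treated as noise, which is also why denser arrays can resolve smaller CNVs.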
One unexpected consequence of CNV detection has been to cast doubt on
causal inferences associated with previous cytogenetic investigations. As
noted, prior to the discovery of CNVs (particularly their presence in coding
regions of the genome), it was largely presumed that a rearrangement or loss
of genetic material disrupting gene structure was the likely cause of an
observed phenotype. It is now clear that rearrangements may often physically
disrupt genes without overtly negative consequences. Conversely, structural
derangements that do not map to coding regions of the genome have been
known for some time to have deleterious potential (Kleinjan & van
Heyningen, 1998; State et al., 2003).
A final important observation about copy number detection is that it was
the first practical technique that was able to identify both common and rare
changes in chromosomal structure at high resolution on a genomewide scale.
The implications of this technological advance are discussed in more detail in
the following section.
Outlier Approaches
The identification of several cases of affected females with deletions
on the X chromosome led Jamain and colleagues (2003) to evaluate genes in
the region of these deletions for rare variants among nearly 200 individuals
with ASDs. The authors found a single clearly deleterious mutation in the
gene Neuroligin 4 in one family with two affected males. In a second family,
a rare variant was also found in Neuroligin 3, a closely related molecule on
the X chromosome. This variant was not as unequivocally damaging to pro-
tein function but has subsequently been shown to influence synaptic activity
in mice (Tabuchi et al., 2007). Shortly after the initial report regarding
NLGNs and ASDs, a separate research group used parametric linkage in
an extended family with mental retardation and ASD to identify the same
X-chromosome interval for Neuroligin 4 (Laumonnier et al., 2004). Fine map-
ping of this region showed a unique, highly deleterious mutation in NLGN4
present in every affected family member, consistent with Mendelian expecta-
tions. These two findings represented the first identification of a functional
mutation in cases of idiopathic autism (i.e., autism not accompanied by some
other evidence of genetic syndrome) and the first convincing independent
replication of a genetic finding in ASDs.
Neuroligins are postsynaptic transmembrane neuronal adhesion mole-
cules that interact with neurexins, which are present on the presynaptic
terminal (Lise & El-Husseini, 2006). Subsequent studies have confirmed
that the mutations identified in NLGN4 in humans with ASDs lead to
abnormalities in the specification of excitatory glutamatergic synapses
in vitro as well as to synaptic maturation defects in mice (Chih, Afridi,
Clark, & Scheiffele, 2004; Chih, Engelman, & Scheiffele, 2005; Chih,
Gollan, & Scheiffele, 2006; Varoqueaux et al., 2006), providing important
convergent support for the initial finding. While additional mutation screen-
ings of individuals with ASDs have not led to the characterization of further
clearly functional variants in NLGN4 (Blasi et al., 2006), several recent studies
have provided strong additional evidence for the importance of this finding
through the identification of rare mutations among affected individuals in
molecules that interact directly or indirectly with the NLGN4 protein. These
include SHANK3 (Durand et al., 2007; Moessner et al., 2007) and Neurexin-1
(Kim et al., 2008; Marshall et al., 2008; Szatmari et al., 2007).
Another notable rare variant finding reported in the New England Journal
of Medicine (Strauss et al., 2006) used parametric linkage analysis to identify a
rare homozygous mutation in the gene contactin-associated protein-like 2
(CNTNAP2) among the Old Order Amish population that led to intractable
seizures, autism, and mental retardation. The study was notable both for the
statistical power due to the inbred nature of this population and for the
availability of pathological brain specimens due to epilepsy surgery performed
on several of the probands. As with NLGN4, CNTNAP2 is a neuronal adhe-
sion molecule (Poliak et al., 1999), and recent work has demonstrated that it
too is present in the synaptic plasma membrane (Bakkaloglu et al., 2008).
Moreover, two common variant association studies and a rare variant muta-
tion burden analysis have pointed to this molecule as carrying risk for ASDs
(Alarcon et al., 2008; Arking et al., 2008; Bakkaloglu et al., 2008).
These findings raise several important issues with regard to rare variants
and autism. First, they demonstrate the utility of the outlier approach to
provide clues to the pathophysiology of complex disorders. Prior to the iden-
tification of NLGN4, no specific data had implicated a molecular or cellular
mechanism underlying any aspect of idiopathic autism. Subsequently, con-
siderable effort has been aimed at delineating the relationship between
synapse function and ASDs (Zoghbi, 2003). Similarly, the identification and
characterization of CNTNAP2, coupled with the long-standing appreciation of
increased rates of seizures in individuals with autism, has raised considerable
interest in neuronal migration and its potential contribution to ASDs.
These findings also point to some of the challenges of demonstrating
causality when rare events are being investigated. In the initial identification
of NLGN4, the link between the observed mutation and the observed phenotype
was not inferred from statistical evidence but, rather, from the specifics of
the gene itself and the nature of the observed mutation. In this case, the fact
that the gene is located on the X chromosome (thus, only one copy is present
in males) and the mutation is clearly deleterious to the formation of protein
product led the authors to conclude that the rare variant and the autism
phenotype must be related. While they were able to show in their small
family that transmission of the mutation was consistent with Mendelian
expectations, there were not sufficient observations to support this finding
with statistical analysis, nor was there a sufficient number of rare variants
identified to conduct a mutation burden study (Laumonnier et al., 2004).
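The statistical limitation can be made concrete under idealized assumptions: fully informative, phase-known meioses with no recombination between marker and mutation (an assumption for illustration, not a description of the actual NLGN4 pedigrees). Each such meiosis adds at most log10(2) ≈ 0.3 to the LOD score, so reaching the conventional threshold of 3 takes about ten of them, far more than a small nuclear family provides.

```python
import math


def max_lod_no_recombination(informative_meioses: int) -> float:
    """Maximum two-point LOD when every informative, phase-known meiosis is
    nonrecombinant (theta = 0): each meiosis contributes log10(2)."""
    return informative_meioses * math.log10(2)


def meioses_needed(target_lod: float = 3.0) -> int:
    """Smallest number of such meioses whose maximum LOD reaches target_lod."""
    return math.ceil(target_lod / math.log10(2))


print(meioses_needed())                        # 10
print(round(max_lod_no_recombination(2), 2))   # 0.6
```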
Nonetheless, the nature of the NLGN4X mutation, its recapitulation
in vitro and in model systems, the independent replication using an alter-
native method (parametric linkage) in a separate family, and the identification
of additional rare variants in a molecular pathway specified by NLGN4
together strongly support the relevance of this molecule for ASDs. The
rarity of the clearly deleterious variants among affected individuals and the
finding that mutations in NLGN4 do not always result in observable pathology
(Macarov et al., 2007) are reminders that even highly penetrant mutations
do not always lead to the phenotype of interest and that rare variant discovery
may ultimately be extraordinarily valuable even if the initial observations
remain restricted to one or an extremely small number of events.
of affected individuals. As noted, the disorder has an early onset and affects
the fundamental ability of individuals to make and keep social relationships;
additionally, the monozygotic–dizygotic concordance difference is consistent
with a considerable burden of new (and therefore rare) variation. Moreover,
there is consistent evidence that autism incidence increases with paternal age
(Cantor, Yoon, Furr, & Lajonchere, 2007; Reichenberg et al., 2006), as does the
burden of de novo mutation (Crow, 2000).
Perhaps most importantly, there is long-standing evidence that individuals
with autism are many times more likely than normally developing controls to
carry rare, microscopically visible (now termed “gross”) chromosomal abnormalities,
including de novo rearrangements (Bugge et al., 2000; Wassink, Piven, &
Patil, 2001). In 2007, Jonathan Sebat and colleagues at Cold Spring Harbor
provided dramatic confirmation of the importance of individually rare cyto-
genetic events in ASD when they evaluated patients with autism in search of
de novo copy number changes and showed that in apparently sporadic cases
there is a substantially increased burden of rare copy number variation
compared both to familial cases of autism and to controls (Sebat et al., 2007). The
detection of rare variation at high resolution across the entire genome,
the demonstration of its cumulative burden in the ASD phenotype, and
the identification of specific CNVs, which may point to genes harboring
other rare variants contributing to autism, together represent a very
significant step forward in the search for multiple independent mutations
contributing to ASD. These findings have subsequently been supported by
additional studies demonstrating an increased burden of de novo CNVs in
autism versus controls (Marshall et al., 2008), as well as studies demonstrat-
ing association of rare, recurrent CNVs with ASDs (Bucan et al., 2009;
Glessner et al., 2009; Kumar et al., 2008; Szatmari et al., 2007; Weiss
et al., 2008).
These latter findings underscore how much the dogma regarding rare
variation has begun to change, spurred by the advent of CNV detection.
For example, the most notable finding with regard to specific copy number
alterations and their involvement in ASD has been with respect to a region
on the short arm of chromosome 16 (16p11.2) (Kumar et al., 2008; Szatmari
et al., 2007; Weiss et al., 2008). The first studies to systematically address the
role of this variation in ASD relied on standard case–control association
methodology (Kumar et al., 2008; Weiss et al., 2008). They did not seek to
demonstrate a one-to-one relationship between carrying the variation and
having the phenotype within families, which would have been expected in
the era of standard cytogenetics. Indeed, in these initial analyses, variation at
16p11.2 was observed in unaffected individuals and, more importantly, within families de
novo variations were found in one affected individual but not in a second
affected sibling. This latter observation would previously have been taken as
strong evidence against the contribution of this variation within these pedi-
grees, based on Mendelian expectations for rare disorders.
These findings highlight a shift away from conceptualizing rare variation as
being synonymous with Mendelian inheritance. Indeed, as the risks
associated with common variation have been found to be much smaller than
previously anticipated, prior notions about the effect sizes that would come
under negative selection and result in rare transmitted alleles must also be
reconsidered. Moreover, because the consequences of rare variants may be more
subtle than previously anticipated and their contribution more complex, a
shift to association methodologies has become a necessity. Fortunately, the field
of psychiatric genetics has several decades of experience with these strategies,
which point to the key requirements for the next generation of rare variant
studies, including controlling for population stratification, accounting for
multiple comparisons, and leveraging sufficiently large sample sizes to
allow for the detection of alleles of comparatively modest individual effect.
References
Abelson, J. F., Kwan, K. Y., O’Roak, B. J., Baek, D. Y., Stillman, A. A., Morgan, T. M.,
et al. (2005). Sequence variants in SLITRK1 are associated with Tourette’s syndrome.
Science, 310(5746), 317–320.
Alarcon, M., Abrahams, B. S., Stone, J. L., Duvall, J. A., Perederiy, J. V., Bomar, J. M., et al.
(2008). Linkage, association, and gene-expression analyses identify CNTNAP2 as an
autism-susceptibility gene. American Journal of Human Genetics, 82(1), 150–159.
Altshuler, D., & Daly, M. (2007). Guilt beyond a reasonable doubt. Nature Genetics,
39(7), 813–815.
Arking, D. E., Cutler, D. J., Brune, C. W., Teslovich, T. M., West, K., Ikeda, M., et al. (2008).
A common genetic variant in the neurexin superfamily member CNTNAP2 increases
familial risk of autism. American Journal of Human Genetics, 82(1), 160–164.
Bailey, A., Le Couteur, A., Gottesman, I., Bolton, P., Simonoff, E., Yuzda, E., et al.
(1995). Autism as a strongly genetic disorder: Evidence from a British twin study.
Psychological Medicine, 25(1), 63–77.
Bakkaloglu, B., O’Roak, B. J., Louvi, A., Gupta, A. R., Abelson, J. F., Morgan, T. M.,
et al. (2008). Molecular cytogenetic analysis and resequencing of contactin associated
protein-like 2 in autism spectrum disorders. American Journal of Human Genetics,
82(1), 165–173.
Bilguvar, K., Yasuno, K., Niemela, M., Ruigrok, Y. M., von Und Zu Fraunberg, M., van
Duijn, C. M., et al. (2008). Susceptibility loci for intracranial aneurysm in European
and Japanese populations. Nature Genetics, 40(12), 1472–1477.
Blasi, F., Bacchelli, E., Pesaresi, G., Carone, S., Bailey, A. J., & Maestrini, E. (2006).
Absence of coding mutations in the X-linked genes neuroligin 3 and neuroligin 4 in
individuals with autism from the IMGSAC collection. American Journal of Medical
Genetics Part B: Neuropsychiatric Genetics, 141B(3), 220–221.
Botstein, D., & Risch, N. (2003). Discovering genotypes underlying human pheno-
types: Past successes for Mendelian disease, future approaches for complex disease.
Nature Genetics, 33(Suppl.), 228–237.
Brzustowicz, L. M., Hodgkinson, K. A., Chow, E. W., Honer, W. G., & Bassett, A. S.
(2000). Location of a major susceptibility locus for familial schizophrenia on
chromosome 1q21-q22. Science, 288(5466), 678–682.
Bucan, M., Abrahams, B. S., Wang, K., Glessner, J. T., Herman, E. I., Sonnenblick, L. I.,
et al. (2009). Genome-wide analyses of exonic copy number variants in a family-based
study point to novel autism susceptibility genes. PLoS Genetics, 5(6), e1000536.
Bugge, M., Bruun-Petersen, G., Brondum-Nielsen, K., Friedrich, U., Hansen, J.,
Jensen, G., et al. (2000). Disease associated balanced chromosome rearrangements:
A resource for large scale genotype–phenotype delineation in man. Journal of
Medical Genetics, 37(11), 858–865.
Campbell, D. B., Sutcliffe, J. S., Ebert, P. J., Militerni, R., Bravaccio, C., Trillo, S., et al.
(2006). A genetic variant that disrupts MET transcription is associated with autism.
Proceedings of the National Academy of Sciences USA, 103(45), 16834–16839.
Cantor, R. M., Yoon, J. L., Furr, J., & Lajonchere, C. M. (2007). Paternal age and autism
are associated in a family-based sample. Molecular Psychiatry, 12(5), 419–421.
Chakravarti, A. (1999). Population genetics—making sense out of sequence. Nature
Genetics, 21(1 Suppl.), 56–60.
Chih, B., Afridi, S. K., Clark, L., & Scheiffele, P. (2004). Disorder-associated mutations
lead to functional inactivation of neuroligins. Human Molecular Genetics, 13(14),
1471–1477.
Chih, B., Engelman, H., & Scheiffele, P. (2005). Control of excitatory and inhibitory
synapse formation by neuroligins. Science, 307(5713), 1324–1328.
Chih, B., Gollan, L., & Scheiffele, P. (2006). Alternative splicing controls selective
trans-synaptic interactions of the neuroligin–neurexin complex. Neuron, 51(2),
171–178.
Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., et al. (2009).
Genetic diagnosis by whole exome capture and massively parallel DNA sequencing.
Proceedings of the National Academy of Sciences USA, 106(45), 19096–19101.
Cohen, J. C., Kiss, R. S., Pertsemlidis, A., Marcel, Y. L., McPherson, R., & Hobbs, H.
H. (2004). Multiple rare alleles contribute to low plasma levels of HDL cholesterol.
Science, 305(5685), 869–872.
Constantino, J. N., Lajonchere, C., Lutz, M., Gray, T., Abbacchi, A., McKenna, K., et al.
(2006). Autistic social impairment in the siblings of children with pervasive devel-
opmental disorders. American Journal of Psychiatry, 163(2), 294–296.
Constantino, J. N., & Todd, R. D. (2005). Intergenerational transmission of subthres-
hold autistic traits in the general population. Biological Psychiatry, 57(6), 655–660.
Crow, J. F. (2000). The origins, patterns and implications of human spontaneous
mutation. Nature Reviews Genetics, 1(1), 40–47.
Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, T. J., & Lander, E. S. (2001). High-
resolution haplotype structure in the human genome. Nature Genetics, 29(2), 229–232.
Dave, B. J., & Sanger, W. G. (2007). Role of cytogenetics and molecular cytogenetics in
the diagnosis of genetic imbalances. Seminars in Pediatric Neurology, 14(1), 2–6.
Durand, C. M., Betancur, C., Boeckers, T. M., Bockmann, J., Chaste, P., Fauchereau,
F., et al. (2007). Mutations in the gene encoding the synaptic scaffolding protein
SHANK3 are associated with autism spectrum disorders. Nature Genetics, 39(1),
25–27.
Fu, Y. H., Kuhl, D. P., Pizzuti, A., Pieretti, M., Sutcliffe, J. S., Richards, S., et al.
(1991). Variation of the CGG repeat at the fragile X site results in genetic instability:
Resolution of the Sherman paradox. Cell, 67(6), 1047–1058.
Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., et al.
(2002). The structure of haplotype blocks in the human genome. Science, 296(5576),
2225–2229.
Glessner, J. T., Wang, K., Cai, G., Korvatska, O., Kim, C. E., Wood, S., et al. (2009).
Autism genome-wide copy number variation reveals ubiquitin and neuronal genes.
Nature, 459(7246), 569–573.
Gupta, A. R., & State, M. W. (2007). Recent advances in the genetics of autism.
Biological Psychiatry, 61(4), 429–437.
Hakonarson, H., Grant, S. F., Bradfield, J. P., Marchand, L., Kim, C. E., Glessner, J. T.,
et al. (2007). A genome-wide association study identifies KIAA0350 as a type 1
diabetes gene. Nature, 448(7153), 591–594.
Hirschhorn, J. N., & Altshuler, D. (2002). Once and again—issues surrounding repli-
cation in genetic association studies. Journal of Clinical Endocrinology and
Metabolism, 87(10), 4438–4441.
Hirschhorn, J. N., & Daly, M. J. (2005). Genome-wide association studies for common
diseases and complex traits. Nature Reviews Genetics, 6(2), 95–108.
Hirschhorn, J. N., Lohmueller, K., Byrne, E., & Hirschhorn, K. (2002). A comprehen-
sive review of genetic association studies. Genetics in Medicine, 4(2), 45–61.
Iafrate, A. J., Feuk, L., Rivera, M. N., Listewnik, M. L., Donahoe, P. K., Qi, Y., et al.
(2004). Detection of large-scale variation in the human genome. Nature Genetics,
36(9), 949–951.
International HapMap Consortium. (2005). A haplotype map of the human genome.
Nature, 437(7063), 1299–1320.
International Human Genome Sequencing Consortium. (2004). Finishing the euchro-
matic sequence of the human genome. Nature, 431(7011), 931–945.
Jamain, S., Quach, H., Betancur, C., Rastam, M., Colineaux, C., Gillberg, I. C., et al.
(2003). Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4
are associated with autism. Nature Genetics, 34(1), 27–29.
Ji, W., Foo, J. N., O’Roak, B. J., Zhao, H., Larson, M. G., Simon, D. B., et al. (2008).
Rare independent mutations in renal salt handling genes contribute to blood
pressure variation. Nature Genetics, 40(5), 592–599.
Kim, H. G., Kishikawa, S., Higgins, A. W., Seong, I. S., Donovan, D. J., Shen, Y., et al.
(2008). Disruption of neurexin 1 associated with autism spectrum disorder.
American Journal of Human Genetics, 82(1), 199–207.
Kleinjan, D. J., & van Heyningen, V. (1998). Position effect in human genetic disease.
Human Molecular Genetics, 7(10), 1611–1618.
Kumar, R. A., KaraMohamed, S., Sudi, J., Conrad, D. F., Brune, C., Badner, J. A., et al.
(2008). Recurrent 16p11.2 microdeletions in autism. Human Molecular Genetics,
17(4), 628–638.
Lander, E., & Kruglyak, L. (1995). Genetic dissection of complex traits: Guidelines for
interpreting and reporting linkage results. Nature Genetics, 11(3), 241–247.
Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., et al. (2001).
Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921.
Laumonnier, F., Bonnet-Brilhault, F., Gomot, M., Blanc, R., David, A., Moizard, M. P.,
et al. (2004). X-linked mental retardation and autism are associated with a mutation
in the NLGN4 gene, a member of the neuroligin family. American Journal of Human
Genetics, 74(3), 552–557.
Lise, M. F., & El-Husseini, A. (2006). The neuroligin and neurexin families: From
structure to function at the synapse. Cellular and Molecular Life Sciences, 63(16),
1833–1849.
Lubs, H. A. (1969). A marker X chromosome. American Journal of Human Genetics,
21(3), 231–244.
Ma, D., Salyakina, D., Jaworski, J. M., Konidari, I., Whitehead, P. L., Andersen, A. N.,
et al. (2009). A genome-wide association study of autism reveals a common novel
risk locus at 5p14.1. Annals of Human Genetics, 73(Pt 3), 263–273.
Macarov, M., Zeigler, M., Newman, J. P., Strich, D., Sury, V., Tennenbaum, A., et al.
(2007). Deletions of VCX-A and NLGN4: A variable phenotype including normal
intellect. Journal of Intellectual Disability Research, 51(Pt 5), 329–333.
Marshall, C. R., Noor, A., Vincent, J. B., Lionel, A. C., Feuk, L., Skaug, J., et al. (2008).
Structural variation of chromosomes in autism spectrum disorder. American Journal
of Human Genetics, 82(2), 477–488.
McClellan, J. M., Susser, E., & King, M. C. (2007). Schizophrenia: A common disease
caused by multiple rare alleles. British Journal of Psychiatry, 190, 194–199.
McMahon, F. J., Akula, N., Schulze, T. G., Muglia, P., Tozzi, F., Detera-Wadleigh, S.
D., et al. (2010). Meta-analysis of genome-wide association data identifies a risk
locus for major mood disorders on 3p21.1. Nature Genetics, 42(2), 128–131.
McPherson, J. D., Marra, M., Hillier, L., Waterston, R. H., Chinwalla, A., Wallis, J.,
et al. (2001). A physical map of the human genome. Nature, 409(6822), 934–941.
Millar, J. K., Wilson-Annan, J. C., Anderson, S., Christie, S., Taylor, M. S., Semple, C.
A., et al. (2000). Disruption of two novel genes by a translocation co-segregating
with schizophrenia. Human Molecular Genetics, 9(9), 1415–1423.
Moessner, R., Marshall, C. R., Sutcliffe, J. S., Skaug, J., Pinto, D., Vincent, J., et al.
(2007). Contribution of SHANK3 mutations to autism spectrum disorder. American
Journal of Human Genetics, 81(6), 1289–1297.
Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., et al.
(2009). Targeted capture and massively parallel sequencing of 12 human exomes.
Nature, 461(7261), 272–276.
Noonan, J. P. (2009). Regulatory DNAs and the evolution of human development.
Current Opinion in Genetics and Development, 19(6), 557–564.
O’Roak, B. J., & State, M. W. (2008). Autism genetics: Strategies, challenges, and
opportunities. Autism Research, 1(1), 4–17.
Poliak, S., Gollan, L., Martinez, R., Custer, A., Einheber, S., Salzer, J. L., et al. (1999).
Caspr2, a new member of the neurexin superfamily, is localized at the juxtaparanodes
of myelinated axons and associates with K+ channels. Neuron, 24(4), 1037–1047.
Prabhakar, S., Visel, A., Akiyama, J. A., Shoukry, M., Lewis, K. D., Holt, A., et al.
(2008). Human-specific gain of function in a developmental enhancer. Science,
321(5894), 1346–1350.
Pritchard, J. K. (2001). Are rare variants responsible for susceptibility to complex
diseases? American Journal of Human Genetics, 69(1), 124–137.
Psychiatric GWAS Consortium Steering Committee. (2009). A framework for inter-
preting genome-wide association studies of psychiatric disorders. Molecular
Psychiatry, 14(1), 10–17.
Redon, R., Ishikawa, S., Fitch, K. R., Feuk, L., Perry, G. H., Andrews, T. D., et al. (2006).
Global variation in copy number in the human genome. Nature, 444(7118), 444–454.
Reichenberg, A., Gross, R., Weiser, M., Bresnahan, M., Silverman, J., Harlap, S., et al.
(2006). Advancing paternal age and autism. Archives of General Psychiatry, 63(9),
1026–1032.
Risch, N. (1990). Linkage strategies for genetically complex traits. I. Multilocus
models. American Journal of Human Genetics, 46(2), 222–228.
Risch, N., & Merikangas, K. (1996). The future of genetic studies of complex human
diseases. Science, 273(5281), 1516–1517.
Saxena, R., Voight, B. F., Lyssenko, V., Burtt, N. P., de Bakker, P. I., Chen, H., et al.
(2007). Genome-wide association analysis identifies loci for type 2 diabetes and
triglyceride levels. Science, 316(5829), 1331–1336.
Scott, L. J., Mohlke, K. L., Bonnycastle, L. L., Willer, C. J., Li, Y., Duren, W. L., et al.
(2007). A genome-wide association study of type 2 diabetes in Finns detects multiple
susceptibility variants. Science, 316(5829), 1341–1345.
Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., et al. (2007).
Strong association of de novo copy number mutations with autism. Science,
316(5823), 445–449.
Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., et al. (2004). Large-
scale copy number polymorphism in the human genome. Science, 305(5683), 525–528.
Sharp, A. J., Cheng, Z., & Eichler, E. E. (2006). Structural variation of the human
genome. Annual Review of Genomics and Human Genetics, 7, 407–442.
State, M. W. (2006). A surprising METamorphosis: Autism genetics finds a common
functional variant. Proceedings of the National Academy of Sciences USA, 103(45),
16621–16622.
State, M. W., Greally, J. M., Cuker, A., Bowers, P. N., Henegariu, O., Morgan, T. M.,
et al. (2003). Epigenetic abnormalities associated with a chromosome 18(q21-q22)
inversion and a Gilles de la Tourette syndrome phenotype. Proceedings of the National
Academy of Sciences USA, 100(8), 4684–4689.
Strauss, K. A., Puffenberger, E. G., Huentelman, M. J., Gottlieb, S., Dobrin, S. E.,
Parod, J. M., et al. (2006). Recessive symptomatic focal epilepsy and mutant
contactin-associated protein-like 2. New England Journal of Medicine, 354(13),
1370–1377.
Sutherland, G. R. (1977). Fragile sites on human chromosomes: Demonstration of
their dependence on the type of tissue culture medium. Science, 197(4300), 265–266.
Szatmari, P., Paterson, A. D., Zwaigenbaum, L., Roberts, W., Brian, J., Liu, X. Q., et al.
(2007). Mapping autism risk loci using genetic linkage and chromosomal rearrange-
ments. Nature Genetics, 39(3), 319–328.
Tabuchi, K., Blundell, J., Etherton, M. R., Hammer, R. E., Liu, X., Powell, C. M., et al.
(2007). A neuroligin-3 mutation implicated in autism increases inhibitory synaptic
transmission in mice. Science, 318(5847), 71–76.
Varoqueaux, F., Aramuni, G., Rawson, R. L., Mohrmann, R., Missler, M., Gottmann,
K., et al. (2006). Neuroligins determine synapse maturation and function. Neuron,
51(6), 741–754.
Veenstra-VanderWeele, J., Christian, S. L., & Cook, E. H., Jr. (2004). Autism as a
paradigmatic complex genetic disorder. Annual Review of Genomics and Human
Genetics, 5, 379–405.
Veenstra-VanderWeele, J., & Cook, E. H., Jr. (2004). Molecular genetics of autism
spectrum disorder. Molecular Psychiatry, 9(9), 819–832.
Verkerk, A. J., Pieretti, M., Sutcliffe, J. S., Fu, Y. H., Kuhl, D. P., Pizzuti, A., et al.
(1991). Identification of a gene (FMR-1) containing a CGG repeat coincident with a
breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell,
65(5), 905–914.
Vorstman, J. A., Staal, W. G., van Daalen, E., van Engeland, H., Hochstenbach, P. F., &
Franke, L. (2006). Identification of novel autism candidate regions through analysis
of reported cytogenetic abnormalities associated with autism. Molecular Psychiatry,
11(1), 18–28.
Wassink, T. H., Piven, J., & Patil, S. R. (2001). Chromosomal abnormalities in a clinic
sample of individuals with autistic disorder. Psychiatric Genetics, 11(2), 57–63.
Weiss, L. A., Arking, D. E., Daly, M. J., & Chakravarti, A. (2009). A genome-wide linkage
and association scan reveals novel loci for autism. Nature, 461(7265), 802–808.
Weiss, L. A., Shen, Y., Korn, J. M., Arking, D. E., Miller, D. T., Fossdal, R., et al. (2008).
Association between microdeletion and microduplication at 16p11.2 and autism.
New England Journal of Medicine, 358(7), 667–675.
Zeggini, E., & McCarthy, M. I. (2007). Identifying susceptibility variants for type 2
diabetes. Methods in Molecular Biology, 376, 235–250.
Zoghbi, H. Y. (2003). Postnatal neurodevelopmental disorders: Meeting at the synapse?
Science, 302(5646), 826–830.
Zondervan, K. T., & Cardon, L. R. (2004). The complex interplay among factors that
influence allelic association. Nature Reviews Genetics, 5(2), 89–100.
part iii
The philosopher Ernst Nagel (1957, p. 15) defined development in a way that
links it to both benign and pathological outcomes:
1. Sections of this chapter are based in part on Costello (2008) and Costello & Angold (2006).
Most chronic diseases are the results of a process extending over dec-
ades, and many of the events occurring in this period play a substantial
role in the study of physical growth, of mental and hormonal develop-
ment, and in the process of aging, the essential feature is that changes
over time are followed at the individual level.
There are two main streams of psychological research: one into “norms” of
human development and behavior and the other into differences among
individuals and groups. Developmental psychopathology aims to integrate
the two (Cicchetti, 1984). The more we understand about normal develop-
ment in the general population, the more we can learn about the causes of
pathology in the minority of the population who have disorders.
Human development is marked by stages or turning points at which change
in one or more systems occurs quite rapidly, inducing a qualitative difference
in capacity (Pickles & Hill, 2006). Here, we consider what we can learn about
the causes of psychiatric disorders from what developmental science has
learned about two key developmental turning points: the period before and
immediately after birth and the long process that leads to sexual maturation.
Prenatal and perinatal development can carry risk for psychopathology later
in life. Several lines of research suggest that intrauterine growth retardation
creates risk for a range of psychiatric outcomes at different developmental
stages, depending on the timing of exposure in relation to time-specific vul-
nerabilities of the developing organism (Barker, 2004). Low birth weight has
been implicated in risk for schizophrenia (Nilsson et al., 2005; Silverton,
Mednick, Schulsinger, Parnas, & Harrington, 1988), attention-deficit/hyperac-
tivity disorder (ADHD) (Botting, Powls, Cooke, & Marlow, 1997; Breslau et al.,
1996; Breslau & Chilcoat, 2000; Pharoah, Stevenson, Cooke, & Stevenson,
1994; Szatmari, Saigal, Rosenbaum, Campbell, & King, 1990), and eating dis-
orders (Favaro, Tenconi, & Santonastaso, 2006). Since 1990, studies have been
published both supporting (Botting et al., 1997; Frost, Reinherz, Pakiz-Camras,
Giaconia, & Lefkowitz, 1999; Gale & Martyn, 2004; Gardner et al., 2004; Patton,
Coffey, Carlin, Olsson, & Morley, 2004; Pharoah, Stevenson, Cooke, &
Stevenson, 1994; Weisglas-Kuperus, Koot, Baerts, Fetter, & Sauer, 1993) and
disconfirming (Buka, Tsuang, & Lipsitt, 1993; Cooke, 2004; Jablensky, Morgan,
Zubrick, Bower, & Yellachich, 2005; Osler, Nordentoft, & Nybo Andersen, 2005;
Szatmari et al., 1990) the idea that low birth weight predicts depression.
The challenge is to move beyond correlation to causal explanation. Clearly,
experimental assignment to a high-risk perinatal environment is not ethical
for human research. In a longitudinal study, we tried to narrow down the
range of possible causal explanations by testing two competing hypotheses to
explain why the incidence of depression increases dramatically in girls, but
not boys, when they are about age 13. A simple bivariate analysis showed that
depression was much more common (38.1% vs. 8.4%) in girls who had
weighed less than 2,500 g at birth than in other girls. One hypothesis
states that low birth weight is one of a range of risk factors that could lead
Quasi-Experiments
What distinguishes quasi-experiments from randomized experiments is that
in the former case we cannot be sure that group assignment is free of bias.
In other respects (e.g., the selection of the intervention, the measures admi-
nistered, the timing of measurement), the two designs may be close to
identical. However, the key difference, the inability to use random
assignment, can threaten the validity of causal conclusions based on the
results (as discussed earlier). We describe three such designs that have been
used to test whether and how traumatic events cause psychiatric disorders.
(In the following diagrams, O = observation, X = event, and T = time point.)
If data have already been collected before the event or intervention, it may be
possible to set up a pre- vs. post-, exposed vs. not exposed design that comes
close to random assignment. However, it is unlikely that there will have been
an opportunity to collect “before” measures on those to whom the (typically
unforeseen) event will occur, with the result that the most common form of
quasi-experiment following an unexpected catastrophe is the following:
T1 T2 T3
Sample X O
Population norm O
For example, Hoven and colleagues (2005) compared New York City children
assessed after the September 11, 2001 (9/11), attack on the Twin Towers with a
representative population sample from nearby Stamford, Connecticut, that
had been assessed with the same instruments just before 9/11, as well as with
other community samples. The New York children, assessed 6 months after
9/11, had higher rates of most diagnoses.
This design is critically dependent on the comparability of the postevent
sample and the sample on which the measures were normed, since otherwise
any differences found might be the result of preexisting differences rather
than the event. Therefore, although this design is often the only one avail-
able, it tends to be the weakest of the various quasi-experiments.
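One common way to make the sample-versus-norm comparison is a one-sample test of the observed prevalence against the normative rate. The sketch below is a minimal illustration, not the analysis Hoven and colleagues reported: the function name `compare_with_norm` and the counts are hypothetical, and it assumes the norm is a fixed, error-free rate and that the normal approximation to the binomial holds.

```python
from math import erfc, sqrt


def compare_with_norm(cases, n, norm_rate):
    """One-sample z-test of an observed prevalence against a population norm.

    Hypothetical helper. Assumes the norm is known without sampling error and
    that n is large enough for the normal approximation to the binomial.
    Returns (observed_rate, z, two_sided_p).
    """
    observed = cases / n
    se_null = sqrt(norm_rate * (1 - norm_rate) / n)  # SE if the norm were true
    z = (observed - norm_rate) / se_null
    p_two_sided = erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return observed, z, p_two_sided


# Hypothetical example: 30 of 200 post-event children meet criteria,
# against a normative prevalence of 10%.
observed, z, p = compare_with_norm(30, 200, 0.10)
print(f"observed={observed:.3f} z={z:.2f} p={p:.4f}")
```

A small p-value says only that the sample differs from the norm; it cannot say whether the difference reflects the event or preexisting differences between the two populations, which is exactly the weakness of this design.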
T1 T2
Sample a X O
Sample b X O
For example, the same researchers (Hoven et al., 2005) divided New York
City into three areas at different geographical distances from the site of the
World Trade Center and sampled children attending schools in each area, to
test the hypothesis that physical distance from the event reduced the risk of
psychiatric disorder; such a gradient of risk with distance would support a
causal relationship between the event and the disorder. They found high rates
of mental disorder throughout the study area but significantly lower rates in
children who went to school in the area closest to the site of the attack. This
took the researchers by surprise; their post hoc explanation was that the
extensive social support and mental health care provided in that area after
9/11 had prevented harm that the event might otherwise have caused.
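The distance hypothesis predicts a monotonic gradient in prevalence across ordered zones, which is what a Cochran–Armitage trend test checks. The following is a minimal sketch with hypothetical counts; the function name and the zone scores are our assumptions, and Hoven and colleagues may have analyzed the zones differently. With zones scored 0 (closest) to 2 (farthest), the distance hypothesis predicts a negative z; a positive z indicates rates rising with distance, the pattern the study unexpectedly observed.

```python
from math import erfc, sqrt


def cochran_armitage_trend(cases, totals, scores):
    """Cochran-Armitage test for a linear trend in proportions across ordered groups.

    cases[i] of totals[i] subjects in group i have the disorder; scores[i] is the
    group's position on the ordered exposure (e.g., distance rank).
    Returns (z, two_sided_p).
    """
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled prevalence under "no trend"
    # Score-weighted sum of observed-minus-expected cases.
    t_stat = sum(s * (r - n * p_bar) for r, n, s in zip(cases, totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    var_t = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    z = t_stat / sqrt(var_t)
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical counts for three zones ordered from closest (0) to farthest (2).
z, p = cochran_armitage_trend(cases=[20, 35, 40], totals=[200, 200, 200], scores=[0, 1, 2])
print(f"z={z:.2f} p={p:.4f}")  # positive z: prevalence rises with distance
```

A linear trend test can miss a pattern in which only the closest zone differs from the rest; a separate contrast of that zone against the others would address that case.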
Next, they measured personal and family exposure to the attack and com-
pared children who had family members involved in the attack to those who
This design has the potential to come closest to a randomized design because
the same subjects are studied both before and after an event that occurred in
one group but not the other:
T1 T2 T3
Sample a O O
Sample b O X O
However, if sample a and sample b were not randomly assigned from the
same subject pool, the researcher must convince the reader that there were
no differences between the two groups before the event that could potentially
confound the causal relationship.
For example, in a longitudinal study of development across the transition
to adulthood, we have interviewed a representative sample of young people
every 1 or 2 years since 1993. Each interview was scheduled as close as
possible to the participant’s birthday. Thus, in 2001, when the participants were aged
19 and 21, about two-thirds of them had been interviewed when, on 9/11, the
Twin Towers and the Pentagon were struck. We continued to interview the
remaining subjects until the end of the year (Costello, Erkanli, Keeler, &
Angold, 2004), but the world facing these young people was a very different
one from that in which we had interviewed the first group of participants; for
example, there was talk of a national draft, which would directly affect this
age group.
The strength of this design is critically dependent on the comparability of
the groups interviewed before and after the event. In this case we had 8 years
of interviews with the participants before 2001. We compared the before-9/11
and the after-9/11 groups on a wide range of factors and were able to
demonstrate that each was equivalent to a random subsample of the main sample.
Thus, we had a quasi-experiment that amounted to randomly assigning
subjects to be interviewed either before or after 9/11. We predicted that,
even though the participants were living 500 miles away from where the
events occurred, this “distant trauma” (Terr et al., 1999) would increase
levels of anxiety and possibly, in this age group, alcohol and drug abuse.
We also hypothesized that the potential for military conscription might
further increase anxiety levels, especially in males. We were wrong on both
counts. There was no increase in levels of anxiety. Women interviewed after
9/11 reported higher levels of drug use in general, and cannabis in particular,
with rates of reported use approaching twice the pre-9/11 level. Conversely,
men interviewed after 9/11 were less likely to report substance abuse, and
use of all drugs was lower.
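The comparability check described above (showing that the groups interviewed before and after 9/11 did not differ on characteristics measured in earlier waves) can be sketched as a covariate balance table. This is a minimal illustration under stated assumptions: the helper names, the covariates, and the |SMD| < 0.1 cutoff (a common rule of thumb) are ours, not the study's, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled SD (average of the two variances)."""
    pooled_sd = sqrt((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(group_a) - mean(group_b)) / pooled_sd


def balance_table(before, after, threshold=0.1):
    """Map each covariate to (SMD, is_balanced) for two groups of interviewees.

    before/after: dicts of covariate name -> list of values measured in earlier
    study waves for each group. Hypothetical helper.
    """
    table = {}
    for name, values_before in before.items():
        smd = standardized_mean_difference(values_before, after[name])
        table[name] = (smd, abs(smd) < threshold)
    return table


# Hypothetical prior-wave measures for participants interviewed before vs. after 9/11.
before = {"prior_anxiety_symptoms": [2, 0, 3, 1, 4, 2], "prior_substance_use": [0, 1, 0, 2, 1, 0]}
after = {"prior_anxiety_symptoms": [1, 3, 2, 0, 4, 2], "prior_substance_use": [1, 0, 0, 1, 2, 0]}
for name, (smd, ok) in balance_table(before, after).items():
    print(f"{name}: SMD={smd:+.2f} balanced={ok}")
```

Balance on measured covariates cannot rule out differences on unmeasured ones; in this design the stronger argument is that interview timing was tied to each participant's birthday, which is plausibly unrelated to the outcomes.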
The examples of quasi-experimental studies described here suggest that
such designs can be quite effective at discounting previously held beliefs but
that they are open to the risk of post hoc interpretations (as in the post-9/11
examples). Finally, as Shadish, Cook, and Campbell (2002) point out, “they
can undermine the likelihood of doing even better studies.”
Natural Experiments
Natural experiments are gifts to the researcher; they are situations that could
not have been planned or proposed but do what a randomized experiment
does. That is, they assign participants to one exposure or another without
bias and hold all other variables constant while manipulating the risk factor
of interest. Sometimes the unbiased assignment is created by events, as when
one group of families in our longitudinal study received an income supplement
while others did not, with eligibility determined solely by whether the family
was American Indian or Anglo (Costello, Compton, Keeler, & Angold, 2003). In this case, we
had 4 years of assessments of children’s psychiatric status before and after
the introduction of the income supplement and, thus, could compare the
children’s behavior before and after the intervention in both groups. The
years of measurement before the event enabled us to rule out the potential
confounding of ethnicity with the children’s emotional and behavioral
symptoms.
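Using years of measurement before and after the supplement in both groups follows the logic of a difference-in-differences comparison: each group serves as its own baseline, and the comparison group's change over the same period stands in for what would have happened without the supplement. A minimal sketch, assuming that without the supplement both groups' symptom levels would have changed in parallel; the function name and numbers are hypothetical, and the published analyses may have used different models.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences estimate from per-child scores.

    Each argument is a list of symptom scores for one group in one period.
    Interpretable as an effect only under the parallel-trends assumption.
    """
    change_treated = mean(treated_post) - mean(treated_pre)
    change_control = mean(control_post) - mean(control_pre)
    return change_treated - change_control


# Hypothetical symptom counts per child.
effect = difference_in_differences(
    treated_pre=[4.0, 5.0, 3.0],   # families who later received the supplement
    treated_post=[3.0, 3.5, 2.5],
    control_pre=[3.0, 2.0, 4.0],   # families who did not
    control_post=[3.0, 2.5, 3.5],
)
print(f"difference-in-differences estimate: {effect:+.2f}")
```

Several pre-event years, as in the study described above, also allow a check of the parallel-trends assumption itself: the two groups' trajectories should move together before the supplement began.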
A tremendously important possibility for natural experimentation occurs
when genes and environment can be separated. Such naturally occurring
situations provided the foundation for the rise of genetic epidemiology,
which “focuses on the familial, and in particular genetic, determinants of
disease and the joint effects of genes and non-genetic determinants” (Burton,
Tobin, & Hopper, 2005, p. 941). Several researchers have made ingenious use
of the fact that “people take their genes with them when they move from one
country to another, but often the migration entails a radical change in
lifestyle” (Rutter, Pickles, Murray, & Eaves, 2001, p. 310). If a comparison group
is available from both the old and the new countries, a natural experiment
exists of the following form:
T1 T2
Sample a1 X O
Sample a2 O
Sample b O
Sample a1, who migrated, can be compared both with sample a2, who stayed
home, and with sample b, who grew up in the new country. If a1 and a2 are
more similar than a1 and b, this suggests that the pathology being measured is
more strongly influenced by the genetic similarity of the two groups of the same
race/ethnicity than by the different environments in which the two
groups now live. Conversely, a greater similarity between a1 and b suggests a
strong environmental effect. For example, Verhulst and colleagues compared
Turkish adolescents in Holland with Turkish adolescents in Turkey and Dutch
adolescents in Holland on a self-report measure of child psychopathology
(Janssen et al., 2004). They found that the immigrant youth reported more
anxious, depressed, and withdrawn symptoms than the Dutch youth but
more delinquency, attentional problems, and somatic problems than the
Turkish youth in Turkey. This suggests that Turkish adolescents are, in general,
more prone to emotional symptoms than Dutch adolescents but that migration
caused some behavioral problems not seen at home. Interestingly, a follow-up
study when the two samples living in Holland were in their 20s showed that
differences between the two groups shrank significantly, largely because the
immigrants’ mental health improved more than did that of the native Dutch.
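The comparison rule described above (is the migrant sample a1 closer to the home sample a2 or to the host-country sample b?) can be written out per scale. This is a deliberately simplified sketch: the function and scale names and all means are hypothetical, and a real analysis would test each difference statistically and adjust for age, sex, and informant rather than compare raw distances.

```python
def closer_comparison(migrant_mean, home_mean, host_mean):
    """Say whether migrants (a1) resemble stay-at-home peers (a2) or host-country peers (b)."""
    dist_home = abs(migrant_mean - home_mean)
    dist_host = abs(migrant_mean - host_mean)
    if dist_home < dist_host:
        return "home"  # a1 closer to a2: points to shared origin (the text stresses genetic similarity)
    if dist_host < dist_home:
        return "host"  # a1 closer to b: points to the new environment
    return "tie"


# Hypothetical scale means for (a1 migrants, a2 home country, b host country).
scales = {
    "anxious_depressed": (6.0, 5.8, 4.0),
    "delinquent_behavior": (3.5, 2.0, 3.3),
}
for scale, (a1, a2, b) in scales.items():
    print(scale, "->", closer_comparison(a1, a2, b))
```

Because the rule is applied scale by scale, it can yield the mixed pattern described for the Turkish and Dutch samples: resemblance to the home group on emotional symptoms alongside divergence from it on behavioral problems.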
There are, of course, important caveats to be considered before causal
conclusions can be drawn from migrant designs: Why did people migrate?
Are they representative of the nonmigrants at home? So long as these issues
are carefully considered, however, migrant designs can be very helpful in
pulling apart entangled component causes.
intervention with teachers, parents, and children tested the theory that “early
starters” (i.e., children who show conduct problems early in childhood) tend
to increase in aggressive behavior over time and to persist in antisocial
behavior longer than other antisocial children (Moffitt, 1993). The interven-
tion had positive effects 4 years later, and mediational analyses supported
specific causal pathways. For example, improvements in parenting skills
affected the child’s behavior at home but not at school, while improvements
in social cognition about peers affected deviant peer associations.
Additionally, children whose prosocial behavior in the classroom improved
had improved ratings in classroom sociometric assessments (Bierman et al.,
2002). It would benefit causal research greatly if prevention trials were, like
Fast Track, specific about their causal theories and rigorous in testing them.
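Mediational analyses of the kind mentioned above ask how much of a randomized intervention's effect on an outcome runs through a proposed mechanism (for example, parenting skills). A minimal sketch of the classic product-of-coefficients approach on simulated data follows. It is not a reconstruction of the Fast Track analyses, and all names and effect sizes are hypothetical. Even when the intervention is randomized, the mediator is not, so reading a*b causally assumes no unmeasured confounding of the mediator–outcome relationship.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope of the simple regression y ~ x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two_predictors(x1, x2, y):
    """Slopes (b1, b2) of y ~ x1 + x2 (with intercept), via centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def product_of_coefficients(treatment, mediator, outcome):
    """Return (indirect_effect, direct_effect) = (a * b, c') from two regressions."""
    a = ols_slope(treatment, mediator)  # intervention -> mediator
    direct, b = ols_two_predictors(treatment, mediator, outcome)  # outcome ~ X + M
    return a * b, direct


# Simulated trial: true a = 0.5, b = 0.4, direct effect c' = 0.1 (all hypothetical).
rng = random.Random(2002)
n = 5000
x = [rng.randint(0, 1) for _ in range(n)]  # randomized arm
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # e.g., a parenting-skills score
y = [0.1 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]  # child behavior
indirect, direct = product_of_coefficients(x, m, y)
print(f"indirect={indirect:.3f} direct={direct:.3f}")  # close to the true 0.2 and 0.1
```

The design choice here is transparency over convenience: the two regressions are written out so that the a and b paths are visible, whereas a real analysis would also report confidence intervals (for example, by bootstrapping the indirect effect).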
Conclusions
References
Brown, G. W., & Harris, T. O. (1978). The social origins of depression: A study of
psychiatric disorder in women. New York: Free Press.
Buka, S. L., Tsuang, M., & Lipsitt, L. (1993). Pregnancy/delivery complications and
psychiatric diagnosis: A prospective study. Archives of General Psychiatry, 50(2),
151–156.
Burton, P. R., Tobin, M. D., & Hopper, J. L. (2005). Key concepts in genetic epide-
miology. Lancet, 366(9489), 941.
Cairns, R. B., Gariépy, J. L., & Hood, K. E. (1990). Development, microevolution, and
social behavior. Psychological Review, 97(1), 49–65.
Champagne, F. A., & Meaney, M. J. (2006). Stress during gestation alters postpartum
maternal care and the development of the offspring in a rodent model. Biological
Psychiatry, 59(12), 1227–1235.
Champoux, M., Bennett, A., Shannon, C., Higley, J. D., Lesch, K. P., & Suomi, S. J.
(2002). Serotonin transporter gene polymorphism, differential early rearing, and
behavior in rhesus monkey neonates. Molecular Psychiatry, 7(10), 1058–1063.
Cicchetti, D. (1984). The emergence of developmental psychopathology. Child
Development, 55(1), 1–7.
Cicchetti, D. (2006). Development and psychopathology. In D. Cicchetti & D. J. Cohen
(Eds.), Developmental psychopathology (2nd ed., Vol. 1, pp. 1–23). Hoboken, NJ: John
Wiley & Sons.
Cooke, R. W. (2004). Health, lifestyle, and quality of life for young adults born very
preterm. Archives of Disease in Childhood, 89(3), 201–206.
Costello, E. J. (2008). Using epidemiological and longitudinal approaches to study
causal hypotheses. In M. Rutter (Ed.), Rutter’s child and adolescent psychiatry (pp.
58–70). Oxford: Blackwell Scientific.
Costello, E. J., & Angold, A. (2006). Developmental epidemiology. In D. Cicchetti & D. J.
Cohen (Eds.), Developmental psychopathology: Theory and method (2nd ed., Vol. 1,
pp. 41–75). Hoboken, NJ: John Wiley & Sons.
Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between
poverty and psychopathology: A natural experiment. Journal of the American Medical
Association, 290(15), 2023–2029.
Costello, E. J., Erkanli, A., Keeler, G., & Angold, A. (2004). Distant trauma: A prospec-
tive study of the effects of 9/11 on rural youth. Applied Developmental Science, 8(4),
211–220.
Eisenberg, L. (1977). Development as a unifying concept in psychiatry. British Journal
of Psychiatry, 131, 225–237.
Favaro, A., Tenconi, E., & Santonastaso, P. (2006). Perinatal factors and the risk of
developing anorexia nervosa and bulimia nervosa. Archives of General Psychiatry,
63(1), 82–88.
Feist, G. J. (2006). The psychology of science and the origins of the scientific mind. New
Haven, CT: Yale University Press.
Frost, A. K., Reinherz, H. Z., Pakiz-Camras, B., Giaconia, R. M., & Lefkowitz, E. S.
(1999). Risk factors for depressive symptoms in late adolescence: A longitudinal
community study. American Journal of Orthopsychiatry, 69(3), 370–381.
Gale, C. R., & Martyn, C. N. (2004). Birth weight and later risk of depression in a
national birth cohort. British Journal of Psychiatry, 184, 28–33.
Gardner, F., Johnson, A., Yudkin, P., Bowler, U., Hockley, C., Mutch, L., et al. (2004).
Behavioral and emotional adjustment of teenagers in mainstream school who were
born before 29 weeks’ gestation. Pediatrics, 114(3), 676–682.
Ge, X., Brody, G., Conger, R., & Murry, V. (2002). Contextual amplification of pubertal
transition effects on deviant peer affiliation and externalizing behavior among
African American children. Developmental Psychology, 38(1), 42–54.
Ge, X., Conger, R. D., & Elder, G. H. (1996). Coming of age too early: Pubertal
influences on girls’ vulnerability to psychological distress. Child Development,
67(6), 3386–3400.
Gottlieb, G., & Willoughby, M. (2006). Probabilistic epigenesis of psychopathology. In
D. Cicchetti & D. Cohen (Eds.), Developmental psychopathology: Theory and method
(2nd ed., Vol. 1, pp. 673–700). Hoboken, NJ: John Wiley & Sons.
Greenough, W. T. (1991). Experience as a component of normal development:
Evolutionary considerations. Developmental Psychology, 27(1), 14–17.
Hay, D. F., & Angold, A. (1993). Introduction: Precursors and causes in development
and pathogenesis. In D. F. Hay & A. Angold (Eds.), Precursors and causes in devel-
opment and psychopathology (pp. 1–21). Chichester: John Wiley & Sons.
Hernan, M. A., & Robins, J. M. (2006). Estimating causal effects from epidemiological
data. Journal of Epidemiology and Community Health, 60(7), 578–586.
Hoven, C. W., Duarte, C. S., Lucas, C. P., Wu, P., Mandell, D. J., Goodwin, R. D., et al.
(2005). Psychopathology among New York City public school children 6 months
after September 11. Archives of General Psychiatry, 62(5), 545–552.
Insel, T. R., & Fenton, W. S. (2005). Psychiatric epidemiology: It’s not just about
counting anymore. Archives of General Psychiatry, 62(6), 590–592.
Jablensky, A., Morgan, V., Zubrick, S. R., Bower, C., & Yellachich, L.-A. (2005).
Pregnancy, delivery, and neonatal complications in population cohort of women
with schizophrenia and major affective disorders. American Journal of Psychiatry,
162(1), 79–91.
Janssen, M. M., Verhulst, F., Bengi-Arslan, L., Erol, N., Salter, C., & Crijnen, A. M.
(2004). Comparison of self-reported emotional and behavioral problems in Turkish
immigrant, Dutch and Turkish adolescents. Social Psychiatry and Psychiatric
Epidemiology, 39(2), 133–140.
Kaltiala-Heino, R., Kosunen, E., & Rimpela, M. (2003). Pubertal timing, sexual beha-
viour and self-reported depression in middle adolescence. Journal of Adolescence,
26(5), 531–545.
Kaltiala-Heino, R., Marttunen, M., Rantanen, P., & Rimpela, M. (2003). Early puberty
is associated with mental health problems in middle adolescence. Social Science &
Medicine, 57(6), 1055–1064.
Kandel, D. B., & Davies, M. (1982). Epidemiology of depressive mood in adolescents:
An empirical study. Archives of General Psychiatry, 39(10), 1205–1212.
Magnusson, D., Stattin, H., & Allen, V. L. (1985). Differential maturation among girls
and its relation to social adjustment: A longitudinal perspective. Stockholm: University
of Stockholm.
McGue, M. (1989). Nature–nurture and intelligence. Nature, 340, 507–508.
Moffitt, T. E. (1990). Juvenile delinquency and attention deficit disorder: Boys’ devel-
opmental trajectories from age 3 to age 15. Child Development, 61(3), 893–910.
Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior:
A developmental taxonomy. Psychological Review, 100(4), 674–701.
Moffitt, T. E., Caspi, A., Belsky, J., & Silva, P. A. (1992). Childhood experience and the
onset of menarche: A test of a sociobiological model. Child Development, 63(1), 47–58.
Nagel, E. (1957). Determinism and development. In D. B. Harris (Ed.), The concept of
development (pp. 15–26). Minneapolis: University of Minnesota Press.
11 Causal Thinking in Developmental Disorders 295
Needleman, H. L., & Bellinger, D. (1991). The health effects of low level exposure to
lead. Annual Review of Public Health, 12, 111–140.
Nilsson, E., Stalberg, G., Lichtenstein, P., Cnattingius, S., Olausson, P. O., & Hultman,
C. M. (2005). Fetal growth restriction and schizophrenia: A Swedish twin study.
Twin Research & Human Genetics, 8(4), 402–408.
Offord, D. R., Boyle, M. H., Racine, Y. A., Fleming, J. E., Cadman, D. T., Blum, H. M.,
et al. (1992). Outcome, prognosis, and risk in a longitudinal follow-up study. Journal
of the American Academy of Child and Adolescent Psychiatry, 31(5), 916–923.
Osler, M., Nordentoft, M., & Nybo Andersen, A.-M. (2005). Birth dimensions and risk
of depression in adulthood: Cohort study of Danish men born in 1953. British
Journal of Psychiatry, 186, 400–403.
Patton, G. C., Coffey, C., Carlin, J. B., Olsson, C. A., & Morley, R. (2004). Prematurity at
birth and adolescent depressive disorder. British Journal of Psychiatry, 184, 446–447.
Pharoah, P. O. D., Stevenson, C. J., Cooke, R. W. I., & Stevenson, R. C. (1994).
Prevalence of behaviour disorders in low birthweight infants. Archives of Disease in
Childhood, 70, 271–274.
Pickles, A., & Hill, J. (2006). Developmental pathways. In D. Cicchetti & D. Cohen
(Eds.), Developmental psychopathology: Theory and method (2nd ed., Vol. 1, pp.
211–243). Hoboken, NJ: John Wiley & Sons.
Plomin, R., DeFries, J., & Loehlin, J. (1977). Genotype–environment interaction and
correlation in the analysis of human behavior. Psychological Bulletin, 84(2), 309–322.
Robins, J., Hernan, M., & Brumback, B. (2000). Marginal structural models and causal
inference in epidemiology. Epidemiology, 11(5), 550–560.
Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multi-
variate matched sampling methods that incorporate the propensity score. American
Statistician, 39(1), 33–38.
Rothman, K. J. (1976). Reviews and commentary: Causes. American Journal of
Epidemiology, 104(6), 587–592.
Rothman, K. J., & Greenland, S. (2005). Causation and causal inference in epidemiol-
ogy. American Journal of Public Health, 95(Suppl. 1), S144–S150.
Rutter, M. (1988). Studies of psychosocial risk: The power of longitudinal data. New York:
Cambridge University Press.
Rutter, M. (1994). Concepts of causation, tests of causal mechanisms, and implications
for intervention. In A. C. Petersen & J. T. Mortimer (Eds.), Youth unemployment and
society (Vol. 13, pp. 147–171). New York: Cambridge University Press.
Rutter, M., Pickles, A., Murray, R., & Eaves, L. (2001). Testing hypotheses on specific
environmental causal effects on behavior. Psychological Bulletin, 127(3), 291–324.
Sameroff, A., & Seifer, R. (1995). Accumulation of environmental risk and child mental
health (Vol. 31). New York: Garland Publishing.
Scarr, S., & McCartney, K. (1983). How people make their own environments: A theory
of genotype–environment effects. Child Development, 54(2), 424–435.
Seifer, R., Sameroff, A. J., Baldwin, C. P., & Baldwin, A. (1989, April). Risk and
protective factors between 4 and 13 years of age. Paper presented at the annual meeting
of the Society for Research in Child Development, San Francisco, CA.
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental
designs for generalized causal inference. Boston: Houghton Mifflin.
Silverton, L., Mednick, S. A., Schulsinger, F., Parnas, J., & Harrington, M. E. (1988).
Genetic risk for schizophrenia, birthweight, and cerebral ventricular enlargement.
Journal of Abnormal Psychology, 97(4), 496–498.
Simmons, R. G., & Blyth, D. A. (1992). Moving into adolescence: The impact of
pubertal change and school context. In P. H. Rossi, M. Useem, & J. D. Wright
(Eds.), Social institutions and social change (pp. 366–403). New York: Aldine de
Gruyter.
Sroufe, L. A. (1988). The role of infant–caregiver attachment in development. In
J. Belsky & T. Nezworski (Eds.), Clinical implications of attachment (pp. 18–38).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Szatmari, P., Saigal, S., Rosenbaum, P., Campbell, D., & King, S. (1990). Psychiatric
disorders at five years among children with birthweights <1000g: A regional per-
spective. Developmental Medicine and Child Neurology, 32(11), 954–962.
Terr, L., Bloch, D., Michel, B., Shi, H., Reinhardt, J., & Metayer, S. (1999). Children’s
symptoms in the wake of Challenger: A field study of distant-traumatic effects and an
outline of related conditions. American Journal of Psychiatry, 156(10), 1536–1544.
Tschann, J. M., Adler, N. E., Irwin, C. E., Jr., Millstein, S. G., Turner, R. A., & Kegeles,
S. M. (1994). Initiation of substance use in early adolescence: The roles of pubertal
timing and emotional distress. Health Psychology, 13(4), 326–333.
Weisglas-Kuperus, N., Koot, H. M., Baerts, W., Fetter, W. P., & Sauer, P. J. (1993).
Behaviour problems of very low birthweight children. Developmental Medicine and
Child Neurology, 35(5), 406–416.
Worthman, C. M., & Kuzara, J. (2005). Life history and the early origins of health
differentials. American Journal of Human Biology, 17(1), 95–112.
12
1. The DSM-III did contain explicit exceptions for disorders with known etiology or pathophysio-
logical process. For example, it stated that in organic mental disorders organic factors have
been identified. It is hardly necessary to point out the difference between these mental dis-
orders and PTSD with respect to causal assumptions.
natural disaster, rape, and other assault. Persons who learned about a threat
to the physical integrity of another person or about a traumatic event experi-
enced by a friend could be considered victims. A novel form of PTSD took
shape following the 9/11 terrorist attacks, when the entire population of the
United States was considered to have been affected by a ‘‘distant’’ trauma,
produced chiefly by viewing television coverage. Weeks after the attacks,
researchers conducted telephone surveys and detected a rise in the prevalence
of PTSD and major depression as well as a ‘‘dose–response’’ relationship
between television viewing time and symptoms. Furthermore, a rise in new
9/11-related PTSD cases was reported among those who viewed televised
images on the 1-year anniversary of the events (Bernstein et al., 2007).
Some commentators criticized the ‘‘conceptual bracket creep’’ on several
counts, including the concern that it produces a heterogeneous and ‘‘diluted’’
population of cases, making it far more difficult to detect and characterize
pathological alterations in PTSD (McNally, 2003).
This chapter examines three themes in research on PTSD. They concern
essential features of the disorder that inform, intersect, and complicate one
another. The first concerns the internal logic of PTSD in the DSM that
captures the way a traumatic experience is linked to the clinical syndrome
through memory. The second concerns risk factors and diathesis. The third
concerns comorbidity, both bivariate associations of trauma and PTSD with
specific disorders and multivariate approaches to underlying liabilities across
a wide range of psychiatric disorders.
Study                      Exposure (%)        PTSD (%)
                           Male    Female      Male    Female
Breslau et al. (1991)      43.0    36.7        6.0     11.3
Norris (1992)              73.6    64.8        —       —
Resnick et al. (1993)      —       69.0        —       12.3
Kessler et al. (1995)      60.7    51.2        5.0     10.4
Breslau et al. (1997)      —       40.0        —       13.8
Stein et al. (1997)        81.3    74.2        —       —
Breslau et al. (1998)      92.2    87.1        6.2     13.0
Breslau et al. (2004b)     87.2    78.4        6.3     7.9
Kessler et al. (2005)      —       —           3.6     9.7
Table 12.2 Conditional Probability of PTSD Across Specific Traumas: Estimates From
Two Population Surveys
from two epidemiological studies that used the DSM-IV definition. In both
studies, stressors grouped under ‘‘assaultive violence’’ were associated with
the highest probability of PTSD (15% and 21%), and learning about a traumatic
event experienced by a close friend or relative was associated with the lowest
probability (2.2% and 2.9%). The risk following any qualifying trauma was below
10% (8.8% and 9.2%). Clearly, even the trauma types that carry the highest PTSD
risk leave the majority of victims unaffected by the disorder.
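A related, person-level quantity can be derived from the sex-specific prevalences in the exposure and PTSD table that precedes Table 12.2. Because a PTSD diagnosis presupposes a qualifying exposure, P(PTSD | exposed) = P(PTSD and exposed) / P(exposed) = P(PTSD) / P(exposed). The sketch below applies that identity to the four rows that report both figures for both sexes. It assumes that the two prevalences in each row refer to the same sample and reference period, and it estimates risk per exposed person rather than per event, so it answers a different question from the per-trauma estimates just discussed. The names `ROWS` and `ptsd_given_exposure` are illustrative.

```python
# Sex-specific prevalences (%) copied from the exposure/PTSD table: (exposure, PTSD).
# Only rows reporting both quantities for both sexes are included.
ROWS = {
    "Breslau et al. (1991)": {"male": (43.0, 6.0), "female": (36.7, 11.3)},
    "Kessler et al. (1995)": {"male": (60.7, 5.0), "female": (51.2, 10.4)},
    "Breslau et al. (1998)": {"male": (92.2, 6.2), "female": (87.1, 13.0)},
    "Breslau et al. (2004b)": {"male": (87.2, 6.3), "female": (78.4, 7.9)},
}


def ptsd_given_exposure(exposure_pct: float, ptsd_pct: float) -> float:
    """P(PTSD | exposed) = P(PTSD and exposed) / P(exposed).

    Every PTSD case is also an exposed case, so P(PTSD and exposed) = P(PTSD).
    """
    if not 0 < ptsd_pct <= exposure_pct <= 100:
        raise ValueError("expected 0 < PTSD <= exposure <= 100 (percent)")
    return ptsd_pct / exposure_pct


conditional = {
    study: {sex: ptsd_given_exposure(*vals) for sex, vals in by_sex.items()}
    for study, by_sex in ROWS.items()
}

for study, by_sex in conditional.items():
    print(f"{study}: male {by_sex['male']:.3f}, female {by_sex['female']:.3f}")
```

In each of these four rows, exposed women have a higher conditional probability of PTSD than exposed men, even though women's exposure prevalence is the lower of the two in every row.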
12 Causes of Posttraumatic Stress Disorder 303
not they had developed PTSD in response to the prior trauma. Consequently,
it is unclear whether prior trauma per se or, instead, prior PTSD predicts an
elevated risk for PTSD following a subsequent trauma. Evidence that pre-
viously exposed persons are at increased risk for PTSD only if their prior
trauma resulted in PTSD would not support the hypothesis that exposure to
traumatic events increases the risk of (i.e., sensitizes to) the PTSD effects of
a subsequent trauma, transforming persons with ‘‘normal’’ reactions to stres-
sors into persons susceptible to PTSD. It might suggest that trauma preci-
pitates PTSD in persons with preexisting susceptibility that had already been
present before the prior trauma occurred. Evidence that personal vulnerabil-
ities, chiefly neuroticism, history of major depression and anxiety disorders,
and family history of psychiatric disorders, increase the risk for PTSD has
been consistently reported. There also is evidence that personal vulnerabil-
ities might be stronger predictors of psychiatric response to traumatic events
than trauma severity, especially in civilian samples.
We recently examined this question in our longitudinal epidemiological
study of young adults (Breslau, Peterson, & Schultz, 2008). At baseline and at
three reassessments over the following 10 years, respondents were asked
about the occurrence of traumatic events and PTSD. Data from one follow-
up assessment or more were available on 990 respondents (98.3% of the
initial panel). Exposure to trauma and PTSD measured at baseline and at
the 5-year follow-up were used to predict new exposure and PTSD during the
respective subsequent periods: from baseline to the 5-year assessment and
from the baseline and 5-year assessments to the 10-year assessment.
Preexisting major depression and any anxiety disorder were included as cov-
ariates to control for their effects (Table 12.4).
Table 12.4 Prior Trauma and PTSD and the Subsequent Occurrence of Trauma and
PTSD (n = 990)
Table 12.5 Relative Risk for PTSD Following Exposure to Trauma Associated With
Prior Trauma, Prior PTSD, and Covariates
In this adjusted model the relative risk for PTSD following exposure to
traumatic events in subsequent periods was significantly elevated among
trauma victims who had PTSD in the preceding periods, but not among trauma
victims who had not succumbed to PTSD. Odds ratios were 2.68 (95% confidence
interval 1.33–5.41) and 1.22 (95% confidence interval 0.64–2.34), respectively,
adjusted for sex, race, education, preexisting major depression and anxiety
disorders, and time of assessment (Table 12.5). We concluded that there was
no support in these data for the idea that traumatic events experienced in the
past lurk inside, waiting to shape reactions to future traumatic events. The
findings suggest that preexisting susceptibility to a pathological response to
stressors accounts for the PTSD response to both the prior and the subsequent
trauma.
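The two reported intervals can be checked for internal consistency. Assuming, as is usual for logistic-regression output but not stated in the chapter, that they are Wald intervals on the log-odds scale (log OR ± 1.96 × SE), the point estimate is the geometric mean of the bounds, and the standard error, z statistic, and two-sided p value follow from the interval width. The helper `summarize_or` is a hypothetical name used only for this sketch.

```python
import math


def summarize_or(lower: float, upper: float, z_crit: float = 1.96) -> dict:
    """Recover OR, SE(log OR), z, and two-sided p from a 95% CI for an odds ratio.

    Assumes a symmetric Wald interval on the log scale: log(OR) +/- z_crit * SE.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_or = (log_lo + log_hi) / 2           # midpoint on the log scale
    se = (log_hi - log_lo) / (2 * z_crit)    # half-width divided by the critical value
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p from the standard normal
    return {"or": math.exp(log_or), "se": se, "z": z, "p": p}


# Intervals reported in the text (Breslau, Peterson, & Schultz, 2008)
prior_ptsd = summarize_or(1.33, 5.41)           # reported OR 2.68
prior_trauma_only = summarize_or(0.64, 2.34)    # reported OR 1.22

for label, s in [("prior PTSD", prior_ptsd), ("prior trauma, no PTSD", prior_trauma_only)]:
    print(f"{label}: OR={s['or']:.2f}, SE(log OR)={s['se']:.3f}, z={s['z']:.2f}, p={s['p']:.3f}")
```

The recovered point estimates round to the reported 2.68 and 1.22, and only the first interval excludes 1, which is the contrast between victims whose prior trauma produced PTSD and those whose prior trauma did not.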
Our results had been foreshadowed by a 1987 Israeli study of acute combat
stress reaction (CSR) among soldiers in the 1982 Lebanon War (Solomon,
Mikulincer, & Jakob, 1987). Compared with new recruits who had not fought in a
previous war, CSR occurred more frequently among soldiers who had experienced
CSR in a previous war, but not among soldiers who had fought in a previous war
without experiencing CSR. The authors concluded that knowledge of the outcome of prior
combat was essential for predicting soldiers’ response to subsequent combat.
Soldiers who suffered CSR in a previous war might have had preexisting vul-
nerability that also accounted for their increased risk of CSR during the sub-
sequent war. Soldiers who had fought in a previous war but had not
experienced CSR had a lower rate of CSR during the subsequent war than
new recruits who had no war experience. It is tempting to interpret this obser-
vation as evidence of ‘‘inoculation.’’ However, the new recruits included sol-
diers who would have had CSR had they fought in a prior war. This undetected
‘‘vulnerable’’ subset would push up the rate of CSR in the group of new recruits
as a whole.
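The selection argument can be made concrete with a small calculation under a pure-susceptibility model: a fixed, unobserved vulnerability sets each soldier's chance of CSR in any war, and prior combat neither sensitizes nor inoculates. The 20% share of vulnerable soldiers and the per-war risks of 30% and 5% are hypothetical values chosen for illustration, not estimates from Solomon et al., and new recruits are assumed to come from the same population as the veterans. Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction


def csr_rates(p_vulnerable, risk_vulnerable, risk_resilient):
    """CSR rates in a subsequent war under a pure-susceptibility model.

    Each soldier is either vulnerable or not; the chance of CSR in any war depends
    only on that fixed trait, so prior combat neither sensitizes nor inoculates.
    """
    def risk(p_v):
        # CSR probability for a group in which a share p_v is vulnerable
        return p_v * risk_vulnerable + (1 - p_v) * risk_resilient

    p_csr_prior = risk(p_vulnerable)  # CSR rate during the prior war
    # Bayes' rule: share of vulnerable soldiers among veterans with / without prior CSR
    p_v_given_csr = p_vulnerable * risk_vulnerable / p_csr_prior
    p_v_given_no_csr = p_vulnerable * (1 - risk_vulnerable) / (1 - p_csr_prior)
    return {
        "new recruits": risk(p_vulnerable),
        "veterans with prior CSR": risk(p_v_given_csr),
        "veterans without prior CSR": risk(p_v_given_no_csr),
    }


rates = csr_rates(Fraction(1, 5), Fraction(3, 10), Fraction(1, 20))
for group, rate in rates.items():
    print(f"{group}: {float(rate):.3f}")
```

With these inputs the rates are 0.100 for new recruits, 0.200 for veterans with prior CSR, and about 0.089 for veterans without prior CSR. This is the same ordering Solomon et al. reported, produced here entirely by which soldiers end up in each group.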
Intelligence
Studies in Vietnam veterans have reported associations between intelligence
test scores and the risk for PTSD (Macklin et al., 1998; McNally & Shin, 1995;
Pitman, Orr, Lowenhagen, Macklin, & Altman, 1991). Evidence on the role of
intelligence in children’s psychiatric response to adversity was reported for a
range of disorders and for PTSD (Fergusson & Lynskey, 1996; Silva et al.,
2000). Several articles published in 2006 and 2007 reported on cognitive
ability measured in early childhood and subsequent PTSD in general popula-
tion samples (Breslau, Lucia, & Alvarado, 2006; Koenen, Moffitt, Poulton,
Martin, & Caspi, 2007; Storr, Ialongo, Anthony, & Breslau, 2007) and on
Vietnam veterans from the twin registry for whom predeployment test
scores were available (Kremen et al., 2007). Some of the studies found that the
lower risk was associated with high IQ specifically rather than with the full range of IQ.
For example, we found that age 6 Wechsler Intelligence Scale for Children–
Revised IQ >115 was associated with a lower risk of subsequent exposure to
trauma and, among those exposed, a markedly lower risk of PTSD (adjusted
odds ratio = 0.21) (Breslau et al., 2006). Similarly, Gilbertson et al. (2006)
reported that above-average cognitive function protects against chronic PTSD
and that those with PTSD had average, rather than below-average, cognitive
function. The mean IQ of PTSD veterans and their monozygotic twin broth-
ers was 105, whereas the mean IQ of non-PTSD combat veterans and their
monozygotic twin brothers was 118.
These studies perform two tasks. First, they dispel the notion that IQ
deficits observed among patients with PTSD reflect stress-induced
Neuroticism
Neuroticism is a personality trait that at the high end is a disposition to
respond to stress with negative affect, depression, and anxiety and at the
low end manifests as emotional stability and ‘‘normality.’’ An early study
that called attention to neuroticism’s salience in the psychiatric response to
traumatic experiences reported on the survivors of the 1983 Australian bush-
fires. In contrast with the expectation that the intensity of the stressor would
be the primary cause, neuroticism and history of predisaster disturbances
emerged as stronger predictors of morbidity (McFarlane, 1988, 1989).
Studies of Vietnam combat veterans reported that PTSD and PTSD symptoms
were correlated with neuroticism (Casella & Motta, 1990; Hyer et al., 1994;
Talbert, Braswell, Albrecht, Hyer, & Boudewyns, 1993). In a general popula-
tion sample of young adults, neuroticism predicted both exposure to traumatic
events and PTSD after exposure, controlling for other risk factors (Breslau,
Davis, & Andreski, 1995; Breslau et al., 1991). In most of the studies neuroti-
cism was measured after the trauma, but three studies measured neuroticism
prior to the trauma and reported an association between neuroticism and
PTSD or postdisaster disturbance (Alexander & Wells, 1991; Engelhard, van
den Hout, & Kindt, 2003; Parslow, Jorm, & Christensen, 2006). Recently, pro-
spective studies have reported that anxious/depressed mood, anxiety disorders,
and difficult temperament measured in childhood predicted subsequent PTSD
(Breslau et al., 2006; Koenen et al., 2007; Storr et al., 2007).
Research on neuroticism has demonstrated connections with neurophysiological
substrates, in particular the lability of the autonomic nervous system.
There is evidence supporting heritability and stability from childhood to
adulthood. Genetic control of neuroticism has been reported in numerous
studies since the 1970s. Molecular genetics studies, using both association
and linkage methods, identified gene regions that are likely to influence
variation in neuroticism (Fullerton et al., 2003; Lesch et al., 1996). A recent
meta-analysis concluded that there is a strong association between a seroto-
nin transporter promoter polymorphism (5-HTTLPR) and neuroticism, when
Comorbidity
Table 12.6 Incidence and Relative Risk for Other Disorders in 10-Year Follow-Up in
Detroit Area Study of Young Adults
Table 12.7 Risk for Exposure to Trauma and PTSD by Preexisting Disorders
[Path diagram: correlated Internalizing and Externalizing factors; Internalizing
divides into Distress (major depression, dysthymia, generalized anxiety disorder)
and Fear (social phobia, specific phobia, panic disorder, agoraphobia);
Externalizing is indicated by alcohol disorder, drug disorder, conduct disorder,
and adult antisocial behavior.]
Figure 12.1 Path diagram for best-fitting meta-analysis model. Used with permission
of ANNUAL REVIEWS, INC., from Annual Review of Clinical Psychology, article by
R. Krueger and K. E. Markon, volume 2, 2006; permission conveyed through
Copyright Clearance Center, Inc.
[Path diagram: correlated Internalizing and Externalizing factors; Internalizing
divides into Distress (dysthymia, posttraumatic stress, generalized anxiety,
neurasthenia) and Fear (social phobia, panic disorder, agoraphobia,
obsessive-compulsive disorder); Externalizing is indicated by alcohol dependence
and drug dependence.]
Table 12.8 Factor Loadings of Lifetime DSM-III-R Diagnoses: NCS Data (n = 5,877)
What might these underlying liabilities be? Krueger (1999) points out that
the two major spectra of disorders, internalizing and externalizing, are linked
in the literature to the personality traits of neuroticism and disinhibition:
neuroticism to internalizing disorders, and neuroticism in the presence of high
disinhibition to externalizing disorders. Based on our findings, we have
suggested the possibility of a common diathesis for PTSD and major
depression and proposed that it might be a mistake to regard PTSD and
major depression in ‘‘comorbid’’ cases as separate and distinct (Breslau,
Davis, Peterson, & Schultz, 2000). For the relationship of PTSD with
substance-use disorders, the evidence suggests a different explanation. If there
are common underlying liabilities, they are probably weaker. At the same time,
we cannot conclude that the association of PTSD with alcohol- or drug-use
disorder is environmental, with alcohol or drug involvement increasing the
probability of exposure to traumatic events and thereby indirectly increasing
the risk for PTSD: survival analysis with time-dependent covariates of the
retrospective data gathered at baseline did not support a causal pathway from
substance-use disorders to exposure. A recent study of another sample
replicates these findings (Reed et al., 2007). The evidence in our prospective
data that PTSD increased the risk for drug-use disorders, especially involving
prescription medicines, would, if replicated, provide part of the explanation.
PTSD and substance-use disorder are probably connected by multiple pathways,
including a more complex pattern of shared liabilities that involves both
neuroticism and disinhibition.
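The shared-diathesis idea can be illustrated with a liability-threshold sketch in which a single latent trait (standing in for neuroticism) contributes to the liabilities for two disorders, and neither disorder has any causal effect on the other. The loading, threshold, sample size, seed, and the function name `simulate_comorbidity` are all hypothetical. The sketch shows only that thresholding two liabilities that share a common source yields co-occurrence well above chance.

```python
import random


def simulate_comorbidity(n=100_000, loading=0.8, threshold=1.5, seed=1):
    """Liability-threshold sketch of a shared diathesis (hypothetical parameters).

    Each person has a latent trait ~ Normal(0, 1). The liabilities for disorders A
    and B both load on that trait and have independent unique parts, scaled so each
    liability has unit variance; with equal loadings the two liabilities correlate
    at loading ** 2. A disorder is present when its liability exceeds the threshold.
    Nothing in the model lets A cause B or B cause A.
    """
    rng = random.Random(seed)
    unique_sd = (1 - loading ** 2) ** 0.5
    counts = {"both": 0, "a_only": 0, "b_only": 0, "neither": 0}
    for _ in range(n):
        trait = rng.gauss(0, 1)
        has_a = loading * trait + unique_sd * rng.gauss(0, 1) > threshold
        has_b = loading * trait + unique_sd * rng.gauss(0, 1) > threshold
        if has_a and has_b:
            counts["both"] += 1
        elif has_a:
            counts["a_only"] += 1
        elif has_b:
            counts["b_only"] += 1
        else:
            counts["neither"] += 1
    # Odds ratio from the 2 x 2 table of disorder A by disorder B
    odds_ratio = (counts["both"] * counts["neither"]) / (counts["a_only"] * counts["b_only"])
    return counts, odds_ratio


_, or_shared = simulate_comorbidity(loading=0.8)
_, or_independent = simulate_comorbidity(loading=0.0)
print(f"odds ratio, shared trait: {or_shared:.1f}")
print(f"odds ratio, no shared trait: {or_independent:.2f}")
```

Under these assumed parameters the two diagnoses co-occur far more often than independence would predict, while setting the loading to zero returns an odds ratio near 1.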
Conclusion
Findings from our prospective research help to rule out some of the potential
pathways that might account for PTSD comorbidity. Trauma-exposed persons
who did not succumb to PTSD (i.e., about 90% of those exposed) are not at a
markedly increased risk for other disorders. PTSD following exposure to
stressors might identify persons with preexisting liability to a range of dis-
orders. The findings do not support the idea that trauma caused PTSD in
some victims and major depression in others. They led us to conclude that
the two disorders might have a shared diathesis and that, when observed
together in ‘‘comorbid’’ cases, they are not distinct disorders with separate
etiologies. Multivariate analysis of psychiatric comorbidity illuminates etiol-
ogy by seeking to identify core processes underlying multiple disorders. The
liability constructs that emerge resemble personality traits linked to psycho-
pathology. Neuroticism is a liability for internalizing disorders, a spectrum
that contains PTSD. The construct of PTSD as a process with an inner logic
has close affinity to neuroticism. The core dimension in neuroticism is the
References
Alexander, D. A., & Wells, A. (1991). Reactions of police officers to body-handling after a
major disaster. A before-and-after comparison. British Journal of Psychiatry, 159, 547–
555.
American Psychiatric Association. (1980). Diagnostic and statistical manual of mental
disorders (3rd ed.). Washington DC: Author.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental
disorders: DSM-IV (4th ed.). Washington DC: Author.
Andreasen, N. C. (1980). Post-traumatic stress disorder. In A. M. Freedman, H. I.
Kaplan, & B. J. Sadock (Eds.), Comprehensive textbook of psychiatry (3rd ed.).
Baltimore: Williams & Wilkins.
Andreasen, N. C. (2004). Acute and delayed posttraumatic stress disorders: A history
and some issues. American Journal of Psychiatry, 161(8), 1321–1323.
Andrews, B., Brewin, C. R., Philpott, R., & Stewart, L. (2007). Delayed-onset posttrau-
matic stress disorder: A systematic review of the evidence. American Journal of
Psychiatry, 164(9), 1319–1326.
Bernstein, K. T., Ahern, J., Tracy, M., Boscarino, J. A., Vlahov, D., & Galea, S. (2007).
Television watching and the risk of incident probable posttraumatic stress disorder:
A prospective evaluation. Journal of Nervous and Mental Disease, 195(1), 41–47.
Blank, A. S. (1985). Irrational reactions to post-traumatic stress disorder and Viet Nam
veterans. In S. Sonnenberg, A. S. Blank, Jr., & J. A. Talbott (Eds.), The trauma of
war: Stress and recovery in Viet Nam veterans (p. xxi). Washington DC: American
Psychiatric Press.
Bremner, J. D. (1999). Does stress damage the brain? Biological Psychiatry, 45(7),
797–805.
Bremner, J. D., Southwick, S. M., Johnson, D. R., Yehuda, R., & Charney, D. S. (1993).
Childhood physical abuse and combat-related posttraumatic stress disorder in
Vietnam veterans. American Journal of Psychiatry, 150(2), 235–239.
Breslau, J. (2004). Cultures of trauma: Anthropological views of posttraumatic stress
disorder in international health. Culture, Medicine and Psychiatry, 28(2), 113–126.
Breslau, J. (2005). Response to ‘‘Commentary: Deconstructing critiques on the inter-
nationalization of PTSD’’. Culture, Medicine and Psychiatry, 29(3), 371–376.
Breslau, N., Chase, G. A., & Anthony, J. C. (2002). The uniqueness of the DSM
definition of post-traumatic stress disorder: Implications for research. Psychological
Medicine, 32(4), 573–576.
Breslau, N., Chilcoat, H. D., Kessler, R. C., & Davis, G. C. (1999). Previous exposure to
trauma and PTSD effects of subsequent trauma: Results from the Detroit Area
Survey of Trauma. American Journal of Psychiatry, 156(6), 902–907.
Breslau, N., & Davis, G. C. (1987). Posttraumatic stress disorder. The stressor criterion.
Journal of Nervous and Mental Disease, 175(5), 255–264.
Breslau, N., Davis, G. C., & Andreski, P. (1995). Risk factors for PTSD-related trau-
matic events: A prospective analysis. American Journal of Psychiatry, 152(4), 529–535.
Breslau, N., Davis, G. C., Andreski, P., & Peterson, E. (1991). Traumatic events and
posttraumatic stress disorder in an urban population of young adults. Archives of
General Psychiatry, 48(3), 216–222.
Breslau, N., Davis, G. C., Peterson, E. L., & Schultz, L. (1997). Psychiatric sequelae of
posttraumatic stress disorder in women. Archives of General Psychiatry, 54(1), 81–87.
Breslau, N., Davis, G. C., Peterson, E. L., & Schultz, L. R. (2000). A second look at
comorbidity in victims of trauma: The posttraumatic stress disorder–major depres-
sion connection. Biological Psychiatry, 48(9), 902–909.
Breslau, N., Davis, G. C., & Schultz, L. (2003). Posttraumatic stress disorder and the
incidence of nicotine, alcohol and drug disorders in persons who have experienced
trauma. Archives of General Psychiatry, 60, 289–294.
Breslau, N., Kessler, R. C., Chilcoat, H. D., Schultz, L. R., Davis, G. C., & Andreski, P.
(1998). Trauma and posttraumatic stress disorder in the community: The 1996
Detroit Area Survey of Trauma. Archives of General Psychiatry, 55(7), 626–632.
Breslau, N., Lucia, V. C., & Alvarado, G. F. (2006). Intelligence and other predisposing
factors in exposure to trauma and posttraumatic stress disorder: A follow-up study at
age 17 years. Archives of General Psychiatry, 63(11), 1238–1245.
Breslau, N., Peterson, E., & Schultz, L. (2008). A second look at prior trauma and the
posttraumatic stress disorder effects of subsequent trauma: A prospective
epidemiological study. Archives of General Psychiatry, 65(4), 431–437.
Breslau, N., Peterson, E. L., Poisson, L. M., Schultz, L. R., & Lucia, V. C. (2004a).
Estimating post-traumatic stress disorder in the community: Lifetime perspective
and the impact of typical traumatic events. Psychological Medicine, 34(5), 889–898.
Breslau, N., Wilcox, H. C., Storr, C. L., Lucia, V. C., & Anthony, J. C. (2004b). Trauma
exposure and posttraumatic stress disorder: A study of youths in urban America.
Journal of Urban Health, 81(4), 530–544.
Brewin, C. R., Andrews, B., & Valentine, J. D. (2000). Meta-analysis of risk factors for
posttraumatic stress disorder in trauma-exposed adults. Journal of Consulting and
Clinical Psychology, 68(5), 748–766.
Casella, L., & Motta, R. W. (1990). Comparison of characteristics of Vietnam
veterans with and without posttraumatic stress disorder. Psychological Reports,
67(2), 595–605.
Dohrenwend, B. P., Turner, J. B., Turse, N. A., Adams, B. G., Koenen, K. C., &
Marshall, R. (2006). The psychological risks of Vietnam for U.S. veterans: A revisit
with new data and methods. Science, 313(5789), 979–982.
Ehlers, A., & Clark, D. M. (2000). A cognitive model of posttraumatic stress disorder.
Behaviour Research and Therapy, 38(4), 319–345.
Engdahl, B., Dikel, T. N., Eberly, R., & Blank, A., Jr. (1997). Posttraumatic stress
disorder in a community group of former prisoners of war: A normative response
to severe trauma. American Journal of Psychiatry, 154(11), 1576–1581.
Engelhard, I. M., van den Hout, M. A., & Kindt, M. (2003). The relationship between
neuroticism, pre-traumatic stress and post-traumatic stress: A prospective study.
Personality and Individual Differences, 35, 381–388.
Fergusson, D. M., & Lynskey, M. T. (1996). Adolescent resiliency to family adversity.
Journal of Child Psychology and Psychiatry, 37(3), 281–292.
Fullerton, J., Cubin, M., Tiwari, H., Wang, C., Bomhra, A., Davidson, S., et al. (2003).
Linkage analysis of extremely discordant and concordant sibling pairs identifies
quantitative-trait loci that influence variation in the human personality trait
neuroticism. American Journal of Human Genetics, 72(4), 879–890.
Galea, S., Ahern, J., Resnick, H., Kilpatrick, D., Bucuvalas, M., Gold, J., et al. (2002).
Psychological sequelae of the September 11 terrorist attacks in New York City. New
England Journal of Medicine, 346(13), 982–987.
Gilbertson, M. W., Paulus, L. A., Williston, S. K., Gurvits, T. V., Lasko, N. B., Pitman,
R. K., et al. (2006). Neurocognitive function in monozygotic twins discordant for
combat exposure: Relationship to posttraumatic stress disorder. Journal of Abnormal
Psychology, 115(3), 484–495.
Goldstein, G., van Kammen, W., Shelly, C., Miller, D. J., & van Kammen, D. P. (1987).
Survivors of imprisonment in the Pacific theater during World War II. American
Journal of Psychiatry, 144(9), 1210–1213.
Goodwin, D. W., & Guze, S. B. (1984). Psychiatric diagnosis (3rd ed.). New York: Oxford
University Press.
Green, B. L., Lindy, J. D., & Grace, M. C. (1985). Posttraumatic stress disorder: Toward
DSM-IV. Journal of Nervous and Mental Disease, 173(7), 406–411.
Helzer, J. E. (1981). Methodological issues in the interpretations of the consequences
of extreme situations. In B. S. Dohrenwend & B. P. Dohrenwend (Eds.), Stressful life
events and their contexts: Monographs in psychosocial epidemiology (Vol. 2, pp. 108–129).
New York: Prodist.
Hyer, L., Braswell, L., Albrecht, B., Boyd, S., Boudewyns, P., & Talbert, S. (1994).
Relationship of NEO-PI to personality styles and severity of trauma in chronic
PTSD victims. Journal of Clinical Psychology, 50(5), 699–707.
Jones, E., & Wessely, S. (2005). War syndromes: The impact of culture on medically
unexplained symptoms. Medical History, 49(1), 55–78.
McNally, R. J., & Shin, L. M. (1995). Association of intelligence with severity of post-
traumatic stress disorder symptoms in Vietnam combat veterans. American Journal
of Psychiatry, 152(6), 936–938.
Norris, F. H. (1992). Epidemiology of trauma: Frequency and impact of different
potentially traumatic events on different demographic groups. Journal of
Consulting and Clinical Psychology, 60(3), 409–418.
North, C. S., & Pfefferbaum, B. (2002). Research on the mental health effects of
terrorism. Journal of the American Medical Association, 288(5), 633–636.
Ozer, E. J., Best, S. R., Lipsey, T. L., & Weiss, D. S. (2003). Predictors of posttraumatic
stress disorder and symptoms in adults: A meta-analysis. Psychological Bulletin,
129(1), 52–73.
Parslow, R. A., Jorm, A. F., & Christensen, H. (2006). Associations of pre-trauma
attributes and trauma exposure with screening positive for PTSD: Analysis of
a community-based study of 2,085 young adults. Psychological Medicine, 36(3),
387–395.
Pitman, R. K., Orr, S. P., Lowenhagen, M. J., Macklin, M. L., & Altman, B. (1991). Pre-
Vietnam contents of posttraumatic stress disorder veterans’ service medical and
personnel records. Comprehensive Psychiatry, 32(5), 416–422.
Post, R. M., & Weiss, S. R. (1998). Sensitization and kindling phenomena in mood,
anxiety, and obsessive–compulsive disorders: The role of serotonergic mechanisms
in illness progression. Biological Psychiatry, 44(3), 193–206.
Reed, P. L., Anthony, J. C., & Breslau, N. (2007). Incidence of drug problems in young
adults exposed to trauma and posttraumatic stress disorder: Do early life experiences
and predispositions matter? Archives of General Psychiatry, 64(12), 1435–1442.
Resnick, H. S., Kilpatrick, D. G., Dansky, B. S., Saunders, B. E., & Best, C. L. (1993).
Prevalence of civilian trauma and posttraumatic stress disorder in a representative
national sample of women. Journal of Consulting and Clinical Psychology, 61(6),
984–991.
Sapolsky, R. M., Uno, H., Rebert, C. S., & Finch, C. E. (1990). Hippocampal damage
associated with prolonged glucocorticoid exposure in primates. Journal of
Neuroscience, 10(9), 2897–2902.
Sen, S., Burmeister, M., & Ghosh, D. (2004). Meta-analysis of the association between
a serotonin transporter promoter polymorphism (5-HTTLPR) and anxiety-related
personality traits. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics,
127(1), 85–89.
Shephard, B. (2001). A war of nerves: Soldiers and psychiatrists in the twentieth century.
Cambridge, MA: Harvard University Press.
Silva, R. R., Alpert, M., Munoz, D. M., Singh, S., Matzner, F., & Dummit, S. (2000).
Stress and vulnerability to posttraumatic stress disorder in children and adolescents.
American Journal of Psychiatry, 157(8), 1229–1235.
Simms, L. J., Watson, D., & Doebbeling, B. N. (2002). Confirmatory factor analyses of
posttraumatic stress symptoms in deployed and nondeployed veterans of the Gulf
War. Journal of Abnormal Psychology, 111(4), 637–647.
Slade, T., & Watson, D. (2006). The structure of common DSM-IV and ICD-10 mental
disorders in the Australian general population. Psychological Medicine, 36(11),
1593–1600.
Solomon, Z., Mikulincer, M., & Jakob, B. R. (1987). Exposure to recurrent combat
stress: Combat stress reactions among Israeli soldiers in the Lebanon war.
Psychological Medicine, 17(2), 433–440.
Stein, M. B., Walker, J. R., Hazen, A. L., & Forde, D. R. (1997). Full and partial
posttraumatic stress disorder: Findings from a community survey. American
Journal of Psychiatry, 154(8), 1114–1119.
Storr, C. L., Ialongo, N. S., Anthony, J. C., & Breslau, N. (2007). Childhood antecedents
of exposure to traumatic events and posttraumatic stress disorder. American Journal
of Psychiatry, 164(1), 119–125.
Talbert, F. S., Braswell, L. C., Albrecht, J. W., Hyer, L. A., & Boudewyns, P. A. (1993).
NEO-PI profiles in PTSD as a function of trauma level. Journal of Clinical Psychology,
49(5), 663–669.
Terr, L. C., Bloch, D. A., Michel, B. A., Shi, H., Reinhardt, J. A., & Metayer, S. (1999).
Children’s symptoms in the wake of Challenger: A field study of distant-traumatic
effects and an outline of related conditions. American Journal of Psychiatry, 156(10),
1536–1544.
Watson, D. (2005). Rethinking the mood and anxiety disorders: A quantitative hier-
archical model for DSM-V. Journal of Abnormal Psychology, 114(4), 522–536.
Yehuda, R., & McFarlane, A. C. (1995). Conflict between current knowledge about
posttraumatic stress disorder and its original conceptual basis. American Journal of
Psychiatry, 152(12), 1705–1713.
Yehuda, R., McFarlane, A. C., & Shalev, A. Y. (1998). Predicting the development of
posttraumatic stress disorder from the acute response to a traumatic event. Biological
Psychiatry, 44(12), 1305–1313.
Yehuda, R., Resnick, H. S., Schmeidler, J., Yang, R. K., & Pitman, R. K. (1998).
Predictors of cortisol and 3-methoxy-4-hydroxyphenylglycol responses in the acute
aftermath of rape. Biological Psychiatry, 43(11), 855–859.
Young, A. (1995). The harmony of illusions: Inventing post-traumatic stress disorder.
Princeton, NJ: Princeton University Press.
Young, A. (2001). Our traumatic neurosis and its brain. Science in Context, 14(4), 661–683.
Zammit, S., Allebeck, P., David, A. S., Dalman, C., Hemmingsson, T., Lundberg, I.,
et al. (2004). A longitudinal study of premorbid IQ score and risk of developing
schizophrenia, bipolar disorder, severe depression, and other nonaffective psychoses.
Archives of General Psychiatry, 61(4), 354–360.
Breslau, N., Davis, G. C., & Schultz, L. (2003). Posttraumatic stress disorder and the
incidence of nicotine, alcohol and drug disorders in persons who have experienced
trauma. Archives of General Psychiatry, 60, 289–294.
13
Terms such as disorder, illness, disease, dysfunction, and deviance embody the
preconceptions of their historical development (Klein, 1999). That individuals become ill
for no apparent reason, suffering from pain, dizziness, malaise, rash, wasting,
and so on, has been known since prehistoric times. The recognition of illness led to the
social definition of the patient and to the development of various treatment
institutions (e.g., nursing, medicine, surgery, quacks, and faith healers).
Illness is an involuntary affliction that justifies the sick, dependent role
(Parsons, 1951). That is, because the sick have involuntarily impaired functioning,
it is a reasonable social investment to exempt them (at least temporarily)
from normal responsibilities. Illness implies that something has
gone wrong. However, because exemption from civil or criminal responsibilities
is often desirable, an illness claim may be viewed skeptically when no objective
criteria are available. By affirming involuntary affliction, diagnosis
immunizes the patient against charges of exploitative parasitism.
Illness may therefore be considered a hybrid concept with two components:
(1) the necessary inference that something has actually, involuntarily,
gone wrong (disease) and (2) the qualification that the result (illness) must be
sufficiently major, according to current social values, to ratify the sick-role
exemption. The latter component depends on the particular historical
stage, cultural traditions, and values. This hybrid concept is captured by
the phrase ‘‘harmful dysfunction’’ (Wakefield, 1992).
However, this does not mean that the illness concept is arbitrary, since the
inference that something has gone wrong is necessary. Beliefs as to just what
has gone wrong (e.g., demon possession, bad air, bacterial infection) as well
as the degree of manifested dysfunction that warrants the sick role reflect the
somewhat independent levels of scientific and social development (for further
reference, see Lewis, 1967).
How can we affirm that something has gone wrong if there is no objective
evidence? The common statistical definition of abnormality is simply
‘‘unusual’’: something is abnormal if it is rare. Although biological variability
ensures that someone will be at an extreme, there is a strong presumption that
something has gone wrong if a value is sufficiently extreme. For instance, a
hemoglobin level of 5 g/100 ml lies beyond normal biological variation, indicating
that something has gone wrong. Infrequency (e.g., dextrocardia) therefore usefully
indicates that something is probably wrong, but it is neither sufficient
(e.g., left-handedness) nor necessary (e.g., dental caries). A mysterious shift from
well-being to pain and manifest dysfunction strongly indicates that something has
gone wrong, and the fact that such distressing states may remit suggests that some
repair has occurred. What has gone wrong is a deviation from an implicit standard,
formulated by the evolutionary theory of adaptive functions and dysfunctions
(Millikan, 1993; Klein, 1993, 1999).
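The purely statistical definition of abnormality can be made concrete as a z-score against a reference distribution. The sketch below is only an illustration: the reference mean and SD for hemoglobin (14.0 and 1.5 g/100 ml), the assumption that the reference distribution is normal, and the `alpha` cutoff are assumed values chosen for the example, not clinical norms.

```python
from statistics import NormalDist

def statistical_abnormality(value, ref_mean, ref_sd, alpha=0.001):
    """Apply the rarity-only definition of abnormality.

    Returns the z-score, the two-sided tail probability under a normal
    reference distribution, and whether that probability is below alpha.
    """
    z = (value - ref_mean) / ref_sd
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided, p_two_sided < alpha

# Hypothetical reference values for adult hemoglobin (g/100 ml); illustrative only.
z, p, flagged = statistical_abnormality(5.0, ref_mean=14.0, ref_sd=1.5)
print(f"z = {z:.1f}, two-sided p = {p:.1e}, abnormal by rarity: {flagged}")
```

A rule like this flags the hemoglobin value, but by construction it would also flag any uncommon yet harmless trait and would miss any common disorder, which is why rarity alone is neither sufficient nor necessary.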
In the seventeenth century, Sydenham placed medical diagnosis on a firmer footing
with the concept of syndromes: forms of ill health, such as gout and rheumatoid
arthritis, that are comparable to animal and vegetable species in their symptoms
and course. A symptom complex was more than a concatenation of symptoms and
signs. It implied some common latent cause distinct from those supporting
ordinary health, even if such causes were unknown. Kraepelin made use of
syndromes in distinguishing dementia praecox from manic–depressive illness,
initially arguing that the different symptom complexes provided firm prognostic
differences. ‘‘Points of rarity’’ are not necessary to differentiate syndromes, and
even if a latent cause is entirely categorical, its manifestations may not show
bimodality (Murphy, 1964).
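Murphy's point that a categorical cause need not produce bimodality can be checked numerically. The sketch assumes the simplest case: a latent cause present in a proportion `weight` of the population shifts an otherwise normal trait by `delta` standard deviations. For two equally weighted, equal-variance normal components, the mixture is bimodal only when the separation exceeds 2 SD, so even a large categorical effect can leave a single-peaked distribution. The grid bounds and resolution are arbitrary choices for the illustration.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def count_modes(delta, sd=1.0, weight=0.5, lo=-6.0, hi=10.0, steps=4001):
    """Count peaks in the density of a two-class mixture.

    Unaffected people ~ N(0, sd); affected people (proportion `weight`) ~ N(delta, sd).
    """
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    dens = [(1 - weight) * normal_pdf(x, 0.0, sd) + weight * normal_pdf(x, delta, sd)
            for x in xs]
    # A peak is a grid point higher than its left neighbour and not lower than its right one.
    return sum(1 for i in range(1, steps - 1)
               if dens[i - 1] < dens[i] >= dens[i + 1])

for delta in (1.0, 1.5, 2.5, 3.0):
    print(f"shift = {delta} SD -> {count_modes(delta)} mode(s)")
```

Under these assumptions a 1.5 SD shift, a very large effect by the standards of most biomarkers, still yields one peak; with unequal class sizes a mixture can stay single-peaked even when the separation exceeds 2 SD.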
Since around the middle of the nineteenth century, disease has been
defined by objectively demonstrated etiology and pathology, thanks to crucial
discoveries by scientists such as Pasteur and Virchow. By objectively
elucidating necessary etiologies, causal analysis allowed diagnosis to progress
beyond simple syndromal definition. Evident examples are the infectious
diseases and the avitaminoses.
This is not the case in psychiatry. Attempting to find objective differences
(biomarkers) between normal subjects and subjects with various psychiatric
syndromes has been the overriding focus of biological psychiatry. Beset by
13 Causal Thinking for Objective Psychiatric Diagnostic Criteria 323
Syndromal Heterogeneity
Psychiatric disorders are largely familial. Twin, adoption, and related study
designs indicate that syndromal familiality is largely genetic (or due to
gene-by-environment interactions). The tremendous advances in molecular
genetics and genomics therefore raised hopes that methods such as linkage and
association studies would yield objective diagnostic genetic tests.
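One classical way twin designs turn familiality into a genetic estimate is Falconer's approximation, which compares identical (MZ) and fraternal (DZ) twin correlations. The sketch below uses made-up correlations purely for illustration. The formula assumes additive genetic effects, equally similar shared environments for MZ and DZ pairs, and no gene-by-environment interaction, so it cannot separate out the interaction effects just mentioned.

```python
def falconer_decomposition(r_mz, r_dz):
    """Falconer's approximation from MZ and DZ twin correlations.

    Returns (h2, c2, e2): additive genetic, shared-environment, and
    non-shared-environment (plus measurement error) shares of variance.
    """
    h2 = 2 * (r_mz - r_dz)   # MZ pairs share all additive effects, DZ pairs about half
    c2 = r_mz - h2           # equivalently 2 * r_dz - r_mz
    e2 = 1 - r_mz            # whatever makes MZ co-twins differ
    return h2, c2, e2

# Hypothetical twin correlations for a familial syndrome (illustrative only).
h2, c2, e2 = falconer_decomposition(r_mz=0.80, r_dz=0.45)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

Modern twin studies usually fit structural equation models rather than this shortcut, but the arithmetic shows why a large MZ/DZ gap is read as evidence of genetic familiality.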
Unfortunately, disappointment grew once molecular genetic research focused on
highly familial but non-Mendelian syndromes. Despite highly significant
statistical associations in individual studies, there have been repeated
failures of replication (Riley & Kendler, 2006). Further, as the example of
Huntington disease shows, gene identification does not necessarily lead to
advances in treatment or to more complete knowledge of pathophysiology.
The lack of replicability of many genomic studies, as well as the
low-magnitude effect estimates reported, highlights the central problem of
syndromal heterogeneity (Bodmer, 1981). Smoller and Tsuang (1998, p. 1152)
clearly state the problem and suggest a straightforward solution:
However, this may not be feasible. As Crow (2007, p. 13) states, ‘‘Recent meta-
analyses have not identified consistent sites of linkage. The three largest studies
of schizophrenia fail to agree on a single locus . . . there is no replicable
support for any of the current candidate genes.’’ Although Crow’s remarks
However, Flint and Munafo (2006) criticize this optimistic assumption in their
detailed meta-analytic review of human and animal data. Endophenotypes
appear to perform only on a par with genetic biomarkers, which tempers that
optimism. These conclusions, as well as prior assumptions of enhanced
endophenotype utility for genetic analysis, are limited by the paucity of studies
specifically addressing differential endophenotypic utility. Unfortunately,
Flint and Munafo's critique has not yet been widely discussed in the
literature.
surprising that there are any consistently evident symptom complexes at all.
However, certain syndromes (e.g., mania, melancholia, depressive disorder,
obsessive–compulsive disorder, and panic disorder) have been stereotypically
described for centuries, albeit under different labels, in many places and
languages.
Even more cogently, decomposing a syndrome into a heap of independent
dysfunctions, each underlying a particular syndromal facet, is inconsistent
with total syndromal remission and recurrence. The observation of surprising
remissions, periods of apparent health, and recurrences was facilitated by
long-term mental hospitals, where remission and discharge were notable
events. Further, since there was often only one available hospital, relapses
could be noted. Falret in 1854 (as reprinted in Pichot, 2006, p. 145) described
‘‘Circular insanity [Folie circulaire] . . . characterized by the successive and
regular reproduction of the manic state, the melancholic state, and a more or
less prolonged lucid interval.’’ This description anticipated Kraepelin's more
inclusive concept of manic–depressive disorder. Both syndromal descriptions
emphasized periods of remission as an essential diagnostic element.
Remissions and relapses occur in many psychiatric and general medical
illnesses (e.g., gout, acute intermittent porphyria). Since it is highly
improbable that multiple independent causes would cease in concert, the
inference that complex syndromes have multiple independent causes is implausible.
These diverse syndromes are recognizable, familial, and extraordinarily
different from ordinary health and behavior. This is consonant with Sydenham's
hypothesis that each syndrome has a common underlying proximal cause,
even if there is no common distal genetic defect. The frequent, reliable
recognition of quite distinct syndromes since Sydenham's time suggests
that multiple small genetic contributions become manifest by taking different
routes to impairing, perhaps in several ways, a distinct evolved function,
which may generate a distinct syndrome (Klein & Stewart, 2004). The argument
is not that all complex psychiatric presentations show periods of
total remission; rather, it is logically incorrect to assume that a symptom
complex must be due to a group of independent endophenotypes.
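The improbability argument about concerted remission can be quantified under a deliberately simple assumption: if a syndrome were produced by k independent dysfunctions, each remitting within a given period with probability p, a complete remission would require all k to remit together, with probability p**k. The per-cause probability of 0.2 below is an arbitrary illustrative choice, not an estimate from any study.

```python
def p_complete_remission(k, p):
    """Probability that k independent causes all remit in the same period,
    each with probability p (independence is the key assumption)."""
    if not 0.0 <= p <= 1.0 or k < 1:
        raise ValueError("need 0 <= p <= 1 and k >= 1")
    return p ** k

P_SINGLE = 0.2  # hypothetical per-cause remission probability
for k in range(1, 6):
    print(f"{k} independent cause(s): P(total remission) = "
          f"{p_complete_remission(k, P_SINGLE):.5f}")
```

With one common cause the chance stays at 0.2; with five independent causes it falls to about 0.0003, and repeated cycles of total remission and full recurrence become correspondingly harder to reconcile with the independent-causes model.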
‘‘Comorbidity’’ suggests sequential and/or interactive causal processes.
However, the argument that one aspect of a complex syndrome is likely to
be the direct manifestation of an endophenotype is not logically supported
and is unlikely to pay off. For instance, the sudden onset of an apparently
spontaneous panic attack prompts immediate flight to help. With repeated
attacks, chronic anticipatory anxiety often develops. Panic attacks
are often followed by avoidant and dependent measures, misleadingly
referred to as ‘‘agoraphobia.’’
Six weeks of imipramine treatment prevents spontaneous panic attacks.
However, chronic anticipatory anxiety and phobic avoidances remit more
History provides many examples of how practice precedes and enables theory.
Artificial selection by culling unwanted hereditary traits and inbreeding
desired traits was essential to Darwin’s formulation of evolution by natural
selection. Remarkably, studies of pathology led to the discovery of unsuspected
normal functions. Clinical studies of scurvy, beriberi, and pellagra
led to treatment with nutritional supplements, to the discovery of specific
vitamins, and to the discovery of enzymatic cofactors. The serendipitous
observation of cowpox-induced immunity to smallpox led to vaccination, while
the study of beverage contamination led to pasteurization. Germ theory,
bacteriology, and immunology, with its elucidation of evolved mechanisms of
resistance to infection, followed. This list could be extended indefinitely.
Empirical therapies often illuminate dysfunctions, thus bringing unknown
normal functions into sight.
Conclusion
DSM for quite a while. The major difficulty is that neither NIH, industry,
nor the APA DSM process supports such study designs, especially for
marketed medications.
Achieving the necessary long-term support may depend on the realization
that current genomic and brain-imaging efforts are unlikely to resolve
nosological ambiguities, because syndromal and genetic heterogeneity
defeats group-contrast and correlative studies. We suggest substantially
diminishing heterogeneity by objectively identifying specific
pharmacotherapeutic responders through intensive designs. We argue that
major psychotropic drug effects depend on normalization of proximal
pathophysiology. Objective predictors of specific medication remission that are
themselves specifically treatment-responsive must be causally relevant to the
underlying pathophysiology; such predictors are central clues to both
pathophysiology and drug response. Finally, studying agents already known to be
effective should hasten progress toward this goal. This point deserves emphasis
because it affords a strong basis for programmatic support. Such specific
objective signs would improve psychiatric differential diagnosis beyond both
the current clinical consensus and biomarker approaches.
References
Adly, C., Straumanis, J., & Chesson, A. (1992). Fluoxetine prophylaxis of migraine.
Headache, 32, 101–104.
Bodmer, W. F. (1981). Gene clusters, genome organization and complex phenotypes.
When the sequence is known, what will it mean? American Journal of Human
Genetics, 33, 664–682.
Braff, D. L., Freedman, R., Schork, N. J., & Gottesman, I. I. (2007). Deconstructing
schizophrenia: An overview of the use of endophenotypes in order to understand a
complex disorder. Schizophrenia Bulletin, 33, 21–32.
Chassan, J. B. (1967). Research design in clinical psychology and psychiatry. New York:
Appleton-Century-Crofts.
Crow, T. J. (2007). How and why genetic linkage has not solved the problem of
psychosis: Review and hypothesis. American Journal of Psychiatry, 164, 13–21.
de Visser, S. J., van der Post, J., Pieters, M. S., Cohen, A. F., & van Gerven, J. M.
(2001). Biomarkers for the effects of antipsychotic drugs in healthy volunteers.
British Journal of Clinical Pharmacology, 51, 119–132.
DiMascio, A., Havens, L. L., & Klerman, G. L. (1963a). The psychopharmacology of
phenothiazine compounds: A comparative study of the effects of chlorpromazine,
promethazine, trifluoperazine and perphenazine in normal males. I. Introduction,
aims and methods. Journal of Nervous and Mental Disease, 136, 15–28.
DiMascio, A., Havens, L. L., & Klerman, G. L. (1963b). The psychopharmacology of
phenothiazine compounds: A comparative study of the effects of chlorpromazine,
promethazine, trifluoperazine, and perphenazine in normal males. II. Results and
discussion. Journal of Nervous and Mental Disease, 136, 168–186.
Dumont, G. J., de Visser, S. J., Cohen, A. F., & van Gerven, J. M.; Biomarker Working
Group of the German Association for Applied Human Pharmacology. (2005).
Biomarkers for the effects of selective serotonin reuptake inhibitors (SSRIs) in
healthy subjects. British Journal of Clinical Pharmacology, 59, 495–510.
Flint, J., & Munafo, M. R. (2006). The endophenotype concept in psychiatric genetics.
Psychological Medicine, 37, 163–180.
Fyer, A. J., Hamilton, S. P., Durner, M., Haghighi, F., Heiman, G. A., Costa, R., et al.
(2006). A third-pass genome scan in panic disorder: Evidence for multiple suscept-
ibility loci. Biological Psychiatry, 60(4), 388–401.
Gottesman, I. I., & Gould, T. D. (2003). The endophenotype concept in psychiatry:
Etymology and strategic intentions. American Journal of Psychiatry, 160, 636–645.
Jorgensen, O. S., Lober, M., Christiansen, J., & Gram, L. F. (1980). Plasma concentra-
tion and clinical effect in imipramine treatment of childhood enuresis. Clinical
Pharmacokinetics, 5, 386–393.
Judd, L. L., Hubbard, B., Janowsky, D. S., Huey, L. Y., & Attewell, P. A. (1979). The
effect of lithium carbonate on affect, mood, and personality of normal subjects.
Archives of General Psychiatry, 36, 860–866.
Klein, D. F. (1964a). Behavioral effects of imipramine and phenothiazines:
Implications for a psychiatric pathogenic theory and theory of drug action. In J.
Wortis (Ed.), Recent advances in biological psychiatry (Vol. VII, pp. 273–287). New
York: Plenum Press.
Klein, D. F. (1964b). Delineation of two drug-responsive anxiety syndromes.
Psychopharmacologia, 5, 397–408.
Klein, D. F. (1967). Importance of psychiatric diagnosis in prediction of clinical drug
effects. Archives of General Psychiatry, 16(1), 118–126.
Klein, D. F. (1978). A proposed definition of mental illness. In R. L. Spitzer & D. F. Klein
(Eds.), Critical issues in psychiatric diagnosis (pp. 41–71). New York: Raven Press.
Klein, D. F. (1988). Cybernetics, activation, and drug effects. Acta Psychiatrica
Scandinavica Supplementum, 341, 126–137.
Klein, D. F. (1993). False suffocation alarms, spontaneous panics, and related
conditions: An integrative hypothesis. Archives of General Psychiatry, 50, 306–317.
Klein, D. F. (1999). Harmful dysfunction, disorder, disease, illness, and evolution.
Journal of Abnormal Psychology, 108, 421–429.
Klein, D. F., Gittelman, R., Quitkin, F., & Rifkin, A. (Eds.). (1980). Diagnosis and drug
treatment of psychiatric disorders: Adults and children (2nd ed.). Baltimore, MD:
Williams & Wilkins.
Klein, D. F., Ross, D. C., & Cohen, P. (1987). Panic and avoidance in agoraphobia:
Application of PATH analysis to treatment studies. Archives of General Psychiatry,
44(3), 377–385.
Klein, D. F., & Stewart, J. (2004). Genes and environment: Nosology and psychiatry.
Neurotoxicity Research, 6(1), 11–15.
Knutson, B., Wolkowitz, O. M., Cole, S. W., Chan, T., Moore, E. A., Johnson, R. C.,
et al. (1999). Selective alteration of personality and social behavior by serotonergic
intervention. American Journal of Psychiatry, 155, 373–379.
Kupfer, D. J., First, M. B., & Regier, D. A. (Eds.). (2002). A research agenda for DSM-V.
Washington DC: American Psychiatric Association.
Lewis, A. (1967). The state of psychiatry: Essays and addresses. London: Routledge
and Kegan Paul.
Loubinoux, I., Tombari, D., Pariente, J., Gerdelat-Mas, A., Franceries, X., Cassol, E.,
et al. (2005). Modulation of behavior and cortical motor activity in healthy subjects
by a chronic administration of a serotonin enhancer. NeuroImage, 27, 299–313.
McGrath, P. J., & Klein, D. F. (1983). Heuristically important mood altering drugs. In
J. Angst (Ed.), The origins of depression: Current concepts and approaches (pp. 331–349).
New York: Springer-Verlag.
McGrath, P. J., Stewart, J. W., Petkova, E., Quitkin, F. M., Amsterdam, J. D., Fawcett,
J., et al. (2000). Predictors of relapse during fluoxetine continuation or maintenance
treatment of major depression. Journal of Clinical Psychiatry, 61, 518–524.
Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian
defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.
Meehl, P. E. (1992). Factors and taxa, traits and types, difference of degree and differ-
ences in kind. Journal of Personality, 60, 117–174.
Murphy, E. A. (1964). One cause? Many causes? The argument from the bimodal
distribution. Journal of Chronic Diseases, 17, 301–324.
Pace-Schott, E. F., Gersh, T., Silvestri, R., Stickgold, R., Salzman, C., & Hobson, J. A.
(2001). SSRI treatment suppresses dream recall frequency but increases subjective
dream intensity in normal subjects. Journal of Sleep Research, 10, 129–142.
Parsons, T. (1951). The social system. New York: Free Press.
Pichot, P. (2006). Tracing the origins of bipolar disorder: From Falret to DSM-IV and
ICD-10. Journal of Affective Disorders, 96, 145–148.
Preter, M., & Klein, D. F. (2008). Panic, suffocation false alarms, separation anxiety
and endogenous opioids. Progress in Neuropsychopharmacology and Biological
Psychiatry, 32, 603–612.
Quitkin, F. M., McGrath, P. J., Stewart, J. W., Harrison, W., Wager, S. G., Nunes, E.,
et al. (1989). Phenelzine and imipramine in mood reactive depressives: Further
delineation of the syndrome of atypical depression. Archives of General Psychiatry,
46(9), 787–793.
Rapoport, J. L., Mikkelsen, E. J., Zavadil, A., Nee, L., Gruenau, C., Mendelson, W.,
et al. (1980). Childhood enuresis. II. Psychopathology, tricyclic concentration in
plasma, and antienuretic effect. Archives of General Psychiatry, 37, 1146–1152.
Riley, B., & Kendler, K. S. (2006). Molecular genetic studies of schizophrenia. European
Journal of Human Genetics, 14, 669–680.
Rosenzweig, P., Canal, M., Patat, A., Bergougnan, L., Zieleniuk, I., & Bianchetti, G.
(2002). A review of the pharmacokinetics, tolerability and pharmacodynamics of
amisulpride in healthy volunteers. Human Psychopharmacology, 17, 1–13.
Rutter, M. (2007). Gene–environment interdependence. Developmental Science, 10, 12–18.
Satel, S. L., & Nelson, J. C. (1989). Stimulants in the treatment of depression: A critical
overview. Journal of Clinical Psychiatry, 50, 241–249.
Smoller, J. W., & Tsuang, M. T. (1998). Panic and phobic anxiety: Defining phenotypes
for genetic studies. American Journal of Psychiatry, 155, 1152–1162.
Stewart, J. W., McGrath, P. J., Quitkin, F. M., & Klein, D. F. (2007). Atypical depres-
sion: Current status and relevance to melancholia. Acta Psychiatrica Scandinavica
Supplementum, 433, 58–71.
Thirion, B., Pinel, P., Meriaux, S., Roche, A., Dehaene, S., & Poline, J. (2007). Analysis
of a large fMRI cohort: Statistical and methodological issues for group analyses.
NeuroImage, 35, 105–120.
Wakefield, J. C. (1992). Disorder as harmful dysfunction: A conceptual critique of
DSM-III-R’s definition of mental disorder. Psychological Review, 99(2), 232–247.
Zitrin, C. M., Klein, D. F., & Woerner, M. G. (1978). Behavior therapy, supportive
psychotherapy, imipramine, and phobias. Archives of General Psychiatry, 35(3),
307–316.
14
Introduction
14 The Need for Dimensional Approaches 339
They can then apply formal mathematics to model the resulting data, thereby
instantiating scientific theories in empirical data. The situation is very similar
in psychopathology research. We cannot induce psychopathology in human
beings, but given a group of persons who differ in their psychopathology
status, we can examine whether psychopathology status covaries with other
variables (e.g., test performance, physiology, genes, family history,
developmental antecedents).
The definition of psychopathology status is a fundamental and historically
vexing issue. For much of the history of our discipline, such definitions were
highly inconsistent because different investigators meant different things when
they used the same label. The solution has been a diagnostic system that
provides consensus definitions drawing on the opinions of diverse experts.
This system originated with DSM-III and has continued in much the same form
in DSM-III-R and DSM-IV.
The modern DSMs have been indispensable in psychopathology research
because, to a large extent, we know what other researchers mean when they
say they are studying a specific DSM diagnosis. Nevertheless, the modern
DSMs embody an important assumption, namely, that all mental disorders
are an either/or matter. Each diagnostic construct described in recent DSMs
is a polythetic category. That is, for each diagnosis, multiple criteria are listed,
and meeting a specified combination of criteria indicates membership in a
category of mental disorder, whereas not meeting that combination indicates
membership in the complementary nondisordered group.
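A polythetic rule can be written as a simple predicate. As an illustration only, the sketch encodes a simplified version of the DSM-IV major depressive episode symptom rule (at least five of nine criteria, at least one of which is depressed mood or loss of interest or pleasure) and ignores duration, impairment, and exclusion criteria. Enumerating every qualifying pattern shows both the either/or output and how many different symptom profiles share one label.

```python
from itertools import combinations

N_CRITERIA = 9      # symptom criteria, indexed 0..8
CORE = {0, 1}       # 0 = depressed mood, 1 = loss of interest or pleasure
THRESHOLD = 5       # minimum number of criteria required

def meets_polythetic_rule(present):
    """Return True (disorder) or False (no disorder) for a set of criterion indices."""
    present = set(present)
    return len(present) >= THRESHOLD and bool(present & CORE)

# Enumerate every combination of criteria that satisfies the rule.
qualifying = [
    combo
    for k in range(THRESHOLD, N_CRITERIA + 1)
    for combo in combinations(range(N_CRITERIA), k)
    if meets_polythetic_rule(combo)
]
print(f"qualifying symptom patterns: {len(qualifying)}")

# Two people who both meet the rule yet share only one criterion.
a, b = {0, 2, 3, 4, 5}, {0, 1, 6, 7, 8}
print(meets_polythetic_rule(a), meets_polythetic_rule(b), a & b)
```

Under this simplified rule there are 227 qualifying symptom combinations, and two people can receive the same diagnosis while sharing a single criterion; the rule itself returns only a yes/no answer.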
The practical needs for dichotomous categorical psychopathology labels
(e.g., for third-party payment purposes) have been acknowledged elsewhere
(First, 2005; Krueger & Markon, 2006b). However, if our goal is scientific—to
understand the origins and nature of psychopathology—dimensional psycho-
pathology constructs are indispensable (Helzer, Kramer, & Krueger, 2006). A
fundamental reason for this is that dichotomous variables (e.g., presence vs.
absence of a mental disorder) contain less information than variables that can
take on more values (e.g., how much a research participant resembles a
mental disorder prototype on a multipoint scale) (Kraemer, Noda, &
O’Hara, 2004; MacCallum, Zhang, Preacher, & Rucker, 2002). This means
that many more research participants are needed to discern the correlates of
a dichotomous psychopathology construct than of a continuous one. The literal
‘‘costs’’ of dichotomization can be quite profound if one thinks of the problem
in terms of the finite amount of money available for research on
psychopathology. We can therefore achieve greater research traction with less
money by using dimensional constructs, because we need fewer research
participants to address key research questions.
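The information lost through dichotomization can be approximated analytically. The sketch assumes that a dimensional score and some correlate are bivariate normal and that a diagnosis is formed by cutting the score at a threshold; under those assumptions a median split multiplies the correlation by about 0.80, and rarer diagnoses lose more. The correlation of 0.30, the prevalences, and the Fisher z sample-size approximation are illustrative choices, not results from the studies cited above.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def dichotomization_attenuation(prevalence):
    """Factor by which a correlation shrinks when a normally distributed score is
    cut so that `prevalence` of people fall above the cut (bivariate normality assumed)."""
    cut = STD_NORMAL.inv_cdf(1 - prevalence)
    return STD_NORMAL.pdf(cut) / sqrt(prevalence * (1 - prevalence))

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r with a two-sided test (Fisher z method)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

R_CONTINUOUS = 0.30  # hypothetical correlation with the dimensional score
print(f"dimensional score: N = {n_for_correlation(R_CONTINUOUS)}")
for prevalence in (0.5, 0.1):
    r_cat = R_CONTINUOUS * dichotomization_attenuation(prevalence)
    print(f"diagnosis with prevalence {prevalence:.0%}: "
          f"r = {r_cat:.3f}, N = {n_for_correlation(r_cat)}")
```

Under these assumptions roughly 85 participants suffice for the dimensional score, about 135 for a median-split diagnosis, and roughly 250 when only 10% of people are classified as cases.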
As can be seen from the foregoing sections, there is now a nontrivial corpus
of research on psychopathology from a dimensional perspective. This seems
particularly remarkable given the exclusively categorical nature of mental
disorders as defined in the modern DSMs. In recognition of this burgeoning
dimensional literature, the American Psychiatric Institute for Research and
Education (APIRE) organized a meeting in July 2006 to discuss a research
agenda for contemplating the inclusion of dimensions throughout the
upcoming DSM-5 (Helzer et al., 2008). Although the primary sources
should be consulted to understand the numerous ideas discussed at the
meeting, a general consensus was that the DSM-5 could benefit from the
explicit inclusion of dimensional elements in many areas of psychopathology.
the history of the idea that there are “specific genes for specific psychopathologies,”
with its implication that causal genes will operate in a straightforward
Mendelian manner (e.g., there is one relevant gene, and it has two forms,
mutated/disease-causing and nonmutated, and the etiologic effect of the
mutated form is insensitive to environmental inputs). Although there are
some human neuropsychiatric diseases whose etiology can be understood
in this way (e.g., Huntington disease), Kendler (2005) concluded that genetic
effects on most psychopathological conditions are unlikely to be this
straightforward. Rather, genetic effects on psychopathology are likely to be smaller,
many genes are likely to be relevant, and these genes are likely to be sensitive to
environmental inputs.
As with the complexity of psychopathological phenotypes, this etiologic
complexity can also be usefully parsed using dimensional-structural
approaches. Indeed, the dimensional structure of etiologic factors may resemble
the structure of psychopathology itself, a finding that breaks down the
conceptual barrier between “cause” (or etiology) and “effect” (psychopathological
phenotypes). To pick one example, twin research on the externalizing
spectrum shows that the genetic effects on individual DSM disorders involving
antisocial behavior and substance dependence are largely (but not
exclusively) shared, and this common genetic risk can be well modeled as a
dimension (Krueger et al., 2002; Kendler, Prescott, Myers, & Neale, 2003;
Young, Stallings, Corley, Krauter, & Hewitt, 2000). This genetic risk dimension
represents the effects of numerous individual genes that, acting in concert,
increase the probability of psychopathology; as such, it provides a compelling
target for identifying specific genetic polymorphisms that increase the risk
for psychopathology. This strategy appears to have greater traction for
identifying relevant polymorphisms than a strategy aimed at
detecting putatively separate and dichotomous genetic effects on putatively
separate and dichotomous externalizing disorders (see Dick, 2007). The general
point is that dimensional thinking can usefully extend beyond nosology
to encompass etiology as well. We look forward to seeing whether a
dimensional perspective can bring us closer to understanding not only what
psychopathology is but also where it comes from.
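The idea that many genes of small effect combine into one risk dimension shared by several disorders can be illustrated with a toy liability-threshold simulation. The Python sketch below is hypothetical throughout: the number of genes, the allele frequencies and effect sizes, the 50% share of liability attributed to the shared genetic dimension (`h2`), and the threshold of 1.28 (roughly a 10% prevalence) are arbitrary choices. The code simulates data; it does not fit the twin models used in the studies cited above. Two disorders are linked only through the shared genetic dimension, yet their diagnoses co-occur more often than their base rates alone would predict.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = statistics.fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (statistics.pstdev(x, mx) * statistics.pstdev(y, my))


def simulate_shared_liability(n_people=4000, n_genes=100, h2=0.5,
                              threshold=1.28, seed=2011):
    """Hypothetical model: many small additive gene effects form one shared
    genetic dimension that raises liability to two disorders (A and B);
    each disorder also receives its own, unshared environmental input."""
    rng = random.Random(seed)
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]  # per-gene effects
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_genes)]  # risk-allele frequencies

    # Polygenic score: effects weighted by risk-allele count (0, 1, or 2).
    scores = []
    for _ in range(n_people):
        score = 0.0
        for beta, p in zip(effects, freqs):
            count = (rng.random() < p) + (rng.random() < p)
            score += beta * count
        scores.append(score)

    # Standardize so the shared genetic dimension has mean 0 and SD 1.
    mu, sd = statistics.fmean(scores), statistics.pstdev(scores)
    genetic = [(s - mu) / sd for s in scores]

    def liability(g):
        # Shared genetic dimension plus a disorder-specific environment.
        return math.sqrt(h2) * g + math.sqrt(1.0 - h2) * rng.gauss(0.0, 1.0)

    liab_a = [liability(g) for g in genetic]
    liab_b = [liability(g) for g in genetic]
    dx_a = [value > threshold for value in liab_a]  # diagnosis = liability above threshold
    dx_b = [value > threshold for value in liab_b]
    return genetic, liab_a, liab_b, dx_a, dx_b


genetic, liab_a, liab_b, dx_a, dx_b = simulate_shared_liability()
prev_b = sum(dx_b) / len(dx_b)
b_given_a = sum(a and b for a, b in zip(dx_a, dx_b)) / sum(dx_a)
print(f"corr(liability A, liability B) = {pearson(liab_a, liab_b):.2f}")
print(f"P(B) = {prev_b:.3f}, P(B | A) = {b_given_a:.3f}")
```

In this toy model the two liabilities correlate at about `h2`, because the shared genetic dimension is their only common input. Giving each disorder its own independent set of genes instead would remove that correlation, which mirrors the contrast drawn above between a common genetic dimension and separate, disorder-specific genetic effects.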
References
Aggen, S. H., Neale, M. C., & Kendler, K. S. (2005). DSM criteria for major depression:
Evaluating symptom patterns using latent-trait item response models. Psychological
Medicine, 35, 475–487.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7,
249–253.
Note: page numbers followed by ‘‘f ’’ and ‘‘t’’ denote figures and tables, respectively.
Validity
  construct, 41–42
  external, 42–43
Zanna, M. P., 19
Zuckerkandl, E., 214