
Causality and Psychopathology

american psychopathological association


Volumes in the Series
Causality and Psychopathology: Finding the Determinants of
Disorders and Their Cures (Shrout, Keyes, and Ornstein, eds.)
Mental Health in Public Health (Cottler, ed.)
Trauma, Psychopathology, and Violence: Causes, Correlates,
or Consequences (Widom)
Causality and Psychopathology
FINDING THE DETERMINANTS OF DISORDERS
AND THEIR CURES

EDITED BY

patrick e. shrout, ph.d.


Professor of Psychology
Department of Psychology
New York University
New York, NY

katherine m. keyes, ph.d., mph


Columbia University Epidemiology Merit Fellow,
Department of Epidemiology
Columbia University
New York, NY

katherine ornstein, mph


Department of Epidemiology
Mount Sinai School of Medicine
New York, NY

2011
Oxford University Press
Oxford University Press, Inc., publishes works that further
Oxford University’s objective of excellence
in research, scholarship, and education.
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright © 2011 by Oxford University Press, Inc.
Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016
www.oup.com
Oxford is a registered trademark of Oxford University Press
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of Oxford University Press.
____________________________________________
Library of Congress Cataloging-in-Publication Data

American Psychopathological Association. Meeting (98th : 2008 : New York, N.Y.)


Causality and psychopathology / edited by Patrick E. Shrout, Katherine
M. Keyes, Katherine Ornstein.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-0-19-975464-9 (alk. paper)
1. Psychology, Pathological—Etiology—Congresses. 2. Psychiatry—Research—
Methodology—Congresses. I. Shrout, Patrick E. II. Keyes, Katherine M.
III. Ornstein, Katherine. IV. Title.
[DNLM: 1. Mental Disorders—epidemiology—United States—Congresses.
2. Mental Disorders—etiology—United States—Congresses. 3. Psychopathology—
methods—United States—Congresses. WM 140]
RC454.A4193 2008
616.89—dc22
2010043586

ISBN-13: 978-0-19-975464-9
____________________________________________
Printed in USA
on acid-free paper
Preface

Research in psychopathology can reveal basic insights into human biology,
psychology, and social structure; and it can also lead to important interven-
tions to relieve human suffering. Although there is sometimes tension
between basic and applied science, the two are tied together by a fascination
with causal explanations. Basic causal stories such as how neurotransmitters
change brain function ‘‘downstream’’ are always newsworthy, even if the story
is about a mouse or rat brain. However, applications of causal understanding
to create efficacious prevention or intervention programs are also exciting.
Although good causal stories make the headlines, many psychopathology
researchers collect data that are descriptive or correlational in nature.
However, for decades epidemiologists have worked with such nonexperimen-
tal data to formulate causal explanations about the etiology and course of
disorders. Even as these explanations have been reported in textbooks and
have been used in courts of law to settle claims of responsibility for adverse
conditions, they have also been criticized for going too far. Indeed, many
scientists shy away from using explicit causal language when reporting obser-
vational data, to avoid being criticized for lack of rigor. Nonetheless, the
subtext in the reports of associations and developments always implies
causal mechanisms.
Because of the widespread interest in causal explanation, along with con-
cerns about what kinds of causal claims can be made from survey data,
longitudinal studies, studies of genetic relationships, clinical observations,
and imperfect clinical trials, the American Psychopathological Association
decided to organize its 2008 scientific meeting around the topic of causality
and psychopathology research. Those who were invited to speak at the 2.5-day
conference included authors of influential works on causation, statisticians
whose new methods are informative about causal processes, as well as
experts in psychopathology. This volume contains revised and refined ver-
sions of the papers presented by the majority of the invited speakers at
that unique meeting.
Not all of the authors have done work in psychopathology research, and
not all have previously written explicitly about causal inference. Indeed, the
goal of the meeting and this volume is to promote new creative thinking
about how causal inference can be promoted in psychopathology research in
the years to come. Moreover, the collection is likely to be of interest to
scientists working in other areas of medicine, psychology, and social science,
especially those who combine experimental and nonexperimental data in
building their scientific literature.
The volume is divided into three sections. The first section, ‘‘Causal
Theory and Scientific Inference,’’ contains contributions that address cross-
cutting issues of causal inference. The first two chapters introduce conceptual
and methodological issues that thread through the rest of the volume, while
the third chapter provides a formal framework for making and examining
causal claims. The fourth chapter introduces genetic analysis as a kind of
prototype of causal thinking in psychopathology, in that we know that the
variation in the genotype can lead to variation in the phenotype but not vice
versa. The author also argues for the practical counterfactual thinking
implied by the ‘‘interventionist’’ approach to causal inference developed by
J. Woodward and colleagues. The final chapter in this section provides a
stimulating illustration of the dramatically different inferences one can
reach from observational studies and clinical trials from the Women’s
Health Initiative. The focus of this chapter is the effect of hormone-
replacement therapy on coronary heart disease, as well as on the risk of several
forms of cancer. Because this example did not have to face the difficulties of
diagnosis and nosology that confront so much psychopathology research, the
authors were able to focus on important issues such as selection bias and
heterogeneity of effects in reconciling data from trials and observational
studies.
The second section, ‘‘Innovations in Methods,’’ presents new tools and
perspectives for exploring and supporting causal theories in epidemiology.
Although the substantive focus is psychopathology research, the methods
are generally applicable for other areas of medicine. The first chapter in
this section (Chapter 6) proposes a novel but formal analysis of causal
claims that can be made about direct and indirect causal paths using graphi-
cal methods. Chapter 7 describes a statistical method called ‘‘growth mixture
modeling,’’ which can examine a variety of hypotheses about longitudinal
data that arise out of causal theories. Chapter 8 describes new ways to
improve the efficiency of clinical trials by providing causally relevant infor-
mation. The last two chapters (Chapters 9 and 10) provide insights into how
naturally occurring genetic variation can be leveraged to strengthen infer-
ences made about both genetic and environmental causal paths in
psychopathology.
The final section, ‘‘Causal Thinking in Psychiatry,’’ features critical ana-
lyses of causal claims within psychiatry by some of the best known psycho-
pathology researchers. These chapters examine claims in developmental
psychopathology (Chapter 11), posttraumatic stress disorder (Chapter 12),
research on therapeutics (Chapter 13), and nosology (Chapter 14).
The convergence of this diverse and talented group to one meeting and
one volume was facilitated by active involvement of the officers and Council
of the American Psychopathological Association (APPA) during 2008. We
particularly thank Ezra Susser, the secretary of APPA, who was especially
generative in planning the meeting and, therefore, the volume. We also
acknowledge the valuable suggestions made by the other officers and coun-
cilors of APPA: James J. Hudziak, Darrel A. Regier, Linda B. Cottler, Michael
Lyons, Gary Heiman, John E. Helzer, Catina O’Leary, Lauren B. Alloy, John
N. Constantino, and Charles F. Zorumski. The meeting itself benefited
enormously from the special efforts of Gary Heiman and Catina O’Leary,
and it was supported by the National Institute of Mental Health through
grant R13 MH082613.
This volume is dedicated to Lee Nelkin Robins, a former president of
APPA who attended her last meeting in 2008. She died on September 25,
2009. Trained in sociology, Lee Robins made essential contributions to the
understanding of the development and distribution of mental disorders,
particularly antisocial and deviant behavior as a precursor of later problems.
Her rigorous causal thinking was informed by epidemiological data, and she
was instrumental in improving the quality and quantity of such data over the
course of her productive life.

Contents

Contributors xi

part i causal theory and scientific inference


1 Integrating Causal Analysis into Psychopathology Research 3
patrick e. shrout, phd

2 What Would Have Been Is Not What Would Be 25


Counterfactuals of the Past and Potential Outcomes of the Future
sharon schwartz, phd, nicolle m. gatto, phd, and ulka b. campbell, phd

3 The Mathematics of Causal Relations 47


judea pearl, phd

4 Causal Thinking in Psychiatry 66


A Genetic and Manipulationist Perspective
kenneth s. kendler, md

5 Understanding the Effects of Menopausal Hormone Therapy 79


Using the Women’s Health Initiative Randomized Trials and
Observational Study to Improve Inference
garnet l. anderson, phd, and ross l. prentice, phd

part ii innovations in methods


6 Alternative Graphical Causal Models and the Identification
of Direct Effects 103
james m. robins, md, and thomas s. richardson, phd


7 General Approaches to Analysis of Course 159


Applying Growth Mixture Modeling to Randomized
Trials of Depression Medication
bengt muthén, phd, hendricks c. brown, phd, aimee m. hunter, phd,
ian a. cook, md, and andrew f. leuchter, md

8 Statistical Methodology for a SMART Design in the


Development of Adaptive Treatment Strategies 179
alena i. oetting, ms, janet a. levy, phd,
roger d. weiss, md, and susan a. murphy, phd

9 Obtaining Robust Causal Evidence From Observational Studies 206


Can Genetic Epidemiology Help?
george davey smith, md, dsc

10 Rare Variant Approaches to Understanding the Causes of


Complex Neuropsychiatric Disorders 252
matthew w. state, md, phd

part iii causal thinking in psychiatry


11 Causal Thinking in Developmental Disorders 279
e. jane costello, phd, and adrian angold, mrcpsych

12 Causes of Posttraumatic Stress Disorder 297


naomi breslau, phd

13 Causal Thinking for Objective Psychiatric Diagnostic Criteria 321


A Programmatic Approach in Therapeutic Context
donald f. klein, md, dsc

14 The Need for Dimensional Approaches in Discerning the


Origins of Psychopathology 338
robert f. krueger, phd, and daniel goldman

Index 353
Contributors

garnet l. anderson, ph.d.
WHI Clinical Coordinating Center
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, WA

adrian angold, m.d.
Center for Developmental Epidemiology
Department of Psychiatry and Behavioral Sciences
Duke University Medical Center
Durham, NC

naomi breslau, ph.d.
Department of Epidemiology
Michigan State University College of Human Medicine
East Lansing, MI

hendricks c. brown, ph.d.
University of Miami
Miami, FL

ulka b. campbell, ph.d.
Associate Director, Epidemiology Safety and Risk Management
Medical Division, Pfizer, Inc.
New York, NY

ian a. cook, m.d.
Associate Professor
Semel Institute for Neuroscience and Human Behavior
University of California, Los Angeles
Los Angeles, CA

e. jane costello, ph.d.
Center for Developmental Epidemiology
Department of Psychiatry and Behavioral Sciences
Duke University Medical Center
Durham, NC

nicolle m. gatto, ph.d.
Director, TA Group Head, Epidemiology Safety and Risk Management
Medical Division, Pfizer, Inc.
New York, NY

daniel goldman
University of Minnesota
Minneapolis, MN

aimee m. hunter, ph.d.
Research Psychologist
Laboratory of Brain, Behavior, and Pharmacology
University of California, Los Angeles
Los Angeles, CA

kenneth s. kendler, m.d.
Virginia Institute for Psychiatric and Behavioral Genetics
Departments of Psychiatry and Human and Molecular Genetics
Medical College of Virginia
Virginia Commonwealth University
Richmond, VA

donald f. klein, m.d., d.sc.
Research Professor
Phyllis Green and Randolph Cowen Institute for Pediatric Neuroscience
NYU Child Study Center, NYU Medical Center
Professor Emeritus, Department of Psychiatry
College of Physicians & Surgeons
Columbia University
New York, NY

robert f. krueger
Washington University in St. Louis
St. Louis, MO

andrew f. leuchter, m.d.
Professor
Department of Psychiatry and Biobehavioral Science
University of California, Los Angeles
Los Angeles, CA

janet a. levy
Center for Clinical Trials Network
National Institute on Drug Abuse
Bethesda, MD

susan a. murphy, ph.d.
Professor, Psychiatry
University of Michigan
Institute for Social Research
Ann Arbor, MI

bengt muthén, ph.d.
Professor Emeritus
Graduate School of Education & Information Studies
University of California, Los Angeles
Los Angeles, CA

alena i. oetting
University of Michigan
Institute for Social Research
Ann Arbor, MI

judea pearl, ph.d.
Cognitive Systems Laboratory
Computer Science Department
University of California
Los Angeles, CA

ross l. prentice
WHI Clinical Coordinating Center
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, WA

sharon schwartz, ph.d.
Professor of Clinical Epidemiology
Columbia University
New York, NY

george davey smith
MRC Centre for Causal Analyses in Translational Epidemiology
Department of Social Medicine
University of Bristol
Bristol, UK

matthew w. state, m.d., ph.d.
Donald J. Cohen Associate Professor
Co-Director, Program on Neurogenetics
Yale Child Study Center
Department of Genetics and Psychiatry
Yale University School of Medicine
New Haven, CT

roger d. weiss, m.d.
Harvard Medical School
McLean Hospital
Belmont, MA
part i

Causal Theory and Scientific Inference


1

Integrating Causal Analysis into Psychopathology Research

patrick e. shrout

Both in psychopathology research and in clinical practice, causal thinking is
natural and productive. In the past decades, important progress has been
made in the treatment of disorders ranging from attention-deficit/hyperactiv-
ity disorder (e.g., Connor, Glatt, Lopez, Jackson, & Melloni, 2002) to depres-
sion (e.g., Dobson, 1989; Hansen, Gartlehner, Lohr, Gaynes, & Carey, 2005) to
schizophrenia (Hegarty, Baldessarini, Tohen, & Waternaux, 1994). The treat-
ments for these disorders include pharmacological agents as well as beha-
vioral interventions, which have been subjected to clinical trials and other
empirical evaluations. Often, the treatments focus on the reduction or elim-
ination of symptoms, but in other cases the interventions are designed to
prevent the disorder itself (Brotman et al., 2008). In both instances, the
interventions illustrate the best use of causal thinking to advance both scien-
tific theory and clinical practice.
When clinicians understand the causal nature of treatments, they can have
confidence that their actions will lead to positive outcomes. Moreover, being
able to communicate this confidence tends to increase a patient’s comfort
and compliance (Becker & Maiman, 1975). Indeed, there seems to be a basic
inclination for humans to engage in causal explanation, and such explana-
tions affect both basic thinking, such as identification of categories (Rehder &
Kim, 2006), and emotional functioning (Hareli & Hess, 2008). This inclina-
tion may lead some to ascribe causal explanations to mere correlations or
coincidences, and many scientific texts warn researchers to be cautious about
making causal claims (e.g., Maxwell & Delaney, 2004). These warnings have
been taken to heart by editors, reviewers, and scientists themselves; and there
is often reluctance regarding the use of causal language in the psychopathol-
ogy literature. As a result, many articles simply report patterns of association
and refer to mechanisms with euphemisms that imply causal thinking with-
out addressing causal issues head-on.


Over 35 years ago Rubin (1974) began to talk about strong causal infer-
ences that could be made from experimental and nonexperimental studies
using the so-called potential outcomes approach. This approach clarified the
nature of the effects of causes A vs. B by asking us to consider what would
happen to a given subject under these two conditions. Forget for a moment
that at a single instant a subject cannot experience both conditions—Rubin
provided a formal way to think about how we could compare potential rather
than actual outcomes. The contrast of the potential outcomes was argued to
provide a measure of an individual causal effect, and Rubin and his collea-
gues showed that the average of these causal effects across many individuals
could be estimated under certain conditions. Although approaches to causal
analysis have also been developed by philosophers and engineers (see Pearl,
2009), the formal approaches of Rubin and his colleagues (e.g., Holland,
1986; Frangakis & Rubin, 2002) and statistical epidemiologists (Greenland,
Pearl, & Robins, 1999a, 1999b; Robins, 1986; Robins, Hernán, & Brumback,
2000) have prompted researchers to have new appreciation for the strengths
and limitations of both experimental and nonexperimental designs.
This volume is designed to promote conversations among those concerned
with causal inference in the abstract and those interested in causal explana-
tion of psychopathology more specifically. Authors include prominent contri-
butors from both types of literature. Some of the chapters from experts in
causal analysis are rather technical, but all engage important and cutting-
edge issues in the field. The psychopathology experts raise challenging issues
that are likely to be the subject of discussion for years to come.
In this introductory chapter, I give an overview of some of the themes that
will be discussed in the subsequent chapters. These themes have to do with
the assessment of causal effects, the sources of bias in clinical trials and
nonexperimental designs, and the potential of innovative designs and per-
spectives. In addition to the themes that are developed later in the volume,
I discuss two topics that are not fully discussed elsewhere in the volume. One
topic focuses on the role of time when considering the effects of causes in
psychopathology research. The other topic is mediation analysis, which is a
statistical method that was developed in psychology to describe the interven-
ing processes between an intervention and an outcome of that intervention.

Themes in Causal Analysis

In the pioneering work of Rubin (1978), randomized experiments had a
special status in making causal claims based on the causal effect. As men-
tioned, the causal effect is defined as the difference between the outcome
(say, Y) that would be observed if person U were administered treatment
A vs. what would have been observed if that person received treatment B. The
outcome under treatment A is called YA(U) and the outcome under treatment
B is called YB(U). Because only one treatment can be administered for a
given measurement of Y(U), the definition of the causal effect depends on
a counterfactual consideration, namely, what the outcome of someone with
treatment A would have been had he or she received treatment B or what the
outcome of someone assigned to treatment B would have been had he or she
received treatment A. Our inability to observe both outcomes is what Holland
(1986) called ‘‘the fundamental problem of causal inference.’’
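
To make the potential-outcomes notation concrete, the following sketch (a Python simulation with invented numbers, offered only as an illustration of the definitions and not as an example from this literature) generates both potential outcomes for every unit, computes the individual causal effects that can never be observed in practice, and shows that under random assignment the simple difference in observed group means recovers the average causal effect.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Each unit U has two potential outcomes; in reality only one is ever observed.
y_a = rng.normal(loc=10.0, scale=2.0, size=n)        # Y_A(U): outcome under treatment A
y_b = y_a - rng.normal(loc=1.5, scale=0.5, size=n)   # Y_B(U): outcome under treatment B

individual_effects = y_a - y_b          # unobservable individual causal effects
true_ace = individual_effects.mean()    # average causal effect (about 1.5 here)

# Random assignment reveals exactly one potential outcome per unit.
assigned_a = rng.random(n) < 0.5
observed_y = np.where(assigned_a, y_a, y_b)

# The between-person difference in means estimates the average within-person effect.
estimated_ace = observed_y[assigned_a].mean() - observed_y[~assigned_a].mean()
print(f"true ACE = {true_ace:.2f}, estimated ACE = {estimated_ace:.2f}")
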

Average Causal Effects from Experiments


Although the individual causal effect cannot be known, the average causal
effect can be estimated if subjects are randomly assigned to treatment A or B
and several other conditions are met. In this case, between-person informa-
tion can be used to estimate the average of within-person counterfactual
causal effects. The magnitude of the causal effect is taken seriously, and
efforts are made to estimate this quantity without statistical bias. This can
be done in a relatively straightforward way in engineering experiments and in
randomized studies with animal models since the two groups being com-
pared are known to be probabilistically equivalent. Moreover, in basic science
applications the assumption of a stable unit treatment value (SUTVA) (Rubin,
1980, 1990) is plausible. As Schwartz, Gatto, and Campbell discuss (Chapter 2),
the SUTVA assumption essentially means that subjects are exchangeable and
one subject’s outcome is unaffected by another subject’s treatment assign-
ment. This assumption will typically hold in carefully executed randomized
experiments using genetically pure laboratory animals. For example,
O’Mahony and colleagues (2009) randomly selected male Sprague-Dawley
rat pups for a postnatal stress experience, which involved removing them
daily from their mother for 3 hours on days 2–12. The randomly equivalent
control pups were left with their mothers. At the end of the study, the
investigators found differences in the two groups with respect to biological
indicators of stress and immune function. The biologically equivalent sub-
jects in this example are plausibly exchangeable, consistent with SUTVA; but
we must also assume that the subjects did not affect each other’s responses.
In the causal literature, it is common to represent the causal effects as
shown in Figure 1.1A. In this figure, the treatment is represented as variable
T, and it would take two values, one for treatment A and one for treatment B.
The outcome is represented by Y, which would have been the value of one of
the biological measurements in the O’Mahony et al. (2009) example. In
addition, there is variable E, which represents the other factors that can
influence the value of Y, such as genetic mutations or measurement error

Figure 1.1 Schematic representation of Treatment condition (T) on outcome (Y). Boxes represent observed values and circles represent latent variables. In the panel on the left (Panel A) the treatment is the only systematic influence on Y, but in the panel on the right (Panel B) there is a confounding variable (C) that influences both the treatment and the outcome.

in the biological assays. When random assignment is used to define T, then E
and T are unrelated; and this is represented in Figure 1.1 by a lack of any
explicit link between these two variables.

Clinical Trials
One might suppose that this formal representation works for experiments
involving humans as subjects. However, things get complicated quickly in
this situation, as is well documented in the literature on clinical trials (e.g.,
Fleiss, 1986; Everitt & Pickles, 2004; Piantadosi, 2005). It is easy enough to
assign people randomly to either group A or group B and to verify that the
two groups are statistically equivalent in various characteristics, but human
subjects are agents who can undo the careful experimental design.
Individuals in one group might not like the treatment to which they are
assigned and may take various actions, such as failing to adhere to the
treatment, switching treatments, selectively adding additional treatments, or
withdrawing from the study entirely.
This issue of nonadherence introduces bias into the estimate of the average
causal effect (see Efron, 1998; Peduzzi, Wittes, Detre, & Holford, 1993, for
detailed discussion). For example, if a drug assigned to group A has good
long-term efficacy but temporary negative side effects, such as dry mouth or
drowsiness, then persons who are most distressed by the side effects might
refuse to take the medication or cut back on the dose. Persons in group B
may not feel the need to change their assigned treatment, and thus, the two
groups become nonequivalent in adherence. One would expect that the com-
parison of the outcomes in the two groups would underestimate the efficacy
of the treatment.
A different source of bias will be introduced if persons in one group
are more likely to withhold data or to be lost to follow-up compared to the
other group. This issue of missing data is another threat to clear causal
inference in clinical trials. Mortality and morbidity are common reasons for
follow-up data being missing, but sometimes data are missing because sub-
jects have become so high-functioning that they do not have time to give to
follow-up measurement. If observations that are missing come from a dis-
tribution other than the observations that were completed and if this discre-
pancy is different in groups A and B, then there is potential for the estimate
of the causal effect to become biased (see Little & Rubin, 2002).
For many clinical studies, the bias in the causal effect created by differential
nonadherence and missing data patterns is set aside rather than confronted
directly. Instead, the analysis of the trials typically emphasizes intent to
treat (ITT). This requires that subjects be analyzed within the groups originally
randomized, regardless of whether they were known to have switched treat-
ment or failed to provide follow-up data. Missing data in this case must be
imputed, using either formal imputation methods (Little & Rubin, 2002) or
informal methods such as carrying the last observed measurement forward.
ITT shifts the emphasis of the trial toward effectiveness of the treatment proto-
col, rather than efficacy of the treatment itself (see Piantadosi, 2005, p. 324). For
example, if treatment A is a new pharmacologic agent, then the effectiveness
question is how a prescription of this drug is likely to change outcome
compared to no prescription. The answer to this question is often quite dif-
ferent from whether the new agent is efficacious when administered in tightly
controlled settings since effectiveness is affected by side effects, cost of treat-
ment, and social factors such as stigma associated with taking the treatment.
Indeed, as clinical researchers reach out to afflicted persons who are not
selected on the basis of treatment-seeking or volunteer motives, nonadherence
and incomplete data are likely to be increasingly more common and challen-
ging in effectiveness evaluation. Although these challenges are real, there are
important reasons to examine the effectiveness of treatments in representative
samples of persons outside of academic medical centers.
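
The distinction between the effectiveness question answered by ITT and the efficacy question can be illustrated with a small simulation (a hypothetical Python sketch; the adherence mechanism and all numbers are invented). Subjects assigned to drug A who are most bothered by side effects stop taking it, so the ITT contrast estimates the effect of prescribing the drug and falls short of its efficacy, while a naive comparison by treatment actually received is confounded by who chooses to adhere.

import numpy as np

rng = np.random.default_rng(1)
n = 50_000

side_effect_proneness = rng.normal(size=n)      # unmeasured; also related to the outcome
assigned_a = rng.random(n) < 0.5                # randomized assignment to drug A vs. B

# Subjects assigned to A who are most bothered by side effects stop taking it.
adheres = np.where(assigned_a, side_effect_proneness < 0.5, True)
received_a = assigned_a & adheres

# Actually taking drug A improves the outcome by 2 points (the drug's efficacy).
outcome = 0.5 * side_effect_proneness + 2.0 * received_a + rng.normal(size=n)

itt = outcome[assigned_a].mean() - outcome[~assigned_a].mean()
as_treated = outcome[received_a].mean() - outcome[~received_a].mean()
print(f"ITT contrast (effectiveness of prescribing A): {itt:.2f}")
print(f"as-treated contrast (confounded by adherence): {as_treated:.2f}")
print("efficacy of actually taking A: 2.00")
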
Whereas ITT and ad hoc methods of filling in missing data can provide
rigorous answers to effectiveness questions, causal theorists are drawn to
questions of efficacy. Given that we find that a treatment plan has no clear
effectiveness, do we then conclude that the treatment would never be effica-
cious? Or suppose that overall effectiveness is demonstrated: Can we look
more carefully at the data to determine if the treatment caused preventable
side effects? Learning more about the specific causal paths in the develop-
ment and/or treatment of psychopathology is what stimulates new ideas
about future interventions. It also helps to clarify how definitive results are
from clinical trials or social experiments (e.g., Barnard, Frangakis, Hill, &
Rubin, 2003). Toh and Hernán (2008) contrast findings based on an ITT
approach to findings based on causally informative analyses.

Nonexperimental Observational Studies


Just as nonadherence and selective missing data can undermine the rando-
mized equivalence of treatment groups A and B, selection effects make it
especially difficult to compare groups whose exposure to different agents is
simply observed in nature. Epidemiologists, economists, and other social
scientists have invested considerable effort into the development of methods
that allow for adjustment of confounding due to selection biases. Many of
these methods are reviewed or further developed in this volume (see
Chapters 6 and 11). In fact, the problems that ‘‘break’’ randomized experi-
ments with humans (Barnard et al., 2003) have formal similarity to selection,
measurement, and attrition effects in nonexperimental studies.
A simple version of this formal representation is shown in Figure 1.1B.
In this version, some confounding variable, C, is shown to be related to the
treatment, T, and the outcome, Y. If variable C is ignored (either because it is
not measured or because it is left out of the analysis for other reasons), then
the estimated causal effect of T on Y will be biased. There can be multiple
types of confounding effects, and missing data processes may be construed to
be examples of these. Often, the confounding is more complex than illu-
strated in Figure 1.1. For example, Breslau in this volume (Chapter 12)
considers examples where T is experience of trauma (vs. no trauma) and Y
represents symptoms of avoidance/numbing that are consistent with posttraumatic
stress syndrome. Although the causal association between T and Y is often
assumed, Breslau considers the existence of other variables, such as person-
ality factors, that might be related to either exposure to T or appraisal of the
trauma and the likelihood of experiencing avoidance or numbing. If the
confounding variables are not identified as causal alternatives and if data
that are informative of the alternate causal paths are not obtained, then the
alleged causal effect of the trauma will be overstated.

Innovative Designs and Analyses for Improving Causal Inferences

When studying the effects of purported causes such as environmental disas-
ters, acts of war or terror, bereavement, or illness/injury, psychopathology
researchers often note that random assignment is not possible but that a
hypothetical random experiment would provide the gold standard for clear
causal inference. This hypothetical ideal can be useful in choosing quasi-
experimental designs that find situations in nature that seem to mimic
random assignment. There are several classes of quasi-experimental design
that create informative contrasts in the data by clever definition of treatment
groups rather than random assignment (Shadish, Cook, & Campbell, 2002).
For example, Costello and colleagues describe a situation in which a subset of
rural families in the Great Smoky Mountain developmental study (see
Chapter 11; Costello, Compton, Keeler, & Angold, 2003) were provided with
new financial resources by virtue of being members of the Cherokee Indian
tribe at the time the tribe opened a new casino. Tribe members were provided
payments from profit from the casino, while their nontribe neighbors were
not. Costello and colleagues describe how this event, along with a develop-
mental model, allowed strong claims to be made about the protective impact
of family income on drug use and abuse.
Modern genetics provides new and exciting tools for creating groups that
appear to be equivalent in every respect except exposure. Kendler in this
volume (Chapter 4) describes how twin studies can create informative quasi-
experimental designs. Suppose that we are interested in an environmental
exposure that is observed in nature and that the probability of exposure is
known to be related to psychological characteristics such as cognitive ability,
risk-taking, and neuroticism, which are known to have genetic loadings. If we
can find monozygotic and dizygotic twin pairs with individual twins who differ
in exposure, then we have a strong match for selection. Modern genetic ana-
lyses are useful in isolating the risk of exposure (a selection factor) from the
causal effect of the exposure on subsequent psychological functioning.
Twin studies are not necessary to take advantage of genetic thinking to
overcome selection effects. Davey Smith (Chapter 9) writes that researchers
are learning about genetic variants that determine how various environmen-
tal agents (e.g., alcohol, cannabis) are metabolized and that these variants are
nearly randomly distributed in certain populations. Under the so-called
Mendelian randomization (Davey Smith & Ebrahim, 2003) approach, a
causal theory that involves known biochemical pathways can be tested by
comparing outcomes following exposure in persons who differ in the tar-
geted genetic location.
Mendelian randomization strategies and co-twin designs make use of
genetics to provide insightful causal analyses of environmental exposures.
Random genetic variation can also be tapped to examine the nature of genetic
associations themselves. State (Chapter 10) describes how rare genetic var-
iants, such as nucleotide substitutions or repeats or copy number variations,
can be informative about the genetic mechanisms in complex diseases. He
illustrates this causal approach with findings on autism. Because the rare
variants seem to be random, the selection issues that concern most observa-
tional studies are less threatening.

Analytic Approaches to Confounding


Understanding the nature of the effects of confounding by nonadherence and
missing values in clinical trials and by selection effects in nonexperimental
comparative studies has been aided by formal representations of the nature of
the causal effects in the work of Pearl (2000, see Chapter 3). Pearl has
promoted the use of directed acyclic graphs (DAGs), which are explicit state-
ments of assumed causal paths. These graphical representations can be used
to recognize sources of confounding as well as to identify sufficient sets of
variables to adjust for confounding. The graphs can also be used to gain new
insights into the meaning of direct and indirect effects. The interpretation of
these graphs is enhanced by the use of the "do" operator of Pearl, which
states explicitly that a variable, Ti, can be forced to take one fixed value,
do(Ti = ti), or an alternate. For example, if T is an indicator of treatment A
or B, then this operator explicitly asks what would happen if individual i were
forced to have one treatment or the other. The formal analysis of causation
requires consideration, whether empirical or hypothetical, of what would
happen to the causal "descendants" if a variable is changed from one fixed
value to a different fixed value. A particularly useful feature of the formal
exercise is the consideration of competing causal paths that can bias the
interpretation of a given set of data, as well as the consideration that the
causal model might differ across individuals.
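
The contrast that the do operator expresses can be simulated directly (a hypothetical Python sketch, not an example taken from Pearl's chapter): when a confounder C influences both T and Y, the observational difference E[Y | T = 1] - E[Y | T = 0] departs from the interventional difference E[Y | do(T = 1)] - E[Y | do(T = 0)], which equals the structural effect of T.

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def simulate(do_t=None):
    """Generate (T, Y) from the model C -> T, C -> Y, T -> Y; do_t cuts the C -> T arrow."""
    c = rng.normal(size=n)                               # confounder
    if do_t is None:
        t = (c + rng.normal(size=n) > 0).astype(float)   # observational regime: T "listens to" C
    else:
        t = np.full(n, float(do_t))                      # interventional regime: do(T = do_t)
    y = 1.0 * t + 2.0 * c + rng.normal(size=n)           # structural effect of T on Y is 1.0
    return t, y

t_obs, y_obs = simulate()
observational = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()

_, y_do1 = simulate(do_t=1)
_, y_do0 = simulate(do_t=0)
interventional = y_do1.mean() - y_do0.mean()

print(f"E[Y | T=1] - E[Y | T=0]         = {observational:.2f}  (confounded by C)")
print(f"E[Y | do(T=1)] - E[Y | do(T=0)] = {interventional:.2f}  (the causal effect)")

Recovering the interventional quantity from observational data, for example by adjusting for C, is precisely the task given to the methods discussed next.
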
Once articulated, investigators often are able to obtain measures of possi-
ble confounding variables. A question of great interest among modern causal
analysts is how to use these measures to eliminate bias. Traditionally, the
confounders are simply added as ‘‘control’’ variables in linear regression or
structural equation models (Morgan & Winship, 2007; Bollen, 1989). If (1) C
is known to be linearly related to T, (2) C is known to be linearly related to Y,
(3) the relation of T to Y is known to be the same for all levels of C, (4) C is
measured without error, and (5) the set of variables in C is known to repre-
sent all aspects of selection bias, then the regression model approach will
yield an unbiased estimate of the causal effect of T on Y. The adjusted effect
is often interpreted with phrases such as ‘‘holding constant C’’ and ‘‘the
effect of T on Y is X,’’ which can be interpreted as an average causal effect.
Causal analysts often talk about the fact that assumptions that are needed
to make an adjustment valid are untestable. An investigator might argue for
the plausibility of the linear model assumptions by testing whether nonlinear
terms improve the fit of the linear models and testing for interactions
between C and T in the prediction of Y. However, these empirical tests will
leave the skeptic unconvinced if the study sample is small and the statistical
power of the assumption tests is limited.
Another approach to adjustment relies on the computation of propensity
scores (e.g., Rosenbaum & Rubin, 1983), which are numerical indicators of
how similar individuals in condition A are to individuals in condition B.
These scores are computed as summaries of multivariate representations of
the similarity of the individuals in the two groups. The propensity scores
themselves are created using methods such as logistic regression and non-
linear classification algorithms with predictor variables that are conceptually
prior to the causal action of T on Y. One important advantage of this
approach is that the analyst is forced to study the distributions of the pro-
pensity scores in the two groups to be compared. Often, one discovers that
there are some persons in one group who have no match in the other group
and vice versa. These unique individuals are not simply included as extra-
polations, as they are in traditional linear model adjustments, but are instead
set aside for the estimation of the causal effect. The computation of the
adjusted group difference is based on either matching of propensity scores
or forming propensity score strata. This approach is used to make the groups
comparable in a way that is an approximation to random assignment given
the correct estimation of the propensity score (see Gelman & Hill, 2007).
Propensity score adjustment does not assume a simple linear relation
between the confounder variables and the treatment, but neither does it lead to a unique
result. Different methods for computing the propensity score can yield dif-
ferent estimates of the average causal effect. The ways that propensity scores
might be used to improve causal inference continue to be developed. For
example, based on work by Robins (1993), Toh and Hernán (2008) describe a
method called inverse probability weighting for adjustment of adherence and
retention in clinical trials. This method uses propensity score information to
give high weight to individuals who are comparable across groups and low
weight to individuals who are unique to one group.
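
As an illustration of this logic, the following hypothetical Python sketch (using scikit-learn's logistic regression; the covariates, effect sizes, and the choice of quintile strata are invented for the example) estimates each subject's probability of exposure from measured confounders and then averages the treated-control differences within propensity score strata.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 20_000

# Two measured confounders influence both the exposure T and the outcome Y.
x = rng.normal(size=(n, 2))
p_exposed = 1.0 / (1.0 + np.exp(-(0.8 * x[:, 0] - 0.6 * x[:, 1])))
t = rng.random(n) < p_exposed
y = 1.0 * t + 1.5 * x[:, 0] + 1.0 * x[:, 1] + rng.normal(size=n)   # true effect of T is 1.0

naive = y[t].mean() - y[~t].mean()

# Propensity score: estimated probability of exposure given the measured covariates.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Form quintile strata of the propensity score and average the within-stratum differences;
# in practice one would first inspect the overlap of the two groups' score distributions.
strata = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
diffs = [y[(strata == s) & t].mean() - y[(strata == s) & ~t].mean() for s in range(5)]
stratified = float(np.mean(diffs))

print(f"naive difference: {naive:.2f}; propensity-stratified estimate: {stratified:.2f} (close to the true 1.0)")

Matching on the score or weighting by its inverse are alternative uses of the same quantity, and, as noted above, different choices can yield somewhat different estimates.
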
Whereas direct adjustment and calculation of propensity scores make use
of measured variables that describe the possible imbalance of the groups
indexed by T, the method of instrumental variables attempts to adjust for
confounding by using knowledge about the relation of a set of variables I to
the outcome Y. If I can affect Y only through the variable T, then it is possible
to isolate the causal effect of T on Y from spurious correlation between the treatment (T) and the outcome
(Y). Figure 1.2 shows a representation of this statement. The instrumental
variable I is said to cause a change in T and, through this variable, to affect Y.

Figure 1.2 Schematic representation of how an instrumental variable (I) can isolate the causal effect from the correlation between the treatment variable (T) and the error term (E).

There may be other reasons why T is related to Y (as indicated by correlation
between T and E), but if the instrumental variable model is correct, the causal
effect can be isolated. The best example of this is when I is an indicator of
random assignment, T is a treatment condition, and Y is the outcome. On
the average, random assignment is related to Y only through the treatment
regime T. Economists and others have shown that instrumental variables
allow for confounding to be eliminated even if the nature of the confounding
process is not measured.
In nonexperimental studies, the challenge is to find valid instrumental
variables. The arguments are often made on the basis of scientific theories
of the causal process. For example, in the Costello et al. (2003) Great Smoky
Mountain Study, if tribal membership has never been related to substance
use by adolescents in a rural community but it becomes related after it is
associated with casino profit payments, then a plausible case can be made for
tribal membership being an instrumental variable. However, as Hernán and
Robins (2006) discuss, careful reexamination of instrumental variable
assumptions can raise questions about essentially untestable assumptions
about the causal process.
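
A minimal sketch of the instrumental variable idea, using the cleanest case noted above in which I is randomized assignment and T is the treatment actually received (hypothetical Python code; the unmeasured "frailty" confounder and all effect sizes are invented): the Wald ratio cov(I, Y) / cov(I, T) recovers the effect of T without any measurement of the confounding process, provided assignment affects Y only through T.

import numpy as np

rng = np.random.default_rng(4)
n = 100_000

frailty = rng.normal(size=n)             # unmeasured confounder of treatment received and outcome
i = rng.random(n) < 0.5                  # instrument: randomized assignment
# Frailer subjects are less likely to take the treatment they were assigned.
takes_if_assigned = rng.random(n) < 1.0 / (1.0 + np.exp(frailty))
t = i & takes_if_assigned                # treatment actually received
y = 1.0 * t - 1.0 * frailty + rng.normal(size=n)   # effect of receiving treatment is 1.0

as_treated = y[t].mean() - y[~t].mean()            # confounded by frailty
wald = np.cov(i, y)[0, 1] / np.cov(i, t)[0, 1]     # IV (Wald) estimator

print(f"as-treated difference:          {as_treated:.2f}")
print(f"instrumental-variable estimate: {wald:.2f}  (true effect 1.0)")

The ratio equals the causal effect in this sketch because the effect of T is the same for everyone; with heterogeneous effects it estimates an average effect among those whose treatment status is actually moved by the instrument.
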
The analytic approaches to confounding can provide important insights
into the effects of adherence and retention in clinical trials and the impact of
alternate explanations of causal effects by selection processes in nonexperi-
mental studies. As briefly indicated, the different approaches make different
assumptions and these assumptions can lead to different estimates of causal
effects. Researchers who strive for closure from a specific study find such a
lack of clarity to be unsatisfying. Indeed, one of the advantages of the ITT
analysis of randomized clinical trials is that it can give a single clear answer
to the question of treatment effectiveness, especially when the analyses follow
rigorous guidelines for a priori specification of primary outcomes and are
based on data with adequate statistical power.

Temporal Patterns of Causal Processes

As helpful as the DAG representations of cause can be, they tend to empha-
size causal relations as if they occur all at once. These models may be
perfectly appropriate in engineering applications where a state change in T
is quickly followed by a response change in Y. In psychopathology research,
on the other hand, both processes that are hypothesized to be causes and
processes that are hypothesized to be effects tend to unfold over time. For
example, in clinical trials of fluoxetine, the treatment is administered for 4–6
weeks before it is expected to show effectiveness (Quitkin et al., 2003). When
the treatment is ended, the risk of relapse is typically expected to increase
with time off the medication. There are lags to both the initial effect of
the treatment and the risk of relapse. Figure 1.3A shows one representation
of this effect over time, where the vertical arrows represent a pattern of
treatments.
Another pattern is expected in preventive programs aimed at reducing
externalizing problems in high-risk children through the improvement of
parenting skills of single mothers. The Incredible Years intervention of
Webster-Stratton and her colleagues (e.g., Gross et al., 2003) takes 12
weeks to unfold and involves both parent and teacher sessions, but the
impact of the program is expected to continue well beyond the treatment
period. The emphasis on positive parenting, warm but structured interac-
tions, and reduction of harsh interactions is expected to affect the mother–
child relationships in ways that promote health, growth, and reduction of
conduct problems. Figure 1.3B shows how this pattern might look over time,
with an initial lag of treatment and a subsequent shift.
For some environmental shocks or chemical agents with pharmacokinetics
of rapid absorption, metabolism, and excretion, the temporal patterns might
be similar to those found in engineering applications. These are character-
ized by rapid change following the treatment and fairly rapid return to base-
line after the treatment is ended. Figure 1.3C illustrates this pattern, which
might be typical for heart rate change following a mild threat such as a fall or
for headache relief following the ingestion of a dose of analgesic.
As Costello and Angold discuss (Chapter 11), the consideration of these
patterns of change is complicated by the fact that the outcome being studied
might not be stable. Psychological/biological processes related to symptoms
might be developing due to maturation or oscillating due to circadian
rhythms, or they might be affected by other processes related to the treatment
itself. In randomized studies, the control group can give a picture of the
trajectory of the naturally occurring process, so long as adequate numbers
of assessments are taken over time. However, the comparison of the treat-
ment and control group may no longer give a single outcome but, rather, a
series of estimated causal effects at different end points, both because of the
hypothesized time patterns illustrated in Figure 1.3 and because of the
natural course of the processes under study. Although one might expect
that effects that are observed at adjacent times are due to the same causal
mechanism, there is no guarantee that the responses are from the same
people. One group of persons might have a short-lived response at one
time and another group might have a response at the next measured time
point.
Muthén and colleagues’ parametric growth mixture models (Chapter 7)
shift the attention to the individual over time, rather than specific (and
perhaps arbitrarily chosen) end points. These models allow the expected

Figure 1.3 Examples of time trends relating treatments (indicated by vertical arrows) and response on Y. Panel A shows an effect that takes time to be seen and additional time to diminish when the treatment is removed. Panel B shows an effect that takes time to be seen, but then is lasting.

trajectory in group A to be compared with that in group B. This class of
models also considers various patterns of individual differences in the trajec-
tories, with an assumption that some persons in treatment group A might
have trajectories just like those in placebo group B. Although the parametric
assumptions about the nature of the trajectories can provide interesting
insights and possibly increased statistical power, causal analysts can have
strikingly different opinions about the wisdom of making strong untestable
assumptions.
Scientists working on problems in psychopathology often have a general
idea of the nature of the trajectory, and this is reflected in the timing of the
measurements. However, unless repeated measurements are taken at fairly
short intervals, it is impossible to document the nature of the alternative
patterns as shown in Figure 1.3. Such basic data are needed to choose
among the possible parametric models that can be fit in growth mixture
models, and they are also necessary to implement the ideas of Klein
(Chapter 13), which involve starting and stopping treatment repeatedly to
determine who is a true responder and who is not.
Note that Klein’s proposal is related to classic crossover designs in which
one group is assigned treatment sequence (A, B) and another group is
assigned (B, A). This design assumes a temporal pattern like in Figure
1.3C, and it requires a ‘‘washout’’ period during which the effect of the
first treatment is assumed to have dissipated. The literature on these designs
is extensive (e.g., Fleiss, 1986; Piantadosi, 2005), and it seems to provide an
intuitive solution to Holland’s (1986) fundamental problem of causal infer-
ence. If one cannot observe both potential outcomes, YA(U) and YB(U), at the
same instant, then why not fix the person U (say U = u) and estimate YA(U)
and YB(U) at different times? Holland called this a scientific approach to the
fundamental problem, but he asserted that the causal estimate based on this
design depends on an untestable homogeneity assumption, namely, that
person u at time 1 is exactly the same as person u at time 2, except for
the treatment. Although the test of that assumption cannot be definitive, an
accumulated understanding of temporal patterns of effects will make the
assumption more or less plausible.

Mediation and Moderation of Causal Effects

Just as psychopathology researchers are willing to consider scientific
approaches to the fundamental problem of causal inference using crossover
designs, they may also be inclined to develop intuitive statistical models of
causal process. For example, Freedland et al. (2009) found that assignment to
a cognitive behavior therapy (CBT) condition was related to reduced
depression 3 months after treatment completion among depressed patients
who had experienced coronary artery bypass surgery. A researcher might ask
if the improvement was due to mastery of one or another component of CBT,
namely, (1) control of challenging distressing automatic thoughts or (2) chan-
ging and controlling dysfunctional attitudes. Suppose the researcher had
included assessments of these two cognitive skills at the 2-month assessment
(1 month before the end point assessment). The question could be asked,
Can the effect of treatment (T = CBT) on depression (Y) be explained by an
intervening cognitive skill (M)?
Kenny and colleagues (Judd & Kenny, 1981; Baron & Kenny, 1986) forma-
lized the mediation analysis approach to this question using a set of linear
models that are represented in Figure 1.4. Panel A shows a causal relation
between T and Y (represented as c) and panel B shows how that effect might
be explained by mediator variable M. There are four steps in the Baron and
Kenny tradition: (1) show that T is related to Y (effect c in panel A); (2) show
that T is related to M (effect a in panel B); (3) show that, after adjusting for T,
M is related to Y (effect b in panel B); and (4) determine if the direct effect
of T on Y, after adjusting for M, remains non-zero (effect c′ in panel B). If the
direct effect can be considered to be zero, then Baron and Kenny described
the result as complete mediation—otherwise, it was partial mediation. In
addition to these steps, the mediation tradition suggests estimating the indir-
ect effect of T on Y through M as the product of estimates of a and b in
Figure 1.4B and testing the null hypothesis that the product is equal to zero
(see MacKinnon, 2008).
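
For readers who prefer to see the steps written out, the following hypothetical Python sketch generates data from a simple mediation model and carries out the Baron and Kenny regressions together with the product-of-coefficients estimate (ordinary least squares via numpy; the generating values are invented and chosen to match the numerical example used later in this chapter).

import numpy as np

rng = np.random.default_rng(5)
n = 5_000

t = (rng.random(n) < 0.5).astype(float)       # randomized treatment (e.g., CBT vs. control)
m = 0.7 * t + rng.normal(size=n)              # mediator (a = 0.7)
y = 0.28 * t + 0.4 * m + rng.normal(size=n)   # outcome (c' = 0.28, b = 0.4)

def ols(design, outcome):
    """Ordinary least squares slopes for the columns of `design` (intercept included)."""
    X = np.column_stack([np.ones(len(outcome)), design])
    return np.linalg.lstsq(X, outcome, rcond=None)[0][1:]

c_total = ols(t, y)[0]                             # step 1: T -> Y (effect c)
a_hat = ols(t, m)[0]                               # step 2: T -> M (effect a)
b_hat, c_direct = ols(np.column_stack([m, t]), y)  # steps 3 and 4: M -> Y (b) and direct T -> Y (c')
indirect = a_hat * b_hat                           # product-of-coefficients indirect effect

print(f"total c = {c_total:.2f}, direct c' = {c_direct:.2f}, indirect a*b = {indirect:.2f}")
# With these generating values, c is about 0.28 + 0.7 * 0.4 = 0.56 (partial mediation).
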
It is difficult to overestimate the impact of this approach on informal
causal analysis in psychopathology research. The Baron and Kenny report
alone has been cited more than 17,000 times, and thousands of these cita-
tions are by psychopathology researchers. Often, the mediation approach is
used in the context of experiments such as those already described, but other
times it is used to explain associations observed in cross-sectional surveys.
These have special problems.

Figure 1.4 Traditional formulation of Baron and Kenny (1986) mediation model, with Panel A showing total effect (c) and Panel B showing indirect (a*b) and direct (c′) effect decomposition.

Although Kenny and his colleagues have explicitly warned that the analysis
is appropriate only when the ordering of variables is unambiguous, many
published studies have not established this order rigorously. Even if an experi-
mental design guarantees that the mediating and outcome processes (M, Y)
follow the intervention (T), M and Y themselves are often measured at the same
point in time and the association between M and Y is estimated as a correlation
rather than a manipulated causal relation. This leaves open the possibility of
important bias in the estimated indirect effect of T on Y through M.
Figure 1.5A is an elaboration of Figure 1.4B that represents the possibility
of other influences besides T on the association between M and Y. This is
shown as correlated residual terms, eM and eY. For example, if we were trying
to explain the effect of CBT (T) on depression (Y) through changes in control
of dysfunctional attitudes (M), we could surmise that there is a correlation of
degree of dysfunctional attitudes and depression symptoms that would be
observed even in the control group. Baseline intuitions, insight, or self-help
guides in the lay media might have led to covariation in the degree of
dysfunctional attitudes and depression. In fact, part of this covariation
could be reverse pathways such that less depressed persons more actively
read self-help strategies and then change their attitudes as a function of


Figure 1.5 Formulation of mediation model to show correlated errors (Panel A) and an extended model (Panel B) that includes baseline measures of the mediating variable (M0) and the outcome measure (Y0).

the reading. If these sources of covariation are ignored, then the estimate of
the b effect will be biased, as will be the product, a * b. In most cases, the
bias will overestimate the amount of effect of T that goes through M.
Hafeman (2008) has provided an analysis of this source of bias from an
epidemiologic and causal analysis framework.
Although Figure 1.5A represents a situation where b will be biased when
the usual Baron and Kenny (1986) analysis is carried out, the model shown in
Figure 1.5A cannot itself be used to respecify the analysis to eliminate the
bias. This is because the model is not empirically identified. This means that
we cannot estimate the size of the correlation between eM and eY while also
estimating a, b, and c′. However, investigators often have information that is
ignored that can be used to resolve this problem. Figure 1.5B shows a model
with adjustments for baseline (prerandomization) measures of the outcome
(Y0) and mediating process (M0). When these baseline measures are included,
it is possible both to account for baseline association between Y and M and to
estimate a residual correlation between Y and M. The residual correlation can
be estimated if it is reasonable to consider the baseline M0 as an instrumental
variable that has an impact on the outcome Y only through its connection with
the postrandomized measure of the mediating process, M.1
How important can this adjustment be? Consider a hypothetical numerical
example in which a = 0.7, b = 0.4, and c′ = 0.28. Assuming that the effects
are population values, these values indicate a partial mediation model. The
total effect of T on Y (c in Figure 1.4A) is the sum of the direct and indirect
effects, 0.56 = 0.28 + (0.70)(0.40), and exactly half the effect goes through M.
The stability of the mediation process from baseline to postintervention is
represented by g1 and the comparable stability of the outcome variable is g2.
Finally, the degree of correlation between M0 and Y0 is rmy.
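
The bias can be reproduced by simulation. The following hypothetical Python sketch generates data from the Figure 1.5B model with the values above, using stabilities g1 = g2 = 0.6 and a baseline correlation of .5 (one point among those plotted in Figure 1.6), fits the naive Figure 1.4B mediation model, and shows the inflated indirect effect.

import numpy as np

rng = np.random.default_rng(6)
n = 200_000
a, b, c_prime = 0.7, 0.4, 0.28
g1 = g2 = 0.6                      # stability of the M and Y processes
r_my = 0.5                         # baseline correlation between M0 and Y0

m0, y0 = rng.multivariate_normal([0.0, 0.0], [[1.0, r_my], [r_my, 1.0]], size=n).T
t = (rng.random(n) < 0.5).astype(float)
m = a * t + g1 * m0 + rng.normal(size=n)                # generating model of Figure 1.5B
y = c_prime * t + b * m + g2 * y0 + rng.normal(size=n)

def ols(design, outcome):
    """Ordinary least squares slopes for the columns of `design` (intercept included)."""
    X = np.column_stack([np.ones(len(outcome)), design])
    return np.linalg.lstsq(X, outcome, rcond=None)[0][1:]

a_hat = ols(t, m)[0]

# Naive analysis (Figure 1.4B): the baseline measures are ignored.
b_naive, _ = ols(np.column_stack([m, t]), y)
# Analysis that also conditions on the baseline measures M0 and Y0.
b_adj, *_ = ols(np.column_stack([m, t, m0, y0]), y)

print(f"true indirect effect a*b:       {a * b:.2f}")
print(f"naive estimate (baselines out): {a_hat * b_naive:.2f}")
print(f"estimate adjusting for M0, Y0:  {a_hat * b_adj:.2f}")

In this generated example, simply adding M0 and Y0 as covariates removes the bias because the extra association between M and Y runs entirely through the measured baselines; the model in Figure 1.5B instead estimates a residual correlation between M and Y directly, treating M0 as an instrument.
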
Figure 1.6 shows results from an analysis of the bias using the Figure
1.4B model to represent mediation for different levels of correlation between
M0 and Y0. The results differ depending on how stable are the mediating and
outcome processes in the control group. (For simplicity, the figure assumes
that they are the same, i.e., g1 = g2.) Focusing on the estimate of the indirect
effect, a * b, one can see that there is no bias if M and Y have no stability:
The estimate is the expected 0.28 for all values of rmy when g1 = g2 = 0.
However, when M and Y show stability and the correlation between M0
and Y0 is substantial, the indirect effect is overestimated. Given that symptoms, such as depression, and coping
strategies, such as cognitive skills, tend to be quite stable in longitudinal

1. There can be further refinements to the model shown in Figure 1.5B. One might consider a
model where Y0 is related to the mediating process M. For example, if less depressed persons
in the study were inclined to seek self-help information and M represented new cognitive skills
that are available in the media, then the path between Y0 and M could be non-zero and
negative.
studies, we must conclude that important bias in estimates of the indirect
effect is likely to be the rule rather than the exception. When investigators
compute mediation analyses without taking into account the correlation of M
and Y at baseline, they run the risk of concluding that an experimental result
is completely mediated by an intervening process, when in fact there may be
direct effects or other intervening processes also involved.
The use of baseline measures is not the only way to make adjustments for
spurious correlations between M and Y. In social psychology, Spencer, Zanna,
and Fong (2005) argued that the causal involvement of the mediating path
would be most convincing if researchers developed supplemental studies that
manipulated M directly through randomized experiments. For example, in a
long-term study of the CBT effects on depression, an investigator might
randomly assign persons in the treatment group to experience (or not) a
‘‘booster’’ session that reminds the patient of the cognitive skills that were
taught previously in the CBT sessions. One of the more challenging assump-
tions of this approach is that the nature of the M change in a direct inter-
vention is identical to the nature of the M change that follows manipulation
of T. It is possible, for example, that the booster intervention on cognitive
skills might have a different impact from the original intervention because
the patient has become aware of barriers to the implementation of the skill
set. As a result of that experience, the patient might attend to different
aspects of the booster session from the original intervention. This difference
could affect the strength of the relation between M and T in the direct
manipulation condition. Nonetheless, the new information provided by
direct manipulation of M is likely to increase the confidence one has in
the estimate of the indirect causal path.
Noting that it is the correlational nature of the link between M and Y that
makes it challenging to obtain unbiased estimates of indirect (mediated)
effects in randomized studies, it should not be surprising that the challenges
are much greater in nonexperimental research. There are a number of stu-
dies published in peer-reviewed journals that attempt to partition assumed
causal effects into direct and indirect components. For example, Mohr
et al. (2003) reported that an association between traumatic stress and ele-
vated health problems in 713 active police officers was fully mediated by
subjective sleep problems in the past month. All the variables were measured
in a cross-sectional survey. The path of stress to sleep to health problems is
certainly plausible, but it is also possible that health problems raise the risk
of both stress and sleep problems.
Even if there is no dispute about the causal order, there can be dispute
about the meaning of the mediation analysis in cases such as this.
Presumably, the underlying model unfolds on a daily basis: Stress today
disrupts sleep tonight, and this increases the risk of health problems tomor-
row. One might hope that cross-sectional summaries of stress and sleep
patterns obtained for the past month would be informative about the mediat-
ing process. However, Maxwell and Cole (Cole & Maxwell, 2003; Maxwell &
Cole, 2007) provided convincing evidence that there is no certain connection
between a time-dependent causal model and a result based on cross-sectional
aggregation of data. They studied the implications of a stationary model
where causal effects were observed on a daily basis for a number of days
or parts of days. In addition to the mediation effects represented in Figure
1.4B (a, b, c'), they represented the stability of the T, M, and Y processes from
one time point to the next. They studied the inferences that would be made
from a cross-sectional analysis of the variables under different assumptions
about the mediation effects and the stability of the processes. The bias of the
cross-sectional analysis was greatly influenced by the process stability, and the
direction of the bias was not consistent. Sometimes the bias of the indirect
effect estimate was positive and sometimes it was negative.
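
A rough way to see the phenomenon is to simulate it directly. The sketch below is only an illustration of the point, not Maxwell and Cole's own derivation; the lagged paths, the common stability, and the number of days are invented values. Stress affects sleep the next day, sleep affects health the day after, there is no direct stress-to-health path, and yet a single cross-sectional snapshot yields quite different mediation estimates:

```python
# Illustrative sketch: a stationary day-to-day mediation process analyzed cross-sectionally.
import numpy as np

rng = np.random.default_rng(1)
n, days = 100_000, 60
a, b = 0.5, 0.5        # lagged paths: stress -> sleep and sleep -> health (no direct path)
s = 0.7                # assumed day-to-day stability of each process

stress = rng.normal(size=n)
sleep = rng.normal(size=n)
health = rng.normal(size=n)
for _ in range(days):  # run the process long enough to approach stationarity
    stress, sleep, health = (
        s * stress + rng.normal(size=n),
        s * sleep + a * stress + rng.normal(size=n),
        s * health + b * sleep + rng.normal(size=n),
    )

def ols(y, *cols):
    """Least-squares slopes of y on the given predictors (intercept dropped)."""
    X = np.column_stack((np.ones(len(y)),) + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a_cs = ols(sleep, stress)[0]              # cross-sectional sleep ~ stress
b_cs, c_cs = ols(health, sleep, stress)   # cross-sectional health ~ sleep + stress

print(f"cross-sectional indirect effect: {a_cs * b_cs:.2f}")
print(f"cross-sectional 'direct' effect: {c_cs:.2f}")
print(f"lagged (generating) indirect effect: {a * b:.2f}; true direct effect: 0")
```

The particular numbers depend on the assumed stabilities; the point is that the cross-sectional estimates need not equal, or even approximate, the lagged effects that actually generated the data.
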
The Maxwell and Cole work prompts psychopathology researchers to think
carefully about the temporal patterns in mediation and to take seriously the
assumptions that were articulated by Judd and Kenny (1981). Others have
called for modifications of the original positions taken by Kenny and his
colleagues. An important alternate perspective has been advanced by
MacArthur Network researchers (Kraemer, Kiernan, Essex, & Kupfer, 2008),
who call into question the Baron and Kenny (1986) distinction between
mediation and moderation. As we have already reviewed in Figure 1.4, a
third variable is said by Baron and Kenny (1986) to be a mediator if it
both has a direct association with Y adjusting for T and can be represented
as being related linearly with T. A moderator, according to Baron and Kenny
(1986), is a third variable (W) that is involved in a statistical interaction with
T when T and W are used to predict Y. The MacArthur researchers note that
the Baron and Kenny distinction is problematic if various nonlinear transfor-
mations of Y are considered. Such transformations can produce interaction
models, even if there is no evidence that the causal effect is moderated. They
propose to limit the concept of moderation to effect modifiers. If the third
variable represents a status before the treatment is applied and if the size of
the TY effect varies with the level of the status, then moderation is demon-
strated from the MacArthur perspective. For randomized studies, the moder-
ating variable would be expected to be uncorrelated with T. If
psychopathology researchers embrace the MacArthur definition of modera-
tion, considerable confusion in the literature will be avoided in the future.
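
The transformation problem is easy to demonstrate numerically. In the sketch below (an invented illustration, not an analysis from the MacArthur group), T is randomized, W is a pre-treatment variable, and the effect of T on Y is the same at every level of W; once Y is squared, however, an apparent T-by-W interaction appears:

```python
# Illustrative sketch: a nonlinear transformation of Y manufactures an interaction.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
W = rng.normal(size=n)              # pre-treatment (baseline) variable
T = rng.integers(0, 2, size=n)      # randomized treatment, independent of W
Y = 1.0 * T + 1.0 * W + rng.normal(scale=0.5, size=n)   # purely additive: no moderation

def interaction_coef(y):
    """Coefficient on T*W from a regression of y on T, W, and T*W."""
    X = np.column_stack((np.ones(n), T, W, T * W))
    return np.linalg.lstsq(X, y, rcond=None)[0][3]

print(f"T*W coefficient for Y:    {interaction_coef(Y):.2f}")       # essentially zero
print(f"T*W coefficient for Y**2: {interaction_coef(Y ** 2):.2f}")  # clearly nonzero
```
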

Conclusion

The time is ripe for psychopathology researchers to reconsider the conven-
tions for making causal statements about mental health processes. On the
one hand, conventions such as ITT analyses of clinical trials have led to
conservative conclusions about the causal processes involved in the changes
following interventions, and on the other hand, rote application of the Baron
and Kenny (1986) steps for describing mediated paths have led to premature
closure regarding which causal paths account for intervention effects. The old
conventions are efficient for researchers in that they prescribe a small
number of steps that must be followed in preparing manuscripts, but they
limit the insights that are possible from a deeper consideration of causal
mechanisms and pathways.
The new approaches to causal analysis will not lead to quick, or even definitive,
statements about which factors are causal and which are spurious, but they will
allow clinical and experimental data to be viewed
from multiple perspectives to reveal new causal insights. In many cases, the
new approaches are likely to suggest causal heterogeneity in a population.
Because of genetic differences, social context, developmental stage, timing of
measurements, and random environmental flux, the size of causal effects of
intervention T will vary from person to person. The new methods will help us
to appreciate how the alternate summaries of the population causal effect can
be affected by these distributions.

It will often take more effort to use the modern tools of causal analysis,
but the benefit of the effort is that researchers will be able to talk more
explicitly about interesting causal theories and patterns rather than about
associations that have been edited to remove any reference to ‘‘cause’’ or
‘‘effect.’’ In the long run the more sophisticated analyses will lead to more
nuanced prevention and treatment interventions and a deeper understanding
of the determinants of psychiatric problems and disorders. Many examples of
these insights are provided in the chapters that follow.

References

Barnard, J., Frangakis, C. E., Hill, J. L., & Rubin, D. B. (2003). Principal stratification
approach to broken randomized experiments: A case study of school choice vou-
chers in New York City. Journal of the American Statistical Association, 98, 299–311.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in
social psychological research: Conceptual, strategic and statistical considerations.
Journal of Personality and Social Psychology, 51, 1173–1182.
Becker, M. H., & Maiman, L. A. (1975). Sociobehavioral determinants of compliance
with health and medical care. Medical Care, 13(1), 10–24.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Brotman, L. M., Gouley, K. K., Huang, K.-Y., Rosenfelt, A., O’Neal, C., Klein, R. G.,
et al. (2008). Preventive intervention for preschoolers at high risk for antisocial
behavior: Long-term effects on child physical aggression and parenting practices.
Journal of Clinical Child and Adolescent Psychology, 37, 386–396.
Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal
data: Questions and tips in the use of structural equation modeling. Journal of
Abnormal Psychology, 112, 558–577.
Connor, D. F., Glatt, S. J., Lopez, I. D., Jackson, D., & Melloni, R. H., Jr. (2002).
Psychopharmacology and aggression. I: A meta-analysis of stimulant effects on
overt/covert aggression-related behaviors in ADHD. Journal of the American
Academy of Child & Adolescent Psychiatry, 41(3), 253–261.
Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between
poverty and psychopathology: A natural experiment. Journal of the American Medical
Association, 290, 2023–2029.
Davey Smith, G., & Ebrahim, S. (2003). ‘‘Mendelian randomization’’: Can genetic
epidemiology contribute to understanding environmental determinants of disease?
International Journal of Epidemiology, 32, 1–22.
Dobson, K. S. (1989). A meta-analysis of the efficacy of cognitive therapy for depres-
sion. Journal of Consulting and Clinical Psychology, 57(3), 414–419.
Efron, B. (1998). Forward to special issue on analyzing non-compliance in clinical
trials. Statistics in Medicine, 17, 249–250.
Everitt, B. S., & Pickles, A. (2004). Statistical aspects of the design and analysis of clinical
trials. London: Imperial College Press.
Fleiss, J. L. (1986). The design and analysis of clinical experiments. New York: Wiley.
Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference.
Biometrics, 58, 21–29.
Freedland, K. E., Skala, J. A., Carney, R. M., Rubin, E. H., Lustman, P. J., Davila-
Roman, V. G., et al. (2009). Treatment of depression after coronary artery bypass
surgery: A randomized controlled trial. Archives of General Psychiatry, 66(4), 387–396.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical
models. New York: Cambridge University Press.
Greenland, S., Pearl, J., & Robins, J. M. (1999a). Causal diagrams for epidemiologic
research. Epidemiology, 10(1), 37–48.
Greenland, S., Pearl, J., & Robins, J. M. (1999b). Confounding and collapsibility in
causal inference. Statistical Science, 14(1), 29–46.
Gross, D., Fogg, L., Webster-Stratton, C., Garvey, C., Julion, W., & Grady, J. (2003).
Parent training of toddlers in day care in low-income urban communities. Journal of
Consulting and Clinical Psychology, 71, 261–278.
Hafeman, D. M. (2008). A sufficient cause based approach to the assessment of
mediation. European Journal of Epidemiology, 23, 711–721.
Hansen, R. A., Gartlehner, G., Lohr, K. N., Gaynes, B. N., & Carey, T. S. (2005).
Efficacy and safety of second-generation antidepressants in the treatment of major
depressive disorder. Annals of Internal Medicine, 143, 415–426.
Hareli, S., & Hess, U. (2008). The role of causal attribution in hurt feelings and related
social emotions elicited in reaction to other’s feedback about failure. Cognition &
Emotion, 22(5), 862–880.
Hegarty, J. D., Baldessarini, R. J., Tohen, M., & Waternaux, C. (1994). One hundred
years of schizophrenia: A meta-analysis of the outcome literature. American Journal
of Psychiatry, 151(10), 1409–1416.
Hernán, M. A., & Robins, J. M. (2006). Instruments for causal inference: an epide-
miologist’s dream? Epidemiology, 17(4), 360–372.
Holland, P. (1986). Statistics and causal inference (with discussion). Journal of the
American Statistical Association, 81, 945–970.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treat-
ment evaluations. Evaluation Review, 5, 602–619.
Kraemer, H., Kiernan, M., Essex, M., & Kupfer, D. J. (2008). How and why criteria
defining moderators and mediators differ between the Baron & Kenny and
MacArthur approaches. Health Psychology, 27(Suppl. 2), S101–S108.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.).
New York: Wiley.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York:
Lawrence Erlbaum.
Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal
mediation. Psychological Methods, 12(1), 23–44.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data:
A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Mohr, D., Vedantham, K., Neylan, T., Metzler, T. J., Best, S., & Marmar, C. R. (2003).
The mediating effects of sleep in the relationship between traumatic stress and
health symptoms in urban police officers. Psychosomatic Medicine, 65, 485–489.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference. New York:
Cambridge University Press.
O’Mahony, S. M., Marchesi, J. R., Scully, P., Codling, C., Ceolho, A., Quigley, E. M. M.,
et al. (2009). Early life stress alters behavior, immunity, and microbiota in rats:
Implications for irritable bowel syndrome and psychiatric illnesses. Biological
Psychiatry, 65(3), 263–267.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference
on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan
Kaufmann.
Pearl, J. (2009). Causality: Models, reasoning and inference (2nd ed.). New York:
Cambridge University Press.
Peduzzi, P., Wittes, J., Detre, K., & Holford, T. (1993). Analysis as-randomized and the
problem of non-adherence: An example from the Veterans Affairs Randomized Trial
of Coronary Artery Bypass Surgery. Statistics in Medicine, 12, 1185–1195.
Piantadosi, S. (2005). Clinical trials: A methodologic perspective (2nd ed.). New York:
Wiley.
Quitkin, F. M., Petkova, E., McGrath, P. J., Taylor, B., Beasley, C., Stewart, J., et al.
(2003). When should a trial of fluoxetine for major depression be declared failed?
American Journal of Psychiatry, 160(4), 734–740.
Rehder, B., & Kim, S. (2006). How causal knowledge affects classification: A generative
theory of categorization. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 32(4), 659–683.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with
sustained exposure periods—applications to control of the healthy worker survivor
effect. Mathematical Modeling, 7, 1393–1512.
Robins, J. M. (1993). Analytic methods for estimating HIV treatment and cofactor
effects. In D. G. Ostrow & R. C. Kessler (Eds.), Methodological issues of AIDS
mental health research (pp. 213–288). New York: Springer.
Robins, J. M., Hernan, M. A., & Brumback, B. (2000). Marginal structural models and
causal inference in epidemiology. Epidemiology, 11(5), 550–560.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of propensity scores in
observational studies for causal effects. Biometrika, 70, 41–55.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-
randomized studies. Journal of Educational Psychology, 66(5), 688–701.
Rubin, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1980). Discussion of ‘‘Randomization analysis of experimental data in
the Fisher randomization test,’’ by D. Basu. Journal of the American Statistical
Association, 75, 591–593.
Rubin, D. B. (1990). Formal modes of statistical inference for causal effects. Journal of
Statistical Planning and Inference, 25, 279–292.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experi-
mental designs for generalized causal inference. Boston: Houghton-Mifflin.
Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why
experiments are often more effective than mediational analyses in examining psy-
chological processes. Journal of Personality and Social Psychology, 89(6), 845–851.
Toh, S., & Hernán, M. (2008). Causal inference from longitudinal studies with baseline
randomization. International Journal of Biostatistics, 4(1), article 22. Retrieved from
http://www.bepress.com/ijb/vol4/iss1/22
2

What Would Have Been Is Not What Would Be


Counterfactuals of the Past and Potential
Outcomes of the Future
sharon schwartz, nicolle m. gatto, and ulka b. campbell

Introduction

Epidemiology is often described as the basic science of public health. A
mainstay of epidemiologic research is to uncover the causes of disease that
can serve as the basis for successful public-health interventions (e.g., Institute
of Medicine, 1988; Milbank Memorial Fund Commission, 1976). A major
obstacle to attaining this goal is that causes can never be seen but only
inferred. For this reason, the inferences drawn from our studies must
always be interpreted with caution.
Considerable progress has been made in the methods required for sound
causal inference. Much of this progress is rooted in a full and rich articula-
tion of the logic behind randomized controlled trials (Holland, 1986). From
this work, epidemiologists have a much better understanding of barriers to
causal inference in observational studies, such as confounding and selection
bias, and their tools and concepts are much more refined.
The models behind this progress are often referred to as ‘‘counterfactual’’
models. Although researchers may be unfamiliar with them, they are widely
(although not universally) accepted in the field. Counterfactual models under-
lie the methodologies that we all use. Within epidemiology, when people talk
about a counterfactual model, they usually mean a potential outcomes
model—also known as ‘‘Rubin’s causal model.’’
As laid out by epidemiologists, the potential outcomes model is rooted in
the experimental ideas of Cox and Fisher, for which Neyman provided
the first mathematical expression. It was popularized by Rubin, who extended
it to observational studies, and expanded by Robins to exposures that vary
over time (Maldonado & Greenland, 2002; Hernan, 2004; VanderWeele &
Hernan, 2006). This rich tradition is responsible for much of the progress
we have just noted.
Despite this progress in methods of causal inference, a common charge in
the epidemiologic literature is that public-health interventions based on the
causes we identify in our studies often fail. Even when they do not fail, the
magnitudes of the effects of these interventions are often not what we
expected. Levins (1996) provides a particularly gloomy assessment:

The promises of understanding and progress have not been kept, and the
application of science to human affairs has often done great harm.
Public health institutions were caught by surprise by the resurgence
of old diseases and the appearance of new ones. . . . Pesticides increase
pests, create new pest problems and contribute to the load of poison in our
habitat. Antibiotics create new pathogens resistant to our drugs. (p. 1)

A less pessimistic assessment suggests that although public-health inter-
ventions may be narrowly successful, they may simultaneously lead to con-
siderable harm. An example is the success of antismoking campaigns in
reducing lung cancer rates in the United States, while simultaneously
increasing smoking and thereby lung cancer rates in less developed coun-
tries. This unintended consequence resulted from the redirection of cigarette
sales to these countries (e.g., Beaglehole & Bonita, 1997).
Ironically, researchers often attribute these public-health failures to a nar-
rowness of vision imposed by the same models of causal inference that
heralded modern advances in epidemiology and allied social and biological
sciences. That is, counterfactual models improve causal inference in our
studies but are held at least partly responsible for the failures of the inter-
ventions that follow those studies. Critics think that counterfactually based
approaches in epidemiology not only do not provide a sound basis for public-
health interventions but cannot (e.g., Shy, 1997; McMichael, 1999).
While there are many aspects of the potential outcomes model that war-
rant discussion, here we focus on one narrowly framed question: Is it pos-
sible, as the critics contend, that the same models that enhance the validity of
our studies can mislead us when we try to intervene on the causes these
studies uncover? We think the answer is a qualified ‘‘yes.’’
We will argue that the problem arises not because of some failure of the
potential outcomes approach itself but, rather, because of unintended con-
sequences of the metaphors and tools implied by the model. We think that
the language, analogies, and conceptual frame that enhance the valid estima-
tion of precise causal effects can encourage unrealistic expectations about the
relationship between the causal effects uncovered in our studies and results
of interventions based on their removal.
More specifically, we will argue that the unrealistic expectations of the
success of interventions arise in the potential outcomes frame because of a
premature emphasis on the effects of causal manipulation (understanding
what would happen if the exposure were altered) at the expense of two
other tasks that must come first in epidemiologic research: (1) causal identi-
fication (identifying if an exposure did cause an outcome) and (2) causal
explanation (understanding how the exposure caused the outcome). We will
describe an alternative approach that specifies all three of these steps—causal
identification, followed by causal explanation, and then the effects of causal
manipulation. While this alternative approach will not solve the discrepancy
between the results of our studies and the results of our interventions, it
makes the sources of the discrepancy explicit.
The roles of causal identification and causal explanation in causal infer-
ence, which we build upon here, have been most fully elaborated by Shadish,
Cook, and Campbell (2002), heirs to a prominent counterfactual tradition in
psychology (Cook & Campbell, 1979). We think that a dialogue between these
two counterfactual traditions (i.e., the potential outcomes tradition and the
Cook and Campbell tradition as most recently articulated in Shadish et al.)
can provide a more realistic assessment of what our studies can accomplish
and, perhaps, a platform for a more successful translation of basic research
findings into sound public-health interventions.
To make these arguments, we will (1) review the history and principles of
the potential outcomes model, (2) describe the limitations of this model as
the basis for interventions in the real world, and (3) propose an alternative
based on an integration of the potential outcomes model with other counter-
factual traditions.
We wish to make clear at the outset that virtually all of the ideas in this
chapter already appear in the causal inference literature (Morgan & Winship,
2007). This chapter simply presents the picture we see as we stand on the
shoulders of the giants in causal inference.

The Potential Outcomes Model

In the epidemiologic literature, a counterfactual approach is generally equa-
ted with a potential outcomes model (e.g., Maldonado & Greenland, 2002;
Hernan, 2004; VanderWeele & Hernan, 2006). In describing this model, we
will use the term exposure to mean a variable we are considering as a
possible cause. For ease of discourse, we will use binary exposures and
outcomes throughout. Thus, individuals will be either exposed or not and will
either develop the disease or not.
The concept at the heart of the potential outcomes model is the causal
effect of an exposure. A causal effect is defined as the difference between the
potential outcomes that would arise for an individual under two different
exposure conditions. In considering a disease outcome, each individual has
a potential outcome for the disease under each exposure condition.
Therefore, when comparing two exposure conditions (exposed and not
exposed), there are four possible pairs of potential outcomes for each indivi-
dual. An individual can develop the disease under both conditions, only
under exposure, only under nonexposure, or under neither condition.
Greenland and Robins (1986) used response types as a shorthand to
describe these different pairs of potential outcomes. Individuals who would
develop the disease under either condition (i.e., whether or not they were
exposed) are called ‘‘doomed’’; those who would develop the disease only if
they were exposed are called ‘‘causal types’’; those who would develop the
disease only if they were not exposed are called ‘‘preventive types’’; and those
who would not develop the disease under either exposure condition are called
‘‘immune.’’
Every individual is conceptualized as having a potential outcome under
each exposure that is independent of the actual exposure. Potential outcomes
are determined by the myriad of largely unknown genetic, in utero, child-
hood, adult, social, psychological, and biological causes to which the indivi-
duals have been exposed, other than the exposure under study.
The effect of the exposure for each individual is the difference between the
potential outcome under the two exposure conditions, exposed and not. For
example, if an individual’s potential outcomes were to develop the disease if
exposed but not if unexposed, then the exposure is causal for that individual
(i.e., he or she is a causal type).
Rubin uses the term treatment to refer to these types of exposures and
describes a causal effect in language that implies an imaginary clinical trial.
In Rubin’s (1978) terms, ‘‘The causal effect of one treatment relative to
another for a particular experimental unit is the difference between the
result if the unit had been exposed to the first treatment and the result if,
instead, the unit had been exposed to the second treatment’’ (p. 34). One of
Rubin’s contributions was the popularization of this definition of a causal
effect in an experiment and the extension of the definition to observational
studies (Hernan, 2004).
For example, the causal effect of smoking one pack of cigarettes a day for
a year (i.e., the first treatment) relative to not smoking at all (the second
treatment) is the difference between the disease outcome for an individual if
he or she smokes a pack a day for a year compared with the disease outcome
in that same individual if he or she does not smoke at all during this same
time interval.
One can think about the average causal effect in a population simply as
the average of the causal effects for all of the individuals in the population. It
is the difference between the disease experience of the individuals in a parti-
cular population if we were to expose them all to smoking a pack a day and
the disease experience if we were to prevent them from smoking at all during
this same time period.
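
The bookkeeping can be made concrete with a small invented example (ours, not Rubin's): list each individual's two potential outcomes, classify the response types described by Greenland and Robins (1986), and average the individual differences to obtain the average causal effect.

```python
# Invented potential outcomes for six individuals: 1 = develops the disease, 0 = does not.
y_if_exposed   = [1, 1, 0, 0, 1, 0]
y_if_unexposed = [1, 0, 1, 0, 0, 0]

def response_type(y1, y0):
    """Response types for a binary exposure and outcome (Greenland & Robins, 1986)."""
    return {(1, 1): "doomed", (1, 0): "causal type",
            (0, 1): "preventive type", (0, 0): "immune"}[(y1, y0)]

types = [response_type(y1, y0) for y1, y0 in zip(y_if_exposed, y_if_unexposed)]
individual_effects = [y1 - y0 for y1, y0 in zip(y_if_exposed, y_if_unexposed)]
average_causal_effect = sum(individual_effects) / len(individual_effects)

print(types)                  # per-individual classification
print(average_causal_effect)  # about 0.17 for these invented values
```

In any real data set, of course, only one member of each pair of potential outcomes is ever observed, which is the missing-data problem discussed below.
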
A useful metaphor for this tradition is that of ‘‘magic powder,’’ where the
magic powder can remove an exposure. Imagine we sprinkle an exposure on
a population and observe the disease outcome. Imagine then that we use
magic powder to remove that exposure and can go back in time to see the
outcome in the same population. The problem of causal inference is two-
fold—we do not have magic powder and we cannot go back in time. We can
never see the same people at the same time exposed and unexposed. That is,
we can never see the same people both smoking a pack of cigarettes a day for
a year and, simultaneously, not smoking cigarettes at all for a year.
From a potential outcomes perspective, this is conceptualized as a miss-
ing-data problem. For each individual, at least one of the exposure experi-
ences is missing. In our studies, we provide substitutes for the missing data.
Of course, our substitutes are never exactly the same as what we want.
However, they can provide the correct answer if the potential outcomes of
the substitute are the same as the potential outcomes of the target, the
person or population you want information about.
The potential outcomes model is clearly a counterfactual model in the sense
that the same person cannot simultaneously experience both exposure and
nonexposure. The outcomes of at least one of the exposure conditions must
represent a counterfactual, an outcome that would have, but did not, happen.
Rubin (2005), however, objects to the use of the term counterfactual when
applied to his model. Counterfactual implies there is a fact (e.g., the outcome
that did occur in a group of exposed individuals) to which the counterfactual
(e.g., the outcome that would have occurred had this group of individuals not
been exposed) is compared. However, for Rubin, there is no fact to begin
with. Rather, the comparison is between the potential outcomes of two
hypothetical exposure conditions, neither of which necessarily reflects an
occurrence. The causal effect for Rubin is between two hypotheticals. Thus,
in the potential outcomes frame, when epidemiologists use the term counter-
factual, they mean ‘‘hypothetical’’ (Morgan & Winship, 2007). This subtle
distinction has important implications, as we shall see.
This notion of a causal effect as a comparison between two hypotheticals
derives from the rootedness of the potential outcomes frame in experimental
traditions. Holland (1986), an early colleague of Rubin and explicator of his
work, makes this experimental foundation clear in his summary of the three
main tenets of the potential outcomes model.
First, the potential outcomes model studies the effects of causes and not
the causes of effects. Thus, the goal is to estimate the average causal effect of
an exposure, not to identify the causes of an outcome. For a population, this
is the average causal effect, defined as the average difference between two
potential outcomes for the same individuals, the potential outcome under
exposure A vs. the potential outcome under exposure B. The desired, but
unobservable, true causal effect is the difference in outcome in one population
under two hypothetical exposure conditions: if we were to expose the entire
population to exposure A vs. if we were to expose them to exposure B. As in
an experiment, the exposure is treated as if it were in the control of the
experimenter; the goal is to estimate the effect that this manipulation
would have on the outcome.
Second, the effects of causes are always relative to particular comparisons.
One cannot ask questions about the effect of a particular exposure without
specifying the alternative exposure that provides the basis for the comparison.
For example, smoking a pack of cigarettes a day can be preventive of lung cancer
if the comparison was smoking four packs of cigarettes a day but is clearly causal
if the comparison was with smoking zero packs a day. As in an experiment,
the effect is the difference between two hypothetical exposure conditions.
Third, potential outcomes models limit the types of factors that can be
defined as causes. In particular, attributes of units (e.g., attributes of people
such as gender) are not considered to be causes. This requirement clearly derives
from the experimental, interventionist grounding of this model. To be a cause
(or at least a cause of interest), the factor must be manipulable. In Holland (1986,
p. 959) and Rubin’s terminology, ‘‘No causation without manipulation.’’1
The focus on the effect of causes, the precise definition of the two com-
parison groups, and the emphasis on manipulability clearly root the potential
outcomes approach in experimental traditions. Strengths of this approach
include the clarity of the definition of the causal effect being estimated and
the articulation of the assumptions necessary for this effect to be valid. These
assumptions are (1) that the two groups being compared (e.g., the exposed
and the unexposed) are exchangeable (i.e., they have the same potential out-
comes) and (2) that the stable unit treatment value assumption (SUTVA)
holds. While exchangeability is well understood in epidemiology, the require-
ments of SUTVA may be less accessible.

1. Rubin (1986), in commenting on Holland’s 1986 article, is not as strict as Holland in demand-
ing that causes be, by definition, manipulable. Nonetheless, he contends that one cannot
calculate the causal effect of a nonmanipulable cause and coauthored the ‘‘no causation with-
out manipulation’’ mantra.

Stable Unit Treatment Value Assumption


A valid estimate of this causal effect requires that the two groups being
compared (e.g., the exposed and the unexposed) are exchangeable (i.e., that
is there is no confounding) and that SUTVA is reasonable. SUTVA requires
that (1) the effect of a treatment is the same, no matter how an individual
came to be treated, and (2) the outcome in an individual is not influenced by
the treatment that other individuals receive. In Rubin’s (1986) language,

SUTVA is simply the a priori assumption that the value of Y [i.e., the
outcome] for unit u [e.g., a particular person] when exposed to treat-
ment t [e.g., a particular exposure or risk factor] will be the same no
matter what mechanism is used to assign treatment t to unit u and no
matter what treatments the other units receive . . . SUTVA is violated
when, for example, there exist unrepresented versions of treatments
(Ytu depends on which version of treatment t was received) or inter-
ference between units (Ytu depends on whether unit u' received treat-
ment t or t'). (p. 961)

Thus, if one were to study the effects of a particular form of psychotherapy,
SUTVA would be violated if (1) there were different therapists with
different levels of expertise or some individuals freely agreed to enter therapy
while others agreed only at the behest of a relative and the mode of entry
influenced the effectiveness of the treatment (producing unrepresented ver-
sions of treatments) (Little & Rubin, 2000) or (2) individuals in the treatment
group shared insights they learned in therapy with others in the study (pro-
ducing interference between units) (Little & Rubin, 2000).
The language in which SUTVA is described, the effects of treatment
assignment and versions of treatments, is again indicative of the explicit
connection between the potential outcomes model and randomized experi-
ments. To make observational studies as close to experiments as possible, we
must ensure that those exposed to the ‘‘alternative treatments’’ (i.e., different
exposures) are exchangeable in the sense that the same outcomes would arise
if the individuals in the different exposure groups were reversed. In addition,
we must ensure that we control all factors that violate SUTVA. We do this by
carefully defining exposures or risk factors in terms of narrowly defined
treatments that can be manipulated, at least in theory.
To continue our smoking example, one could ask questions about the
average causal effect of smoking a pack of cigarettes a day for a year (treat-
ment A) compared with never having smoked at all (treatment B) in an
observational study. Since we cannot observe the same people simultaneously
under two different treatments, we compare the disease experience of two
groups of people: one with treatment A, the exposure of interest, and one
with treatment B, the substitute for the potential outcomes of the same group
under the second treatment option. In order for the substitution to yield an
accurate effect estimate (i.e., for exchangeability to hold), we must ensure
that the smokers and nonsmokers are as similar as possible on all causes of
the outcome (other than smoking). This can be accomplished by random
assignment in a randomized controlled trial. To meet SUTVA assumptions,
we have to (1) be vigilant to define our exposure precisely so there is only one
version of each treatment and be certain that how individuals entered the
smoking and nonsmoking groups did not influence their outcome and
(2) ensure the smoking habits of some individuals in our study did not
influence the outcomes of other individuals.
Barring other methodological problems, it would be assumed that if we
did the intervention in real life, that is, if we prevented people from smoking
a pack of cigarettes a day for a year, the average causal effect estimated from
our study would approximate this intervention effect. The potential outcomes
model is an attempt to predict the average causal effect that would arise (or
be prevented) from a particular manipulation under SUTVA. It is self-
consciously interventionist.
Indeed, causal questions are framed in terms of intervention conse-
quences. To ensure the validity of the causal effects uncovered in epidemio-
logic studies, researchers are encouraged to frame the causal question in
these terms. As a prototypical example, Glymour (2007), in a cogent metho-
dologic critique of a study examining the effect of childhood socioeconomic
position on adult health, restated the goal of the study in potential outcome
terms. ‘‘The primary causal question of interest is how adult health would
differ if we intervened to change childhood socio-economic position’’ (p. 566).
It is critical to note that even when we do not explicitly begin with this
type of model, the interventionist focus of the potential outcomes frame
implicitly influences our thinking through its influence on our methods.
For example, this notion is embodied in our understanding of the attributable
risk as the proportion of disease that would be prevented if we were to
remove this exposure (Last, 2001). More generally, authors often end study
reports with a statement about the implications of their findings for inter-
vention or policy that reflect this way of thinking.

Limitations of the Potential Outcomes Model for Interventions in the Real World

To ensure the internal validity of our inferences, we isolate the effects of our
causes from the context in which they act. We do this by narrowly defining
our treatments, creating exchangeability between treated and untreated
people, and considering social norms and the physical environment as part
of a stable background in which causes act. In order for the causal effect of
an exposure in a study to translate to the effect of its intervention, all of the
controls and conditions imposed in the study must hold in the intervention
and the targeted population (e.g., treatment definition, follow-up time frame,
distribution of other causes).
The problem is that, in most cases, interventions in the real world cannot
replicate the conditions that gave rise to the average causal effect in a study.
It is important to note that this is true for randomized controlled trials as
well as observational studies. It is true for classic risk factors as well as for
exposures in life course and social epidemiology. The artificial world that we
appropriately create to identify causal effects—a narrow swath of temporal,
geographic, and social reality in which exchangeability exists and SUTVA is
not violated—captures a vital but limited part of the world in which we are
interested. Thus, while the approach we use in studies aids in the valid
estimation of a causal effect for the past, it provides a poor indicator of a
causal effect for the future. For these reasons, the causal effects of our
interventions in the real world are unlikely to be the same as the causal
effects of our studies.
This problem is well recognized in the literature on randomized controlled
trials in terms of the difference between efficacy and effectiveness and in the
epidemiologic literature as the difference between internal validity and exter-
nal validity. However, this recognition is rarely reflected in research practice.
We suspect this problem may be better understood by deeper examination of
the causes of the discrepancy between the effects observed in studies and the
effects of interventions. We group these causes into three interrelated cate-
gories: direct violations of SUTVA, unintended consequences of our interven-
tions, and context dependencies.

Direct Violations of SUTVA


Stable Treatment Effect

In order to identify a causal effect, a necessary SUTVA condition is that there is only
one version of the treatment. To meet this assumption, we need to define the
exposures in our studies in an explicit and narrow way. For example, we
would ask about the effects of a particular form of psychotherapy (e.g., inter-
personal psychotherapy conducted by expert clinicians) rather than about
psychotherapy in general. This is because the specific types of therapy
encompassed within the broad category of ‘‘psychotherapy’’ are likely to
have different effects on the outcome.

While this is necessary for the estimation of precise causal effects in our
studies, it is not likely to reflect the meaning of the exposure or treatment in
the real world. The removal of causes or the provision of treatments, no
matter how well defined, is never surgical. Unlike the removal of causes
by the magic powder in our thought experiments, interventions are often
crude and messy. Public-health interventions are inherently broad. Even in
a clinical context, treatment protocols are never followed precisely in real-
world practice.
In public-health interventions, there are also different ways of getting into
‘‘treatment,’’ and these may well have different effects on the outcome. For
instance, the effect of an intervention offering a service may be very different
for those who use it only after it has become popular (the late adopters). Early
adopters of a low-fat diet, for example, may increase their intake of fruits and
vegetables to compensate for the caloric change. Late adopters may substitute
low-fat cookies instead. A low-fat diet was adopted by both types of people,
but the effect on an outcome (e.g., weight loss) would likely differ. There are
always different versions of treatments, and the mechanisms through which
individuals obtain the treatments will frequently impact the effect of the
treatments on the outcome.

Interference Between Units

When considered in real-world applications over a long enough time frame,
there will always be ''interference between units.'' Because people live in
social contexts, their behavior influences norms and social expectations.
Behavior is contagious. This can work in positive ways, increasing the effec-
tiveness of an intervention, or lead to negative unintended consequences. An
example of the former would be when the entrance of a friend into a weight-
loss program encourages another friend to lose weight (Christakis & Fowler,
2007). Thus, the outcome for one individual is contingent on the exposure of
another individual. Similarly, changes in individual eating behaviors spread.
This influences not only individuals’ behavior but, eventually, the products
that stores carry, the price of food, and the political clout of like-minded
individuals. It changes the threshold for the adoption of healthy eating
habits. There is an effect not only of the weight-loss program itself but
also of the proportion of people enrolled in weight-loss programs within
the population.
Within the time frame of our studies, the extant norms caused by inter-
actions among individuals and the effect of the proportion of exposure in the
population are captured as part of the background within which causes act,
are held constant, and are invisible. To identify the true effects these causes
had, this approach is reasonable and necessary. The causes worked during
that time frame within that normative context. However, in a public-health
intervention, these norms change over time due to the intervention. This
problem is well recognized in infectious disease studies where the contagion
occurs in a rapid time frame, making noninterference untenable even in the
context of a short-term study. It is hard to imagine, though, any behavior
which is not contagious over long enough time frames. The fact is that the
causal background we must hold constant to estimate a causal effect is
influenced by our interventions.

Unintended Consequences of Interventions


Unintended consequences of interventions are consequences of exposure
removal not represented as part of the causal effect of the exposure on the
outcome under study. The causes of these unintended consequences include
natural confounding and narrowly defined outcomes.

Natural Confounding

Recall that the estimation of the true causal effect requires exchangeability of
potential outcomes between the exposed and unexposed groups in our stu-
dies. Exchangeability is necessary to isolate the causal effect of interest. For
example, in examining the effects of alcohol abuse on vehicular fatalities, we
may control for the use of illicit drugs. We do so because those who abuse
alcohol may be more likely to also abuse other drugs that are related to
vehicular fatalities.
If the association between alcohol abuse and illicit drug use is a form of
‘‘natural confounding,’’ that is, the association between alcohol and drug use
arises in naturally occurring populations and is not an artifact of selection
into the study, then this association is likely to have important influences in a
real-world intervention. That is, the way in which individuals came to be
exposed may influence the effect of the intervention, in violation of SUTVA.
For example, when two activities derive from a similar underlying factor
(social, psychological, or biologic), the removal of one may influence the
presence of the other over time; it may activate a feedback loop. Thus, the
causal effect of alcohol abuse on car accidents may overestimate the effect of
the removal of alcohol abuse from a population if the intervention on alcohol
use inadvertently increases marijuana use.
As this example illustrates, an intervention may influence not only the
exposure of interest but also other causes of the outcome that are linked with
the exposure in the real world. In our studies, we purposely break this link.
We overcome the problem of the violation of SUTVA by imposing narrow
limits on time and place so that SUTVA holds in the study. We control these
variables, precisely because they are also causes of the outcome under study.
In the real world, however, their influence may make the interventions less
effective than our effect estimates suggest. The control in the study was not
incorrect as it was necessary to isolate the true effect that alcohol use did
have on car accidents among these individuals given the extant conditions of
the study. However, outside the context of the study, removal of the exposure
of interest had unintended consequences over time through its link with
other causes of the outcome.

Narrowly Defined Outcomes

Although we may frame our studies as identifying the ''effects of causes,''
they identify only the effects of causes on the specific outcomes we examine
in our studies. In the real world, causes are likely to have many effects.
Likewise, our interventions have effects on many outcomes, not only those
we intend. Unless we consider the full range of outcomes, our interventions
may be narrowly successful but broadly harmful. For example, successful
treatments for AIDS have decreased the death rate but have also led people
to reconceptualize AIDS from a lethal illness to a manageable chronic disease.
This norm change can lead to a concomitant rise in risk-taking behaviors and
an increase in disease incidence. More optimistically, our interventions may
have beneficial effects that are greater than we assume if we consider unin-
tended positive effects. For example, an intervention designed to increase
high school graduation rates may also reduce alcoholism among teens.

Context Dependency

Most fundamentally, all causal effects are context-dependent, and therefore,
all effects are local. It is unlikely that a public-health intervention will be
applied only in the exact population in which the causal effects were studied.
Public-health interventions often apply to people who do not volunteer for
them, to a broader swath of the social fabric and over a different historical
time frame. Therefore, even if our effect estimates were perfectly valid, we
would expect effects to vary between our studies and our interventions. For
example, psychiatric drugs are often tested on individuals who meet strict
Diagnostic and Statistical Manual of Mental Disorders criteria, do not have
comorbidities, and are placebo nonresponders. Once the drugs are marketed,
however, they are used to treat individuals who represent a much wider
population. It is unlikely that the effects of the drugs will be similar in
real-world usage as in the studies.
For all these reasons, it seems unlikely that the causal effect of any inter-
vention will reflect the causal effect found in our studies. These problems are
well known and much discussed in the social science literature (e.g., Merton,
1936, 1968; Lieberson, 1985) and the epidemiologic literature (e.g.,
Greenland, 2005).
Nonetheless, when carrying out studies, epidemiologists often talk about
trying to identify ‘‘the true causal effect of an exposure,’’ as if this was a
quantification that has some inherent meaning. An attributable risk is inter-
preted as if this provided a quantification of the effect of the elimination of
the exposure under study. Policy implications of etiologic work are discussed
as if they flowed directly from our results. We think that this is an overly
optimistic assessment of what our studies can show. We think that as a field
we tend to estimate the effect exposures had in the past and assume that this
will be the effect in the future. We do this by treating the counterfactual of
the past as equivalent to the potential outcome of the future.

An Alternative Counterfactual Framework (An Integrated Counterfactual Approach)

An alternative framework, which we will refer to as an ''integrated counterfactual
approach'' (ICA), distinguishes three sequential tasks in the relation-
ship between etiologic studies and public-health interventions, the first two of
which are not explicit goals in a potential outcomes frame: (1) causal identi-
fication, (2) causal explanation, and (3) the effects of causal manipulation.

Step 1: Causal Identification


In line with the Cook and Campbell tradition (Shadish et al., 2002; Cook &
Campbell 1979), this alternative causal approach uses the insights and meth-
ods of potential outcomes models but reframes the question that these
models address as the identification of a cause rather than the result of a
manipulation. Whereas the potential outcomes model is rooted in experi-
ments, the ICA is rooted in philosophic discussions of counterfactual defini-
tions of a cause, particularly the work of Mackie (1965, 1974). It begins with
Mackie’s definition of a cause rather than a definition of a causal effect.
For Mackie, X is a cause of Y if, within a causal field, with all held
constant, Y would not have occurred if X had not, at least not when and
how it did. Mackie’s formulation begins with a particular outcome and
attempts to identify some of the factors that caused it. Thus, the causal
contrast for Mackie is between what actually happened and what would
have happened had everything remained the same except that one of the
exposures was absent. The contrast represents the difference between a fact
and a counterfactual, rather than two potential outcomes.
Thus, for Mackie, something is a cause if the outcome under exposure is
different from what the outcome would have been under nonexposure. By
beginning with actual occurrences, Mackie gives prominence to the contin-
gency of all causal identification. This approach explicitly recognizes that
causes are always identified within a causal field of interest, where certain
factors are assumed to be part of the background in which causes act, rather
than factors available for consideration as causes. The decision to assign
factors to the background may differ among researchers and time periods.
Thus, there is a subjective element in deciding which factor, among the myriad
of possible exposures, is hypothesized to be a cause of interest.
Rothman and Greenland (1998) provide a definition of a cause in the
context of health that is consistent with Mackie’s view: ‘‘We can define a
cause of a specific disease event as an antecedent event, condition, or char-
acteristic that was necessary for the occurrence of the disease at the moment
it occurred, given that other conditions are fixed’’ (p. 8). As applied to a
health context, both Mackie and Rothman and Greenland begin with the
notion that, for most diseases, an individual can develop a disease from
one of many possible causes, each of which consists of several components
working together. In this model, although none of the components in any
given constellation can cause disease by itself, each makes a nonredundant
and necessary contribution to complete a causal mechanism. A constellation
of components that is minimally sufficient to cause disease is termed a
sufficient cause. Mackie referred to these component causes as ‘‘insufficient
but necessary components of unnecessary but sufficient’’ (INUS) causes.
Rothman’s (1976) sufficient causes are typically depicted as ‘‘causal pies.’’
As an example, assume that the disease of interest is schizophrenia. There
may be three sufficient causes of this disease (see Figure 2.1).

[Figure 2.1 near here: three causal pies labeled Sufficient Cause 1, Sufficient Cause 2,
and Sufficient Cause 3.]

Figure 2.1 Potential Causes of Schizophrenia depicted as Causal Pies. Adapted from
Rothman and Greenland (1998).

An individual can develop schizophrenia from a genetic variant, a traumatic
event, and poor nutrition; from stressful life events, childhood neglect, and
exposure to an environmental toxin; or from prenatal viral exposures,
childhood viral exposure, and a vitamin deficiency. We have added compo-
nents U1, U2, and U3 to the sufficient causes to represent additional
unknown factors. Each individual develops schizophrenia from one of
these sufficient causes; in no instance does the disease occur from any one
factor—rather, it occurs due to several factors working in tandem.
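
As a purely illustrative sketch (ours, not Rothman and Greenland's notation), the sufficient-component cause logic of Figure 2.1 can be expressed as a check of whether any complete causal pie is contained in an individual's set of component causes:

```python
# Component labels follow Figure 2.1; U1-U3 stand for the additional unknown components.
SUFFICIENT_CAUSES = [
    {"gene", "trauma", "poor nutrition", "U1"},
    {"stressful event", "childhood neglect", "toxin", "U2"},
    {"prenatal virus", "childhood virus", "vitamin deficiency", "U3"},
]

def develops_disease(components):
    """True when every component of at least one sufficient cause is present."""
    return any(pie <= components for pie in SUFFICIENT_CAUSES)

# A hypothetical individual: sufficient cause 1 is complete, so disease occurs,
# even though no single component is sufficient on its own.
print(develops_disease({"gene", "trauma", "poor nutrition", "U1", "childhood neglect"}))  # True
print(develops_disease({"gene", "trauma"}))                                               # False
```
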
The ICA and potential outcomes model are quite consistent in many
critical ways in this first step. Indeed, the potential outcomes model provides
a logical, formal, statistical framework applicable to causal inference within
the context of the ICA. Regardless of whether we intend to identify a cause or
estimate a causal effect, the same isolation of the cause is required. Most
essentially, this means that comparison groups must be exchangeable.
However, each framework is intended to answer different questions (see
Table 2.1).

Table 2.1 Differences Between the Potential Outcomes Model and an Integrated
Counterfactual Approach

                    Potential Outcomes Model              Integrated Counterfactual Approach

Goal                Estimation of true causal effect      Identification of true causes
  Salient           • Estimate                            • Identify
  differences       • Quantitative                        • Qualitative
                    • Effects of causes                   • Causes of effects

Means               Compare two potential outcomes        Compare a fact with a counterfactual
  Salient           • Entire population under two         • Exposed under two exposure conditions
  differences         exposures                           • Any factor
                    • Manipulable causes                  • Construct validity
                    • SUTVA                               • Mimic assignment of exposed
                    • Mimic random assignment

Interpretation      Potential outcome of the future       Causal effect of the past
  Salient           • Expect consistency                  • Expect inconsistency
  differences

From a potential outcomes perspective, the goal is to estimate the causal
effect of the exposure. From an ICA perspective, the goal is to identify
whether an exposure was a cause. This distinction between the goals of
identifying the effects of causes and the causes of effects is critical and has
many consequences.
First, identifying the effects of causes is future-oriented. We start with a
cause and estimate its effect. The causal contrast is between the potential
disease experiences of a group of individuals under two exposure conditions.

Table 2.1  Differences Between the Potential Outcomes Model and an Integrated
Counterfactual Approach

                  Potential Outcomes Model            Integrated Counterfactual Approach
Goal              Estimation of true causal effect    Identification of true causes
  Salient         • Estimate                          • Identify
  differences     • Quantitative                      • Qualitative
                  • Effects of causes                 • Causes of effects
Means             Compare two potential outcomes      Compare a fact with a counterfactual
  Salient         • Entire population under two       • Exposed under two exposure
  differences       exposures                           conditions
                  • Manipulable causes                • Any factor
                  • SUTVA                             • Construct validity
                  • Mimic random assignment           • Mimic assignment of exposed
Interpretation    Potential outcome of the future     Causal effect of the past
  Salient         • Expect consistency                • Expect inconsistency
  differences

Identifying the causes of effects, in contrast, implies that the identification is
about what happened in the past. The causal contrast is between what did
happen to a group of individuals under the condition of exposure, something
explicitly grounded in and limited by a particular sociohistorical reality, and
what would have happened had all conditions remained constant except that
the exposure was absent. This approach identifies factors that actually were
causes of the outcome. Whether or not they will be causes of the outcome
depends on the constellation of the other factors held constant at that parti-
cular sociohistorical moment. The effect of this cause in the future is expli-
citly considered a separate question.
Second, when we consider a potential outcomes model, the causal effect of
interest is most often the causal effect for the entire population. That is, we
conceptualize the causal contrast as the entire study population under two
different treatments. We create exchangeability by mimicking random assign-
ment. Neither exposure condition is ‘‘fact’’ or ‘‘counterfactual.’’ Rather, both
treatment conditions are substitutes for the experience of the entire popula-
tion under that treatment.
In contrast, Mackie’s perspective implies that the counterfactual of interest
is a counterfactual for the exposed. We take as a given what actually hap-
pened to people with the putative causal factor and imagine a counterfactual
in reference to them. We create exchangeability by mimicking the predisposi-
tions of the exposed. This puts a different spin on the issue of confounding
and nonexchangeability.2 The factors that differentiate exposed and unex-
posed people are more easily seen as grounded in characteristics of truly
exposed people and their settings. It makes explicit that the causal effect
for people who are actually exposed may not be the same as the effect that
cause would have on other individuals. Thus, this type of confounding is
seen not as a study artifact but as a form of true differences between exposed
and unexposed people that can be and must be adjusted for in our study but
must also be considered as an active element in any real-life intervention.
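The distinction can be illustrated with a short simulation; the data-generating process below is entirely hypothetical and is chosen only so that the effect among the exposed differs from the effect in the whole population.

    # Hypothetical illustration: with heterogeneous effects and selective exposure,
    # the average effect among the exposed (ATT) differs from the average effect
    # for the entire population (ATE).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    vulnerability = rng.uniform(0, 1, n)              # background characteristic
    exposed = rng.uniform(0, 1, n) < vulnerability    # the more vulnerable are exposed more often

    y0 = rng.normal(0, 1, n)                          # potential outcome under nonexposure
    y1 = y0 + 2 * vulnerability                       # effect of exposure grows with vulnerability

    ate = np.mean(y1 - y0)                            # effect for the entire population (about 1.0)
    att = np.mean((y1 - y0)[exposed])                 # effect for those actually exposed (about 1.3)
    print(f"ATE = {ate:.2f}, ATT = {att:.2f}")
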
Third, the focus on estimating the effects of causes in the potential out-
comes model leads to the requirement of manipulability; any factor which is
not manipulable is not fodder for causal inference. From an ICA perspective,
any factor can be a cause (Shadish et al., 2002). To qualify, it has to be a
factor that, were it absent and with all else the same, this outcome within
this context would not have occurred. Even characteristics of individuals, such
as gender, are grist for a counterfactual thought experiment. The world is

2. Technically, when the effect for the entire population is of interest, full exchangeability is
required. When the effect for the exposed is of interest, only partial exchangeability is required
(Greenland & Robins, 1986).

fixed as it is in this context, say, with a fairly rigid set of social expectations
depending on identified sex at birth. We can ask a question about what an
individual’s life would have been like had he been born male, rather than
female, given this social context.
Fourth, this perspective brings the issue of context dependency front and
center. As Rothman’s (1976) and Mackie’s (1965) models make explicit, shifts
in the component causes and their distributions, variations in the field of
interest, and the sociohistorical context change the impact of the cause and,
indeed, determine whether or not the factor is a cause in this circumstance.
Thus, the impact of a cause is explicitly recognized as context-dependent; the
size of an effect is not universal. A factor can be a cause for some individuals
in some contexts but not in others. Thus, the goal is the ‘‘identification of
causes in the universe,’’ rather than the estimation of universal causal effects.
By ‘‘causes in the universe’’ we mean factors which at some moment in time
have caused the outcome of interest and could theoretically (if all else were
equal) happen again.

Step 2: Causal Explanation


The focus on the causes of effects facilitates an important distinction that
emerges from the Cook and Campbell (1979) tradition—that between causal
identification and causal explanation. From their perspective, in the first step,
we identify whether the exposure of interest did cause the outcome in some
people in our study. We label this ‘‘causal identification.’’3 If we want to
understand the effect altering a cause in the future, an additional step of
causal explanation is required. Causal explanation comprises two compo-
nents, construct validity, an understanding of the ‘‘active ingredients’’ of the
exposure and how they work, and external validity, an identification of the
characteristics of persons, places, and settings that facilitate its effect on the
outcome.

Construct Validity

In causal identification, we examine the causal effects of our variables as


measured. In causal explanation, we ask what it is about these variables
that caused the outcome. Through mediational analyses, we examine both
the active ingredients of the exposure (i.e., what aspects of the exposure are
causal) and the pathways through which the exposure affects the outcome.
Mediational analyses explicitly explore the potential SUTVA violation inherent

3. Shadish et al. (2002) call this step ''causal description.'' We think ''causal identification'' is a
better fit for our purposes.

in different versions of treatments. Exploration of pathways can lead to a


more parsimonious explanation for findings across different exposure mea-
sures. Based on the active ingredients of exposure (and their resultant path-
ways), we can test not only the specific exposure–disease relationship but also
a more integrative theory regarding the underlying ‘‘general causal mechan-
isms’’ (Judd & Kenny, 1981). This theory allows us to make statements about
an observed association that are less bounded by the specific circumstances of
a given study and to generalize based on deep similarities (Judd & Kenny,
1981; Shadish et al., 2002). This generalization has two practical benefits.
First, knowledge of mechanisms enhances our ability to compare study
results across exposures and, thus, integrate present knowledge. Second,
such an analysis can help to identify previously unknown exposures or treat-
ments, due to the fact that they capture the same active ingredient (or work
through the same mechanism) as the exposure or treatment under study
(Hafeman, 2008).
Let us continue the gender example. First, we test the hypothesis that
female gender was a cause of depression for some people in our sample.
By this we mean that there are some people who got depressed as women
who would not have been depressed had they not been women (i.e., if they
were male—or some other gender). Of course, causal inference is tentative as
always. Assume that at this first step we identified something that is not just
an association and we took care to rule out all noncausal alternative explana-
tions to the best of our ability.
Once that step is accomplished, we may ask how female gender causes
disease. Gender is a multifaceted construct with many different aspects—
genetic, hormonal, psychological, and social. Once we know that gender
has a causal effect, probing the construct helps us to identify what it is
about female gender that causes depression. This may help to verify gender’s
causality in depression and to identify other exposures that do the same thing
as gender (i.e., other constructs that have the same active ingredient). For
example, some have suggested that the powerlessness of women’s social roles
is an active ingredient in female gender as a cause of depression. This would
suggest that other social roles related to powerlessness, such as low socio-
economic position, might also be causally related to depression. Probing the
construct of the outcome plays a similar role in causal explanation. It helps to
identify the specific aspects of the outcome that are influenced by the expo-
sure and to refine the definition of the outcome.

External Validity

The other aspect of causal explanation requires an examination of the con-


ditions under which exposures act (Shadish et al., 2002). The context

dependency of causal effects is therefore made explicit. Causal inference is


strengthened through the theoretical consideration and testing of effect var-
iation. From the perspective of the ICA, consistency of effects across settings,
people, and time periods is not the expectation. Rather, variation is expected
and requires examination.
When we identify causes in our studies, we make decisions about the
presumptive world that we hold constant, considering everything as it was
when the exposure arose. Thus, the social effects and norms that may have
been consequences of the exposure are frozen in the context. However, when
we intervene on our causes, we must consider the new context. This aspect of
causal explanation, the specification of the conditions under which exposures
will and will not cause disease, is considered the separate task of external
validity in the Cook and Campbell scheme.

Step 3: Causal Manipulation


While this separation of causal identification and causal explanation has the
benefit of placing contingency and context dependency center stage, it does
not resolve the discrepancy between the effects observed in studies and the
effects of interventions. It does not provide the tools necessary to uncover the
feedback loops and unintended consequences of our interventions. It does
not fully address the violation of the SUTVA of no interference between
units. Even causal explanation is conducted within established methods of
isolation, reductionism, and linearity. Prediction of the effects of causal
manipulation may require a different approach, one rooted in complexity
theories and systems analysis, as the critics contend (e.g., McMichael,
1999; Levins, 1997; Krieger, 1994).
To understand an intervention, the degree of complexity and feedback that
must be considered depends, of course, on the question at hand. The critical issue, as
Levins (1996) notes, is the ability to decide when simplification is constructive
and when it is an obfuscation.
The implementation of systems approaches within epidemiology requires
considerable methodological and conceptual development but may be a
required third step to link etiologic research to policy.
The integrated causal approach does not provide a solution to the discre-
pancy between the results of etiologic studies and the results of public-health
interventions. It does, however, provide a way of thinking in which causal
identification is explicitly conceptualized as a first step rather than a last step
for public-health intervention. It is a road map to a proposed peace treaty in
the epidemiology wars between the counterfactual and dynamic model
camps. It suggests that the models are useful for different types of questions.
Counterfactual approaches, under SUTVA, are essential for identifying causes

of the past. Dynamic models allowing for violations of SUTVA are required to
understand potential outcomes of the future.

Summary

The rigor of causal inference, brought to light in the development of the


potential outcomes model, is essential as the basis for any intervention. Rigor
is demanded because interventions developed around noncausal associations
are doomed to failure. However, reifying the results of our studies by treating
causes as potential interventions is also problematic.
We suspect that public health will benefit from interventions identified
using an approach that integrates the potential outcomes tradition of Rubin
and Robins in statistics and epidemiology with the counterfactual tradition of
Shadish, Cook, and Campbell in psychology. This integrated approach clari-
fies that the identification of causes facilitated by isolation is only a first step
in policy formation. A second step, causal explanation, aids in the general-
izability of our findings. Here, however, instead of replication of our study in
different contexts, we generalize on the basis of the deep similarities uncov-
ered through causal explanations. The steps of identification and explanation
may require a third step of prediction to understand intervention effects. The
causes that we identify, together with their mediators and effect modifiers,
may be considered nodes in more complex analyses that allow for the con-
sideration of feedback loops and the unintended consequences that are inher-
ent in any policy application. The methods for this final step have not yet
been fully developed. The conceptual separation of these three questions,
grounded in a distinction between counterfactuals of the past and potential
outcomes of the future, may prepare the ground for such innovations. For as
Kierkegaard (1843; cited in Hannay, 1996) noted, ''life is to be understood
backwards, but it is lived forwards.’’ At a minimum, we hope that a more
modest assessment of what current epidemiologic methods can provide will
help stem cynicism that inevitably arises when we promise more than we can
possibly deliver.

References

Beaglehole, R., & Bonita, R. (1997). Public health at the crossroads: Achievements and
prospects. New York: Cambridge University Press.
Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social
network over 32 years. New England Journal of Medicine, 357, 370–379.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues
for field settings. Chicago: Rand McNally.

Glymour, M. M. (2007). Selected samples and nebulous measures: Some methodolo-


gical difficulties in life-course epidemiology. International Journal of Epidemiology, 36,
566–568.
Greenland, S. (2005). Epidemiologic measures and policy formulation: Lessons from
potential outcomes. Emerging Themes in Epidemiology, 2, 1–7.
Greenland, S., & Robins, J. M. (1986). Identifiability, exchangeability and epidemiolo-
gical confounding. International Journal of Epidemiology, 15, 413–419.
Hafeman, D. (2008). Opening the black box: A re-assessment of mediation from a counter-
factual perspective. Unpublished doctoral dissertation, Columbia University, New York.
Hannay, A. (1996). Søren Kierkegaard (1843): Papers and journals. London: Penguin
Books.
Hernan, M. A. (2004). A definition of causal effect for epidemiological research.
Journal of Epidemiology and Community Health, 58, 265–271.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical
Association, 81, 945–960.
Institute of Medicine (1988). The Future of Public Health. Washington, DC: National
Academy Press.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treat-
ment evaluations. Evaluation Review, 5, 602–619.
Krieger, N. (1994). Epidemiology and the web of causation: Has anyone seen the
spider? Social Science and Medicine, 39, 887–903.
Last, J. M. (2001). A dictionary of epidemiology. New York: Oxford University Press.
Levins, R. (1996). Ten propositions on science and anti-science. Social Text, 46/47,
101–111.
Levins, R. (1997). When science fails us. Forests, Trees and People Newsletter, 32/33,
1–18.
Lieberson, S. (1985). Making it count: The improvement of social research and theory.
Berkeley: University of California Press.
Little, R. J., & Rubin, D. B. (2000). Causal effects in clinical and epidemiological
studies via potential outcomes: Concepts and analytical approaches. Annual Review
of Public Health, 21, 121–145.
Mackie, J. L. (1965). Causes and conditions. American Philosophical Quarterly, 4,
245–264.
Mackie, J. L. (1974). Cement of the universe: A study of causation. Oxford: Oxford
University Press.
Maldonado, G. & Greenland S. (2002). Estimating causal effects. International Journal
of Epidemiology, 31, 422–429.
McMichael, A. J. (1999). Prisoners of the proximate: Loosening the constraints on
epidemiology in an age of change. American Journal of Epidemiology, 149, 887–897.
Merton, R. K. (1936). The unanticipated consequences of purposive social action.
American Sociological Review, 1, 894–904.
Merton, R. K. (1968). Social theory and social structure. New York: Free Press.
Milbank Memorial Fund Commission (1976). Higher education for public health: A
report of the Milbank Memorial Fund Commission. New York: Prodist.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and
principles for social research. Cambridge: Cambridge University Press.
Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104, 587–592.
Rothman, K.J. & Greenland, S. (1998). Modern epidemiology. Philadelphia: Lippincott-
Raven Publishers.

Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization.
Annals of Statistics, 6, 34–58.
Rubin, D. B. (1986). Statistics and causal inference comment: Which ifs have causal
answers. Journal of the American Statistical Association, 81, 961–962.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling
decisions. Journal of the American Statistical Association, 100, 322–331.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experi-
mental designs for generalized causal inference. Boston: Houghton Mifflin.
Shy, C. M. (1997). The failure of academic epidemiology: Witness for the prosecution.
American Journal of Epidemiology, 145, 479–484.
VanderWeele, T. J., & Hernan, M. A. (2006). From counterfactuals to sufficient com-
ponent causes and vice versa. European Journal of Epidemiology, 21, 855–858.
3

The Mathematics of Causal Relations


judea pearl

Introduction

Almost two decades have passed since Paul Holland published his highly
cited review paper on the Neyman-Rubin approach to causal inference
(Holland, 1986). Our understanding of causal inference has since increased
severalfold, due primarily to advances in three areas:

1. Nonparametric structural equations

2. Graphical models

3. Symbiosis between counterfactual and graphical methods

These advances are central to the empirical sciences because the research
questions that motivate most studies in the health, social, and behavioral
sciences are not statistical but causal in nature. For example, what is the
efficacy of a given drug in a given population? Can data prove an employer
guilty of hiring discrimination? What fraction of past crimes could have been
avoided by a given policy? What was the cause of death of a given individual
in a specific incident?
Remarkably, although much of the conceptual framework and many of the
algorithmic tools needed for tackling such problems are now well established,
they are hardly known to researchers in the field who could put them into
practical use. Why?
Solving causal problems mathematically requires certain extensions in the
standard mathematical language of statistics, and these extensions are not
generally emphasized in the mainstream literature and education. As a
result, large segments of the statistical research community find it hard to
appreciate and benefit from the many results that causal analysis has pro-
duced in the past two decades.


This chapter aims at making these advances more accessible to the gen-
eral research community by, first, contrasting causal analysis with standard
statistical analysis and, second, comparing and unifying various approaches
to causal analysis.

From Associational to Causal Analysis:


Distinctions and Barriers

The Basic Distinction: Coping with Change


The aim of standard statistical analysis, typified by regression, estimation,
and hypothesis-testing techniques, is to assess parameters of a distribution
from samples drawn from that distribution. With the help of such parameters,
one can infer associations among variables, estimate the likelihood of past
and future events, as well as update the likelihood of events in light of new
evidence or new measurements. These tasks are managed well by standard
statistical analysis so long as experimental conditions remain the same.
Causal analysis goes one step further; its aim is to infer not only the like-
lihood of events under static conditions but also the dynamics of events
under changing conditions, for example, changes induced by treatments or
external interventions.
This distinction implies that causal and associational concepts do not mix.
There is nothing in the joint distribution of symptoms and diseases to tell us
that curing the former would or would not cure the latter. More generally,
there is nothing in a distribution function to tell us how that distribution
would differ if external conditions were to change—say, from observational to
experimental setup—because the laws of probability theory do not dictate
how one property of a distribution ought to change when another property
is modified. This information must be provided by causal assumptions which
identify relationships that remain invariant when external conditions change.
These considerations imply that the slogan ‘‘correlation does not imply
causation’’ can be translated into a useful principle: One cannot substantiate
causal claims from associations alone, even at the population level—behind
every causal conclusion there must lie some causal assumption that is not
testable in observational studies.

Formulating the Basic Distinction


A useful demarcation line that makes the distinction between associational
and causal concepts crisp and easy to apply can be formulated as follows. An
associational concept is any relationship that can be defined in terms of a joint

distribution of observed variables, and a causal concept is any relationship that


cannot be defined from the distribution alone. Examples of associational
concepts are correlation, regression, dependence, conditional independence,
likelihood, collapsibility, risk ratio, odds ratio, propensity score, ''Granger caus-
ality,’’ marginalization, conditionalization, and ‘‘controlling for.’’ Examples of
causal concepts are randomization, influence, effect, confounding, ‘‘holding
constant,’’ disturbance, spurious correlation, instrumental variables, ignor-
ability, exogeneity, exchangeability, intervention, explanation, and attribution.
The former can, while the latter cannot, be defined in terms of distribution
functions.
This demarcation line is extremely useful in causal analysis for it helps
investigators to trace the assumptions that are needed for substantiating
various types of scientific claims. Every claim invoking causal concepts
must rely on some premises that invoke such concepts; it cannot be inferred
from, or even defined in terms of, statistical notions alone.

Ramifications of the Basic Distinction


This principle has far-reaching consequences that are not generally recog-
nized in the standard statistical literature. Many researchers, for example,
are still convinced that confounding is solidly founded in standard, frequen-
tist statistics and that it can be given an associational definition, saying
(roughly) ‘‘U is a potential confounder for examining the effect of treatment
X on outcome Y when both U and X and both U and Y are not independent’’
(Pearl 2009b, p. 338). That this definition and all of its many variants must
fail is obvious from the demarcation line above; ‘‘independence’’ is an asso-
ciational concept, while confounding is a tool used in establishing causal
relations. The two do not mix; hence, the definition must be false. Therefore,
to the bitter disappointment of generations of epidemiology researchers, con-
founding bias cannot be detected or corrected by statistical methods alone;
one must make some judgmental assumptions regarding causal relationships
in the problem before an adjustment (e.g., by stratification) can safely correct
for confounding bias.
Another ramification of the sharp distinction between associational and
causal concepts is that any mathematical approach to causal analysis must
acquire new notation for expressing causal relations—probability calculus is
insufficient. To illustrate, the syntax of probability calculus does not permit
us to express the simple fact that ‘‘symptoms do not cause diseases,’’ let
alone to draw mathematical conclusions from such facts. All we can say is
that two events are dependent—meaning that if we find one, we can expect
to encounter the other but we cannot distinguish statistical dependence,
quantified by the conditional probability p(disease | symptom) from causal

dependence, for which we have no expression in standard probability calcu-


lus. Scientists seeking to express causal relationships must therefore supple-
ment the language of probability with a vocabulary for causality, one in which
the symbolic representation for the relation ‘‘symptoms cause disease’’ is
distinct from the symbolic representation of ‘‘symptoms are associated with
disease.’’

Two Mental Barriers: Untested Assumptions


and New Notation
The preceding requirements—(1) to commence causal analysis with
untested,1 theoretically or judgmentally based assumptions and (2) to
extend the syntax of probability calculus—constitute the two main obstacles
to the acceptance of causal analysis among statisticians and among profes-
sionals with traditional training in statistics.
Associational assumptions, even untested, are testable in principle, given a
sufficiently large sample and sufficiently fine measurements. Causal assump-
tions, in contrast, cannot be verified even in principle, unless one resorts to
experimental control. This difference stands out in Bayesian analysis. Though
the priors that Bayesians commonly assign to statistical parameters are
untested quantities, the sensitivity to these priors tends to diminish with
increasing sample size. In contrast, sensitivity to prior causal assump-
tions—say, that treatment does not change gender—remains substantial
regardless of sample size.
This makes it doubly important that the notation we use for expressing
causal assumptions be meaningful and unambiguous so that one can clearly
judge the plausibility or inevitability of the assumptions articulated.
Statisticians can no longer ignore the mental representation in which scien-
tists store experiential knowledge since it is this representation and the lan-
guage used to access this representation that determine the reliability of the
judgments upon which the analysis so crucially depends.
How does one recognize causal expressions in the statistical literature?
Those versed in the potential-outcome notation (Neyman, 1923; Rubin, 1974;
Holland, 1986) can recognize such expressions through the subscripts
that are attached to counterfactual events and variables, for example, Yx(u)
or Zxy—some authors use parenthetical expressions, such as Y(x, u) or
Z(x, y). The expression Yx(u), for example, stands for the value that outcome
Y would take in individual u had treatment X been at level x. If u is chosen at
random, Yx is a random variable and one can talk about the probability that Yx

1. By ‘‘untested’’ I mean untested using frequency data in nonexperimental studies.



would attain a value y in the population, written p(Yx = y). Alternatively, Pearl
(1995) used expressions of the form p[Y = y | set(X = x)] or p[Y = y | do(X = x)] to
denote the probability (or frequency) that event (Y = y) would occur if treatment
condition (X = x) were enforced uniformly over the population.2 Still a third
notation that distinguishes causal expressions is provided by graphical models,
where the arrows convey causal directionality.3
However, few have taken seriously the textbook requirement that any
introduction of new notation must entail a systematic definition of the
syntax and semantics that govern the notation. Moreover, in the bulk of
the statistical literature before 2000, causal claims rarely appear in the mathe-
matics. They surface only in the verbal interpretation that investigators occa-
sionally attach to certain associations and in the verbal description with
which investigators justify assumptions. For example, the assumption that
a covariate is not affected by a treatment, a necessary assumption for the
control of confounding (Cox, 1958), is expressed in plain English, not in a
mathematical expression.
Remarkably, though the necessity of explicit causal notation is now recog-
nized by most leaders in the field, the use of such notation has remained
enigmatic to most rank-and-file researchers, and its potential still lies grossly
underutilized in the statistics-based sciences. The reason for this, I am firmly
convinced, can be traced to the way in which causal analysis has been pre-
sented to the research community, relying primarily on outdated paradigms
of controlled randomized experiments and black-box ‘‘missing-data’’ models
(Rubin, 1974; Holland, 1986).
The next section provides a conceptualization that overcomes these mental
barriers; it offers both a friendly mathematical machinery for cause–effect
analysis and a formal foundation for counterfactual analysis.

The Language of Diagrams and Structural Equations

Semantics: Causal Effects and Counterfactuals


How can one express mathematically the common understanding that symp-
toms do not cause diseases? The earliest attempt to formulate such a relation-
ship mathematically was made in the 1920s by the geneticist Sewall Wright

2. Clearly, P[Y = y|do(X = x)] is equivalent to P(Yx = y). This is what we normally assess in a
controlled experiment, with X randomized, in which the distribution of Y is estimated for each
level x of X.
3. These notational clues should be useful for detecting inadequate definitions of causal concepts;
any definition of confounding, randomization, or instrumental variables that is cast in standard
probability expressions, void of graphs, counterfactual subscripts, or do(*) operators, can safely be
discarded as inadequate.

(1921), who used a combination of equations and graphs. For example, if X


stands for a disease variable and Y stands for a certain symptom of the
disease, Wright would write a linear equation

y = βx + u                                                        (1)

where x stands for the level (or severity) of the disease, y stands for the level
(or severity) of the symptom, and u stands for all factors, other than the
disease in question, that could possibly affect Y.4 In interpreting this equa-
tion, one should think of a physical process whereby nature examines the
values of X and U and, accordingly, assigns variable Y the value y = βx + u.
To express the directionality inherent in this process, Wright augmented
the equation with a diagram, later called a ‘‘path diagram,’’ in which arrows
are drawn from (perceived) causes to their (perceived) effects and, more
importantly, the absence of an arrow makes the empirical claim that the
value nature assigns to one variable is not determined by the value taken
by another.5
The variables V and U are called ‘‘exogenous’’; they represent observed or
unobserved background factors that the modeler decides to keep unexplained,
that is, factors that influence, but are not influenced by, the other variables
(called ‘‘endogenous’’) in the model.
If correlation is judged possible between two exogenous variables, U and
V, it is customary to connect them by a dashed double arrow, as shown in
Figure 3.1b.
To summarize, path diagrams encode causal assumptions via missing
arrows, representing claims of zero influence, and missing double arrows
(e.g., between V and U), representing the (causal) assumption Cov(U, V) = 0.

[Figure 3.1  A simple structural equation model (x = v, y = βx + u) and its associated
diagrams (a) and (b). Unobserved exogenous variables are connected by dashed arrows.]

4. We use capital letters (e.g., X,Y,U) for variable names and lower case letters (e.g., x,y,u) for
values taken by these variables.
5. A weaker class of causal diagrams, known as ‘‘causal Bayesian networks,’’ encodes interven-
tional, rather than functional dependencies; it can be used to predict outcomes of randomized
experiments but not probabilities of counterfactuals (for formal definition, see Pearl, 2000a,
pp. 22–24).

[Figure 3.2  (a) The diagram associated with the structural model of Eq. (2). (b) The
diagram associated with the modified model of Eq. (3), representing the intervention
do(X = x0).]

The generalization to a nonlinear system of equations is straightforward.


For example, the nonparametric interpretation of the diagram of Figure 3.2a
corresponds to a set of three functions, each corresponding to one of the
observed variables:

z = fZ(w)
x = fX(z, v)                                                      (2)
y = fY(x, u)

where W, V, and U are here assumed to be jointly independent but, other-


wise, arbitrarily distributed.
Remarkably, unknown to most economists and philosophers, structural
equation models provide a formal interpretation and symbolic machinery
for analyzing counterfactual relationships of the type ‘‘Y would be y had X
been x in situation U = u,’’ denoted Yx(u) = y. Here, U stands for the vector
of all exogenous variables and represents all relevant features of an experi-
mental unit (i.e., a patient or a subject).
The key idea is to interpret the phrase ‘‘had X been x0’’ as an instruction
to modify the original model M and replace the equation for X by a constant,
x0, yielding a modified model, Mx0 :

z = fZ(w)
x = x0                                                            (3)
y = fY(x, u)

the graphical description of which is shown in Figure 3.2b.


This replacement permits the constant x0 to differ from the actual value of
X—namely, fX(z, v)—without rendering the system of equations inconsistent,
thus yielding a formal definition of counterfactuals in multistage models,
where the dependent variable in one equation may be an independent vari-
able in another (Balke & Pearl, 1994a, 1994b; Pearl, 2000b). The general
definition reads as follows:

Yx(u) ≜ YMx(u).                                                   (4)

In words, the counterfactual Yx(u) in model M is defined as the solution


for Y in the modified submodel Mx, in which the equation for X is replaced
by X = x. For example, to compute the average causal effect of X on Y, that is,
E(Yx0), we solve equation 3 for Y in terms of the exogenous variables, yielding
Yx0 = fY(x0, u), and average over U and V. To answer more sophisticated
questions, such as whether Y would be y1 if X were x1 given that in fact Y is
y0 and X is x0, we need to compute the conditional probability
P(Yx1 = y1 | Y = y0, X = x0), which is well defined once we know the forms
of the structural equations and the distribution of the exogenous variables in
the model.
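As a rough sketch of this ''surgery'' semantics, the three equations of the model in equation 2 can be written as ordinary functions and the counterfactual Yx0(u) computed by replacing the equation for X with a constant; the particular functional forms below are invented purely for illustration.

    # Sketch of computing a counterfactual by modifying a structural model (Eqs. 2-4).
    # The functions fZ, fX, fY are arbitrary choices made up for this example.
    def fZ(w):    return w
    def fX(z, v): return z + v
    def fY(x, u): return 2 * x + u

    def solve(equation_for_X, w, v, u):
        """Solve the recursive model for (z, x, y) given the exogenous values."""
        z = fZ(w)
        x = equation_for_X(z, v)
        y = fY(x, u)
        return z, x, y

    w, v, u = 1.0, 0.5, -0.3                      # one experimental unit (its exogenous background)
    z, x, y = solve(fX, w, v, u)                  # the factual world

    x0 = 0.0                                      # the hypothetical exposure level
    _, _, y_x0 = solve(lambda z, v: x0, w, v, u)  # modified model Mx0 of Eq. 3

    print(f"factual: X = {x}, Y = {y};  counterfactual: Yx0(u) = {y_x0}")
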
This formalization of counterfactuals, cast as solutions to modified sys-
tems of equations, provides the conceptual and formal link between struc-
tural equation models used in economics and social science, the potential-
outcome framework, to be discussed later under The Language of Potential
Outcomes; Lewis’ (1973) ‘‘closest-world’’ counterfactuals; Woodward’s (2003)
‘‘interventionalism’’ approach; Mackie’s (1965) ‘‘insufficient but necessary
components of unnecessary but sufficient’’ (INUS) condition; and
Rothman’s (1976) ‘‘sufficient component’’ framework (see VanderWeele and
Robins, 2007). The next section discusses two long-standing problems that
have been completely resolved in purely graphical terms, without delving into
algebraic techniques.

Confounding and Causal Effect Estimation


The central target of most studies in the social and health sciences is the
elucidation of cause–effect relationships among variables of interests, for
example, treatments, policies, preconditions, and outcomes. While good sta-
tisticians have always known that the elucidation of causal relationships from
observational studies must rest on assumptions about how the data were
generated, the relative roles of assumptions and data and the ways of
using those assumptions to eliminate confounding bias have been a subject
of much controversy. The preceding structural framework puts these contro-
versies to rest.

Covariate Selection: The Back-Door Criterion

Consider an observational study where we wish to find the effect of X on Y,


for example, treatment on response, and assume that the factors deemed
relevant to the problem are structured as in Figure 3.3; some are affecting
the response, some are affecting the treatment, and some are affecting both
treatment and response. Some of these factors may be unmeasurable, such as
genetic trait or lifestyle, while others are measurable, such as gender, age,
and salary level.

[Figure 3.3  Graphical model illustrating the back-door criterion, with covariates Z1, Z2,
Z3, W1, W2, W3, treatment X, and outcome Y. Error terms are not shown explicitly.]

Our problem is to select a subset of these factors for mea-
surement and adjustment so that if we compare treated vs. untreated subjects
having the same values of the selected factors, we get the correct treatment
effect in that subpopulation of subjects. Such a set of factors is called a
‘‘sufficient set,’’ ‘‘admissible’’ or a set ‘‘appropriate for adjustment.’’ The
problem of defining a sufficient set, let alone finding one, has baffled epide-
miologists and social scientists for decades (for review, see Greenland, Pearl,
& Robins, 1999; Pearl, 2000a, 2009a).
The following criterion, named the ‘‘back-door’’ criterion (Pearl, 1993a),
provides a graphical method of selecting such a set of factors for adjustment.
It states that a set, S, is appropriate for adjustment if two conditions hold:

1. No element of S is a descendant of X.

2. The elements of S ‘‘block’’ all back-door paths from X to Y, that is, all
paths that end with an arrow pointing to X.6

Based on this criterion we see, for example, that each of the sets {Z1, Z2,
Z3}, {Z1, Z3}, and {W2, Z3} is sufficient for adjustment because each blocks
all back-door paths between X and Y. The set {Z3}, however, is not sufficient
for adjustment because it does not block the path X ← W1 ← Z1 → Z3 ← Z2
→ W2 → Y.
The implication of finding a sufficient set, S, is that stratifying on S is
guaranteed to remove all confounding bias relative to the causal effect of X
on Y. In other words, it renders the causal effect of X on Y identifiable, via

P(Y = y | do(X = x)) = Σs P(Y = y | X = x, S = s) P(S = s)          (5)

6. In this criterion, a set, S, of nodes is said to block a path, P, if either (1) P contains at least one
arrow-emitting node that is in S or (2) P contains at least one collision node (e.g., → Z ←) that
is outside S and has no descendant in S (see Pearl, 2009b, pp. 16–17, 335–337).

Since all factors on the right-hand side of the equation are estimable (e.g.,
by regression) from the preinterventional data, the causal effect can likewise
be estimated from such data without bias.
The back-door criterion allows us to write equation 5 directly, after selecting
a sufficient set, S, from the diagram, without resorting to any algebraic manip-
ulation. The selection criterion can be applied systematically to diagrams of any
size and shape, thus freeing analysts from judging whether ‘‘X is conditionally
ignorable given S,’’ a formidable mental task required in the potential-response
framework (Rosenbaum & Rubin, 1983). The criterion also enables the analyst
to search for an optimal set of covariates—namely, a set, S, that minimizes
measurement cost or sampling variability (Tian, Paz, & Pearl, 1998).
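A small simulation may help convey how equation 5 removes confounding bias once a sufficient set has been identified; the toy model below uses a single binary back-door variable S and makes no attempt to reproduce Figure 3.3.

    # Sketch of the adjustment formula (Eq. 5) with one binary confounder S.
    # The true effect of X on Y is 1.0; confounding inflates the crude contrast.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    s = rng.binomial(1, 0.5, n)                    # back-door variable
    x = rng.binomial(1, 0.2 + 0.6 * s)             # exposure depends on S
    y = x + 2 * s + rng.normal(0, 1, n)            # outcome depends on X and S

    crude = y[x == 1].mean() - y[x == 0].mean()    # confounded contrast (about 2.2)

    # Eq. 5: E[Y | do(X = x)] = sum_s E[Y | X = x, S = s] P(S = s); take the difference.
    adjusted = sum(
        (y[(x == 1) & (s == v)].mean() - y[(x == 0) & (s == v)].mean()) * (s == v).mean()
        for v in (0, 1)
    )
    print(f"crude = {crude:.2f}, adjusted = {adjusted:.2f}")   # adjusted is about 1.0
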

General Control of Confounding

Adjusting for covariates is only one of many methods that permit us to


estimate causal effects in nonexperimental studies. A much more general
identification criterion is provided by the following theorem:

Theorem 1 (Tian & Pearl, 2002)


A sufficient condition for identifying the causal effect P[y|do(x)] is that
every path between X and any of its children traces at least one arrow
emanating from a measured variable.7

For example, if W3 is the only observed covariate in the model of Figure


3.3, then there exists no sufficient set for adjustment (because no set of
observed covariates can block the paths from X to Y through Z3), yet
P[y|do(x)] can nevertheless be estimated since every path from X to W3 (the
only child of X) traces either the arrow X → W3, or the arrow W3 → Y, each
emanating from a measured variable. In this example, the variable W3 acts as
a ‘‘mediating instrumental variable’’ (Pearl, 1993b; Chalak & White, 2006)
and yields the following estimand:

P(Y = y | do(X = x))
   = Σw P(W3 = w | do(X = x)) P(Y = y | do(W3 = w))                  (6)
   = Σw P(w | x) Σx' P(y | w, x') P(x')
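The estimand in equation 6 can be checked numerically as well; the simulation below uses binary variables, an unmeasured confounder U of X and Y, and W3 as the only observed intermediate variable, with all parameter values chosen arbitrarily.

    # Sketch verifying Eq. 6: identification through the mediator W3 despite an
    # unmeasured confounder U of X and Y. All probabilities are arbitrary choices.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 500_000

    u  = rng.binomial(1, 0.5, n)                     # unmeasured confounder
    x  = rng.binomial(1, 0.2 + 0.6 * u)              # X depends on U
    w3 = rng.binomial(1, 0.1 + 0.7 * x)              # W3 depends only on X
    y  = rng.binomial(1, 0.1 + 0.4 * w3 + 0.3 * u)   # Y depends on W3 and U, not directly on X

    def eq6(x_val):
        """P(Y = 1 | do(X = x_val)) from the observed (X, W3, Y) data via Eq. 6."""
        total = 0.0
        for w in (0, 1):
            p_w_given_x = (w3[x == x_val] == w).mean()
            inner = sum((y[(w3 == w) & (x == xp)] == 1).mean() * (x == xp).mean()
                        for xp in (0, 1))
            total += p_w_given_x * inner
        return total

    # Ground truth from the generating model: P(Y=1 | do(X=x)) = 0.25 + 0.4*(0.1 + 0.7*x).
    for x_val in (0, 1):
        truth = 0.25 + 0.4 * (0.1 + 0.7 * x_val)
        print(f"do(X={x_val}): Eq. 6 estimate = {eq6(x_val):.3f}, truth = {truth:.3f}")
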

More recent results extend this theorem by (1) presenting a necessary


and sufficient condition for identification (Shpitser & Pearl, 2006) and

7. Before applying this criterion, one may delete from the causal graph all nodes that are not
ancestors of Y.

(2) extending the condition from causal effects to any counterfactual expres-
sion (Shpitser & Pearl, 2007). The corresponding unbiased estimands for
these causal quantities are readable directly from the diagram.

The Language of Potential Outcomes

The elementary object of analysis in the potential-outcome framework is the


unit-based response variable, denoted Yx(u)—read ‘‘the value that Y would
obtain in unit u had treatment X been x’’ (Neyman, 1923; Rubin, 1974).
These subscripted variables are treated as undefined quantities, useful for
expressing the causal quantities we seek, but are not derived from other
quantities in the model. In contrast, in the previous section counterfactual
entities were derived from a set of meaningful physical processes, each repre-
sented by an equation, and a unit was interpreted as a vector u of background
factors that characterize an experimental unit. Each structural equation
model thus provides a compact representation for a huge number of counter-
factual claims, guaranteed to be consistent.
In view of these features, the structural definition of Yx(u) (equation 4) can
be regarded as the formal basis for the potential-outcome approach. It inter-
prets the opaque English phrase ‘‘the value that Y would obtain in unit u had
X been x’’ in terms of a scientifically-based mathematical model that allows
such values to be computed unambiguously. Consequently, important con-
cepts in potential-response analysis that researchers find ill-defined or eso-
teric often obtain meaningful and natural interpretation in the structural
semantics. Examples are ‘‘unit’’ (‘‘exogenous variables’’ in structural semantics),
‘‘principal stratification’’ (‘‘equivalence classes’’ in structural semantics)
(Balke & Pearl, 1994b; Pearl, 2000b), ‘‘conditional ignorability’’ (‘‘back-door
condition’’ in Pearl, 1993a), and ‘‘assignment mechanism’’ [P(x|direct causes
of X) in structural semantics]. The next two subsections examine how
assumptions and inferences are handled in the potential-outcome approach
vis à vis the graphical–structural approach.

Formulating Assumptions
The distinct characteristic of the potential-outcome approach is that, although
its primitive objects are undefined, hypothetical quantities, the analysis itself
is conducted almost entirely within the axiomatic framework of probability
theory. This is accomplished by postulating a ‘‘super’’ probability function on
both hypothetical and real events, treating the former as ‘‘missing data.’’ In
other words, if U is treated as a random variable, then the value of the
counterfactual Yx(u) becomes a random variable as well, denoted as Yx.

The potential-outcome analysis proceeds by treating the observed distribution


P(x1 . . . xn) as the marginal distribution of an augmented probability func-
tion (P*) defined over both observed and counterfactual variables. Queries
about causal effects are phrased as queries about the probability distribution
of the counterfactual variable of interest, written P*(Yx = y). The new hypothe-
tical entities Yx are treated as ordinary random variables; for example, they are
assumed to obey the axioms of probability calculus, the laws of conditioning,
and the axioms of conditional independence. Moreover, these hypothetical
entities are not entirely whimsy but are assumed to be connected to observed
variables via consistency constraints (Robins, 1986) such as

X = x  ⇒  Yx = Y,                                                  (7)

which states that for every u, if the actual value of X turns out to be x,
then the value that Y would take on if X were x is equal to the actual value of
Y. For example, a person who chose treatment x and recovered would also
have recovered if given treatment x by design.
The main conceptual difference between the two approaches is that,
whereas the structural approach views the subscript x as an operation that
changes the distribution but keeps the variables the same, the potential-
outcome approach views Yx to be a different variable, unobserved and loosely
connected to Y through relations such as equation 7.
Pearl (2000a, chap. 7) shows, using the structural interpretation of Yx(u),
that it is indeed legitimate to treat counterfactuals as jointly distributed
random variables in all respects, that consistency constraints like equation
7 are automatically satisfied in the structural interpretation, and, moreover,
that investigators need not be concerned about any additional constraints
except the following two:8

Yyz = y                        for all y and z                     (8)

Xz = x  ⇒  Yxz = Yz            for all x and z                     (9)
Equation 8 ensures that the intervention do(Y = y) results in the condition
Y = y, regardless of concurrent interventions, say do(Z = z), that are applied to
variables other than Y. Equation 9 generalizes equation 7 to cases where Z is
held fixed at z.
To communicate substantive causal knowledge, the potential-outcome ana-
lyst must express causal assumptions as constraints on P*, usually in the

8. This completeness result is due to Halpern (1998), who noted that an additional axiom

{Yxz = y} & {Zxy = z}  ⇒  Yx = y


must hold in nonrecursive models. This fundamental axiom may come to haunt economists and
social scientists who blindly apply Neyman-Rubin analysis in their fields.

form of conditional independence assertions involving counterfactual vari-


ables. In Figure 3.2(a), for instance, to communicate the understanding
that a treatment assignment (Z) is randomized (hence independent of both
U and V), the potential-outcome analyst needs to use the independence con-
straint Z ⊥⊥ {Xz, Yx}. To further formulate the understanding that Z does not
affect Y directly, except through X, the analyst would write a so-called exclu-
sion restriction: Yxz = Yx. Clearly, no mortal can judge the validity of such
assumptions in any real-life problem without resorting to graphs.9

Performing Inferences
A collection of assumptions of this type might sometimes be sufficient to
permit a unique solution to the query of interest; in other cases, only bounds
on the solution can be obtained. For example, if one can plausibly assume
that a set, Z, of covariates satisfies the conditional independence

Yx ⊥⊥ X | Z                                                        (10)

(an assumption that was termed ‘‘conditional ignorability’’ by Rosenbaum &


Rubin, 1983), then the causal effect, P*(Yx = y), can readily be evaluated to
yield
P*(Yx = y) = Σz P*(Yx = y | z) P(z)
           = Σz P*(Yx = y | x, z) P(z)          (using (10))
           = Σz P*(Y = y | x, z) P(z)           (using (7))
           = Σz P(y | x, z) P(z),               (11)

which is the usual covariate-adjustment formula, as in equation 5.


Note that almost all mathematical operations in this derivation are con-
ducted within the safe confines of probability calculus. Save for an occasional
application of rule 9 or 7, the analyst may forget that Yx stands for a counter-
factual quantity—it is treated as any other random variable, and the entire
derivation follows the course of routine probability exercises.
However, this mathematical illusion comes at the expense of conceptual
clarity, especially at a stage where causal assumptions need to be formulated.
The reader may appreciate this aspect by attempting to judge whether the
assumption of conditional ignorability (equation 10), the key to the derivation

9. Even with the use of graphs the task is not easy; for example, the reader should try to verify
whether Z ⊥⊥ Xz | Y holds in the simple model of Figure 3.2(a). The answer is given in Pearl
(2000a, p. 214).

of equation 11, holds in any familiar situation—say, in the experimental


setup of Figure 3.2(a). This assumption reads ‘‘the value that Y would
obtain had X been x is independent of X, given Z’’ (see footnote 4). Such
assumptions of conditional independence among counterfactual variables are
not straightforward to comprehend or ascertain for they are cast in a lan-
guage far removed from ordinary understanding of cause and effect. When
counterfactual variables are not viewed as by-products of a deeper, process-
based model, it is also hard to ascertain whether all relevant counterfactual
independence judgments have been articulated, whether the judgments
articulated are redundant, or whether those judgments are self-consistent.
The need to express, defend, and manage formidable counterfactual rela-
tionships of this type explains the slow acceptance of causal analysis among
epidemiologists and statisticians and why economists and social scientists
continue to use structural equation models instead of the potential-outcome
alternatives advocated in Holland (1988); Angrist, Imbens, and Rubin (1996);
and Sobel (1998).
On the other hand, the algebraic machinery offered by the potential-out-
come notation, once a problem is properly formalized, can be powerful in
refining assumptions (Angrist et al., 1996), deriving consistent estimands
(Robins, 1986), analyzing mediation (Pearl, 2001), bounding probabilities of
causation (Tian & Pearl, 2000), and combining data from experimental and
nonexperimental studies (Pearl, 2000a, pp. 302–303).

Combining Graphs and Counterfactuals—The Mediation Formula


Pearl (2000a, p. 232) presents a way of combining the best features of the two
approaches. It is based on encoding causal assumptions in the language of
diagrams, translating these assumptions into potential-outcome notation,
performing the mathematics in the algebraic language of counterfactuals,
and, finally, interpreting the result in plain causal language. Often, the
answer desired can be obtained directly from the diagram, and no translation
is necessary (as demonstrated earlier, Confounding and Causal Effect
Estimation).
One area that has benefited substantially from this symbiosis is the ana-
lysis of direct and indirect effects, also known as ‘‘mediation analysis’’
(Shrout & Bolger, 2002), which has resisted generalizations to discrete vari-
ables and nonlinear interactions for several decades (Robins & Greenland,
1992; Mackinnon, Lockwood, Brown, Wang, & Hoffman, 2007). The obstacles
were definitional; the direct effect is sensitive to the level at which we con-
dition the intermediate variable, while the indirect effect cannot be defined by
conditioning on a third variable or taking the difference between the total and
direct effects.

The structural definition of counterfactuals (equation 4) and the graphical


analysis (see Confounding and Causal Effect Estimation) combined to pro-
duce formal definitions of, and graphical conditions under which, direct and
indirect effects can be estimated from data (Pearl, 2001; Petersen, Sinisi, &
van der Laan, 2006). In particular, under conditions of no unmeasured
(or uncontrolled for) confounders, this symbiosis has produced the following
Mediation Formulas for the expected direct (DE) and indirect (IE) effects of
the transition from X = x to X = x’ (with outcome Y and mediating set Z):
DE = Σz [E(Y | x', z) − E(Y | x, z)] P(z | x)                       (12)

IE = Σz E(Y | x, z) [P(z | x') − P(z | x)]                          (13)

These general formulas are applicable to any type of variables,10 any non-
linear interactions, and any distribution and, moreover, are readily estimable
by regression. IE (respectively, DE) represents the average increase in the
outcome Y that the transition from X = x to X = x’ is expected to produce
absent any direct (respectively, indirect) effect of X on Y. When the outcome Y
is binary (e.g., recovery or hiring), the ratio (1 – IE/TE) represents the frac-
tion of responding individuals who owe their response to direct paths, while
(1 – DE/TE) represents the fraction who owe their response to Z-mediated
paths. TE stands for the total effect, TE = E(Y|x') – E(Y|x), which, in non-
linear systems, may or may not be the sum of the direct and indirect effects.
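A brief numerical sketch of equations 12 and 13, with binary X, Z, and Y and a generating model invented for the purpose (and satisfying, as the formulas require, the no-unmeasured-confounders condition), may make the estimation step concrete.

    # Sketch of the Mediation Formula (Eqs. 12-13) with binary X, Z, Y and no
    # unmeasured confounding. All parameter values are arbitrary.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 500_000

    x = rng.binomial(1, 0.5, n)                      # exposure (randomized)
    z = rng.binomial(1, 0.2 + 0.5 * x)               # mediator depends on X
    y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)     # outcome depends on X and Z

    E_y = lambda xv, zv: y[(x == xv) & (z == zv)].mean()
    P_z = lambda zv, xv: (z[x == xv] == zv).mean()

    x0, x1 = 0, 1                                    # transition from X = x0 to X = x1
    DE = sum((E_y(x1, zv) - E_y(x0, zv)) * P_z(zv, x0) for zv in (0, 1))   # Eq. 12
    IE = sum(E_y(x0, zv) * (P_z(zv, x1) - P_z(zv, x0)) for zv in (0, 1))   # Eq. 13
    TE = y[x == x1].mean() - y[x == x0].mean()

    # Here DE + IE equals TE (about 0.3 + 0.2 = 0.5) because the model is linear
    # with no X-Z interaction; in general the decomposition need not be additive.
    print(f"DE = {DE:.3f}, IE = {IE:.3f}, TE = {TE:.3f}")
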
Additional results spawned by the structural–graphical–counterfactual
symbiosis include effect estimation under noncompliance (Balke & Pearl,
1997; Chickering & Pearl, 1997), mediating instrumental variables (Pearl,
1993b; Brito & Pearl, 2006), robustness analysis (Pearl, 2004), selecting pre-
dictors for propensity scores (Pearl, 2010a, 2010c), and estimating the effect
of treatment on the treated (Shpitser & Pearl, 2009). Detailed descriptions of
these results are given in the corresponding articles (available at http://
bayes.cs.ucla.edu/csl_papers.html).

Conclusions

Statistics is strong in devising ways of describing data and inferring distribu-


tional parameters from a sample. Causal inference requires two additional

10. Integrals should replace summations when Z is continuous. Generalizations to cases invol-
ving observed or unobserved confounders are given in Pearl (2001) and exemplified in Pearl
(2010a, 2010b). Conceptually, IE measures the average change in Y under the operation of
setting X to x and, simultaneously, setting Z to whatever value it would have obtained under X
= x’ (Robins & Greenland, 1992).

ingredients: a science-friendly language for articulating causal knowledge and


a mathematical machinery for processing that knowledge, combining it with
data, and drawing new causal conclusions about a phenomenon. This chapter
introduces nonparametric structural equation models as a formal and mean-
ingful language for formulating causal knowledge and for explicating causal
concepts used in scientific discourse. These include randomization, interven-
tion, direct and indirect effects, confounding, counterfactuals, and attribution.
The algebraic component of the structural language coincides with the poten-
tial-outcome framework, and its graphical component embraces Wright’s
method of path diagrams (in its nonparametric version). When unified and
synthesized, the two components offer investigators a powerful methodology
for empirical research (e.g., Morgan & Winship, 2007; Greenland et al., 1999;
Glymour & Greenland, 2008; Chalak & White, 2006; Pearl, 2009a).
Perhaps the most important message of the discussion and methods pre-
sented in this chapter would be a widespread awareness that (1) all studies
concerning causal relations must begin with causal assumptions of some sort
and (2) a friendly and formal language is currently available for articulating
such assumptions. This means that scientific articles concerning questions of
causation must contain a section in which causal assumptions are articulated
using either graphs or subscripted formulas. Authors who wish their assump-
tions to be understood, scrutinized, and discussed by readers and colleagues
would do well to use graphs. Authors who refrain from using graphs risk the
suspicion that they are trying to avoid making their working assumptions
transparent.
Another important implication is that every causal inquiry can be mathema-
tized. In other words, mechanical procedures can now be invoked to determine
what assumptions investigators must be willing to make in order for desired
quantities to be estimable consistently from the data. This is not to say that
the needed assumptions would be reasonable or that the resulting estimation
method would be easy. It means that the needed causal assumptions can be
made transparent and brought up for discussion and refinement and that,
once consistency is assured, causal quantities can be estimated from data
through ordinary statistical methods, free of the mystical aura that has
shrouded causal analysis in the past.

References

Angrist, J., Imbens, G., & Rubin, D. (1996). Identification of causal effects using
instrumental variables (with comments). Journal of the American Statistical
Association, 91(434), 444–472.
Balke, A., & Pearl, J. (1994a). Counterfactual probabilities: Computational methods,
bounds, and applications. In R. L. de Mantaras and D. Poole (Eds.), Proceedings of the
Tenth Conference on Uncertainty in Artificial Intelligence (pp. 46–54). San Mateo, CA:
Morgan Kaufmann.
Balke, A., & Pearl, J. (1994b). Probabilistic evaluation of counterfactual queries.
In Proceedings of the Twelfth National Conference on Artificial Intelligence (Vol. I,
pp. 230–237). Menlo Park, CA: MIT Press.
Balke, A., & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect
compliance. Journal of the American Statistical Association, 92(439), 1172–1176.
Brito, C., & Pearl, J. (2006). Graphical condition for identification in recursive SEM.
In Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence
(pp. 47–54). Corvallis, OR: AUAI Press.
Chalak, K., & White, H. (2006). An extended class of instrumental variables for the
estimation of causal effects (Tech. Rep. Discuss. Paper). San Diego: University of
California, San Diego, Department of Economics.
Chickering, D., & Pearl, J. (1997). A clinician’s tool for analyzing non-compliance.
Computing Science and Statistics, 29(2), 424–431.
Cox, D. (1958). The Planning of Experiments. New York: John Wiley & Sons.
Glymour, M., & Greenland, S. (2008). Causal diagrams. In K. Rothman, S. Greenland,
and T. Lash (Eds.), Modern Epidemiology (3rd ed., pp. 183–209). Philadelphia:
Lippincott Williams & Wilkins.
Greenland, S., Pearl, J., & Robins, J. (1999). Causal diagrams for epidemiologic
research. Epidemiology, 10(1), 37–48.
Halpern, J. (1998). Axiomatizing causal reasoning. In G. Cooper and S. Moral (Eds.),
Uncertainty in Artificial Intelligence (pp. 202–210). San Francisco: Morgan Kaufmann.
(Reprinted in Journal of Artificial Intelligence Research, 12, 17–37, 2000.)
Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical
Association, 81(396), 945–960.
Holland, P. (1988). Causal inference, path analysis, and recursive structural equations
models. In C. Clogg (Ed.), Sociological Methodology (pp. 449–484). Washington, DC:
American Sociological Association.
Lewis, D. (1973). Counterfactuals. Cambridge, MA: Harvard University Press.
Mackie, J. (1965). Causes and conditions. American Philosophical Quarterly, 2(4), 261–264.
(Reprinted in E. Sosa and M. Tooley [Eds.], Causation. Oxford: Oxford University
Press, 1993.)
MacKinnon, D., Lockwood, C., Brown, C., Wang, W., & Hoffman, J. (2007).
The intermediate endpoint effect in logistic and probit regression. Clinical Trials,
4, 499–513.
Morgan, S., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and
Principles for Social Research (Analytical Methods for Social Research). New York:
Cambridge University Press.
Neyman, J. (1923). On the application of probability theory to agricultural experiments.
Essay on principles. Statistical Science, 5(4), 465–480.
Pearl, J. (1993a). Comment: Graphical models, causality, and intervention. Statistical
Science, 8(3), 266–269.
Pearl, J. (1993b). Mediating instrumental variables (Tech. Rep. No. TR-210). Los
Angeles: University of California, Los Angeles, Department of Computer Science.
http://ftp.cs.ucla.edu/pub/stat_ser/R210.pdf.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–710.
Pearl, J. (2000a). Causality: Models, Reasoning, and Inference. New York: Cambridge
University Press.

Pearl, J. (2000b). Comment on A. P. Dawid’s, Causal inference without counterfactuals.
Journal of the American Statistical Association, 95(450), 428–431.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference
on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan
Kaufmann.
Pearl, J. (2004). Robustness of causal claims. In M. Chickering and J. Halpern (Eds.),
Proceedings of the Twentieth Conference Uncertainty in Artificial Intelligence (pp. 446–
453). Arlington, VA: AUAI Press.
Pearl, J. (2009a). Causal inference in statistics: An overview. Statistics Surveys, 3,
96–146. Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf.
Pearl, J. (2009b). Causality: Models, Reasoning, and Inference. New York: Cambridge
University Press, 2nd edition.
Pearl, J. (2009c). Remarks on the method of propensity scores. Statistics in Medicine,
28, 1415–1416. Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r345-sim.pdf.
Pearl, J. (2010a). The foundation of causal inference. (Tech. Rep. No. R-355). Los
Angeles: University of California, Los Angeles. http://ftp.cs.ucla.edu/pub/stat_ser/
r355.pdf. Forthcoming, Sociological Methodology.
Pearl, J. (2010b). The mediation formula: A guide to the assessment of causal path-
ways in non-linear models. (Tech. Rep. No. R-363). Los Angeles: University of
California, Los Angeles, http://ftp.cs.ucla.edu/pub/stat_ser/r363.pdf.
Pearl, J. (2010c). On a class of bias-amplifying variables that endanger effect estimates.
In P. Grunwald and P. Spirtes (Eds.), Proceedings of the Twenty-Sixth Conference on
Uncertainty in Artificial Intelligence (pp. 425–432). Corvallis, OR: AUAI. http://
ftp.cs.ucla.edu/pub/stat_ser/r356.pdf.
Petersen, M., Sinisi, S., & van der Laan, M. (2006). Estimation of direct causal effects.
Epidemiology, 17(3), 276–284.
Robins, J. (1986). A new approach to causal inference in mortality studies with a
sustained exposure period—applications to control of the healthy workers survivor
effect. Mathematical Modeling, 7, 1393–1512.
Robins, J., & Greenland, S. (1992). Identifiability and exchangeability for direct and
indirect effects. Epidemiology, 3(2), 143–155.
Rosenbaum, P., & Rubin, D. (1983). The central role of propensity score in observa-
tional studies for causal effects. Biometrika, 70, 41–55.
Rothman, K. (1976). Causes. American Journal of Epidemiology, 104, 587–592.
Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonran-
domized studies. Journal of Educational Psychology, 66, 688–701.
Shpitser, I., & Pearl, J. (2006). Identification of joint interventional distributions in
recursive semi-Markovian causal models. In Proceedings of the Twenty-First National
Conference on Artificial Intelligence (pp. 1219–1226). Menlo Park, CA: AAAI Press.
Shpitser, I., & Pearl, J. (2007). What counterfactuals can be tested. In Proceedings of
the Twenty-Third Conference on Uncertainty in Artificial Intelligence (pp. 352–359).
Vancouver, Canada: AUAI Press. (Reprinted in Journal of Machine Learning
Research, 9, 1941–1979, 2008.)
Shpitser, I., & Pearl, J. (2009). Effects of treatment on the treated: Identification and
generalization. In Proceedings of the Twenty-Fifth Conference on Uncertainty in
Artificial Intelligence. Montreal: AUAI Press.
Shrout, P., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies:
New procedures and recommendations. Psychological Methods, 7(4), 422–445.
Sobel, M. (1998). Causal inference in statistical models of the process of socioeconomic
achievement. Sociological Methods & Research, 27(2), 318–348.
Tian, J., Paz, A., & Pearl, J. (1998). Finding minimal separating sets (Tech. Rep. No. R-254).
Los Angeles: University of California, Los Angeles. http://ftp.cs.ucla.edu/pub/
stat_ser/r254.pdf.
Tian, J., & Pearl, J. (2000). Probabilities of causation: Bounds and identification. Annals
of Mathematics and Artificial Intelligence, 28, 287–313.
Tian, J., & Pearl, J. (2002). A general identification condition for causal effects. In
Proceedings of the Eighteenth National Conference on Artificial Intelligence (pp. 567–573).
Menlo Park, CA: AAAI Press/MIT Press.
VanderWeele, T., & Robins, J. (2007). Four types of effect modification: A classification
based on directed acyclic graphs. Epidemiology, 18(5), 561–568.
Woodward, J. (2003). Making Things Happen. New York: Oxford University Press.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20,
557–585.
4

Causal Thinking in Psychiatry


A Genetic and Manipulationist Perspective
kenneth s. kendler

A large and daunting philosophical literature exists on the nature and mean-
ing of causality. Add to that the extensive discussions in the statistical litera-
ture about what it means to claim that C causes E, and it can be
overwhelming for the scientists, who, after all, are typically just seeking
guidelines about how to conduct and analyze their research. Add to this
mix the inherent problems in psychiatry—which examines an extraordinarily
wide array of potential causal processes from molecules to minds and socie-
ties, some of which permit experimental manipulation but many of which do
not—and you can readily see the sense of frustration and, indeed, futility
with which this issue might be addressed.
In the first section of this chapter, I reflect on two rather practical aspects
of causal inference that I have confronted in my research career in psychia-
tric genetics. The first of these is what philosophers call a ‘‘brute fact’’ of our
world—the unidirectional causal relationship between variation in genomic
DNA and phenotype. The second is the co-twin–control method—a nice
example of trying to use twins as a ‘‘natural experiment’’ to clarify causal
processes when controlled trials are infeasible.
In the second section, I briefly outline and advocate for a particular
approach to causal inference developed by Jim Woodward (2003) that I
term ‘‘interventionism.’’ I argue that this approach is especially well suited
to the needs of our unusual field of psychiatry.

Two Practical Aspects of Causal Inference in Psychiatric


Genetics Research

I often teach students that it is almost too easy in psychiatric research to


show that putative risk factors correlate with outcomes. It is much harder to


determine if that relationship is a causal one. Indeed, assuming that for


practical or ethical reasons a randomized trial of exposure to the risk factor
is not feasible, one must rely on observational data. In these instances, it can
be "damn near impossible" to confidently infer causation. However, in this
mire of causal uncertainty, it is interesting to note that one relationship
stands out in its causal clarity: the relationship between variation in germline
DNA (gDNA) and phenotypes.
It did not have to be this way. Indeed, folk wisdom has long considered
the inheritance of acquired characteristics (which implies a phenotype →
gDNA relationship, interpreted as phenotype causes gDNA) to be a plausible
mechanism of heredity. In the eighteenth century, this concept (of the inheri-
tance of acquired characteristics) was most closely associated with the name
of Lamarck. In the twentieth century, due to an unfortunate admixture of bad
science and repressive politics, this same process came to dominate Soviet
biology through the efforts of Lysenko.
However, acquired characteristics are not inherited through changes in
our DNA sequence. Rather, the form of life of which we are a product evolved
in such a manner as to render the sequence of gDNA relatively privileged
and protected. Therefore, causal relationships between genes and phenotypes
are, of necessity, unidirectional. To put this more crudely, genes can influence
phenotypes but phenotypes cannot influence genes.
I do not mean to imply that gDNA never varies. It is subject to a range of
random features, from point mutations to insertions of transposons and
slippage of replication machinery leading to deletions and duplications.
However, I am unaware of any widely accepted claims that such changes
in gDNA can occur in a systematic and directed fashion so as to establish
a true phenotype → gDNA causal pathway.
Let me be a bit more precise. Assume that our predictor variable is a measure
of genetic variation (GV). This can be either latent—as it might be if we are
studying twins or adoptees—or observed—if we are directly examining varia-
tion in genomic sequence (e.g., via single nucleotide polymorphisms). Assume
our dependent measure is the liability toward having a particular psychiatric
disease, which we will term ‘‘risk.’’ We can then assume that GV causes risk:

GV → Risk

Also, we can be certain that risk does not cause GV:

Risk ↛ GV

This claim is specific and limited. It does not apply to other features of our
genetic machinery such as gene expression—the product of genes at the level
of either mRNA or protein—and epigenetic modifications of DNA. Our


expression levels can be exquisitely sensitive to environmental effects.
There is also evidence that environmental factors can alter methylation of
DNA. Thus, it is not true that in all of genetics research causal relationships
are unambiguous. If, however, we consider only the sequence of gDNA, it is,
I argue, a little corner of causal clarity that we should cherish.
The practical consequence of these unidirectional causal relationships does
not end with the simple bivariate relationship noted between gDNA and
phenotype. For example, using structural equation modeling, far more ela-
borate models can be developed that involve multiple phenotypes, develop-
mental pathways, gene–environment interaction, differences in gene
expression by sex or cohort, and gene–environment correlation. Under
some situations (e.g., van den Oord & Snieder, 2002), including genetic
effects can help to clarify other causal relationships. For example, if two
disorders (A and B) are comorbid, the identification of genetic risk factors
should help to determine whether this comorbidity results from shared risk
factors like genes or a phenotypic pathway in which developing disorder
A directly contributes to the risk of developing disorder B.
While theoretically clear, it is unfortunately not possible to study such
gene to phenotype pathways in the real world without introducing other
assumptions. For example, if we are studying a population which contains
two subpopulations that differ in frequency of both gene and disease, stan-
dard case–control association studies can artifactually produce significant
findings in the absence of any true gDNA-to-phenotype relationship. The
ability to infer the action of genetic risk factors in twin studies is based on
the equal-environment assumption as well as assumptions about assortative
mating and the relationship of additive to nonadditive genetic effects.
The bottom-line message here is a simple one: The world in which we live
is often causally ambiguous; nowhere is this better demonstrated than in
many areas of psychiatric research. Because of the way our life forms have
evolved, gDNA is highly protected. Our bodies work very hard to ensure that
nothing, including our own behavior or environmental experiences, messes
with our gDNA. This quirk in our biology gives us causal purchase that we
might not otherwise have. We should take this gift from nature, grasp it
hard, and use it for all it is worth.

The Co-Twin–Control Method

One common approach to understanding the causal relationship between a


putative risk factor and a disease is to match individuals on as many variables
as possible except that one group has been exposed to the putative risk factor
and the other group has not. If the exposed group has a higher rate of disease,
then we can argue on this basis that the risk factor truly causes the disease.
While intuitively appealing, this common nonexperimental approach—like
many in epidemiology—has a key point of vulnerability. While it may be that
the risk factor causes the disease, it is also possible that a set of ‘‘third
variables’’ predispose to both the risk factor and the disease. Such a case
will produce a noncausal risk factor–disease association.
This is a particular problem in psychiatric epidemiology because so many
exposures of interest—stressful life events, social support, educational status,
social class—are themselves complex and the result not only of the environ-
ment (with causal effects flowing from environment to person) but also of
the actions of human beings themselves (with causal effects flowing from
people to their environment) (Kendler & Prescott, 2006). As humans, we
actively create our own environments, and this activity is substantially influ-
enced by our genes (Kendler & Baker, 2007). Thus, for behavioral traits, our
phenotypes quite literally extend far beyond our skin.
Can we use genetically informative designs to get any purchase on these
possible confounds? Sometimes. Let me describe one such method—the co-
twin–control design. I will first illustrate its potential utility with one example
and then describe a critical limitation.
A full co-twin–control design involves the comparison of the association
between a risk factor and an outcome in three samples: (1) an unselected
population, (2) dizygotic (DZ) twin pairs discordant for exposure to the risk
factor, and (3) monozygotic (MZ) pairs discordant for exposure to the risk
factor (Kendler et al., 1993). Three possible different patterns of results are
illustrated in Figure 4.1.

[Figure 4.1 Interpretation of Results Obtained from Studies Using a Cotwin-Control Design. Bar chart of odds ratios (OR) for all subjects, discordant DZ pairs, and discordant MZ pairs across the three patterns described in the text.]

The results on the left side of the figure show the
pattern that would be expected if the risk factor–outcome association was


truly causal (note: this assumes that no other environmental confounding
is present). Controlling for family background or genetic factors would,
within the limits of statistical fluctuation, make no difference to the estimate
of the association.
The middle set of results in Figure 4.1 shows an example where part of
the risk factor–outcome association is due to genetic factors that influence
both the risk factor and the outcome. Here, the association is strongest in the
entire sample (where genetic and causal effects are entirely confounded as we
control for neither genetic nor shared environmental factors), intermediate
among discordant DZ twins (where we control for shared environmental
factors and partly for genetic background), and lowest among discordant
MZ pairs (where we control entirely for both shared environmental and
genetic backgrounds). The degree to which the association declines from
the entire sample to discordant MZ pairs is a rough measure of the propor-
tion of the association that is noncausal and the result of shared genetic
influences on the risk factor and the outcome.
The results on the right side of Figure 4.1 show the extreme case, where
all of the risk factor–outcome association is due to shared genetic effects and
the risk factor has no real causal effect on the outcome. Thus, within dis-
cordant MZ pairs there will be no association between the risk factor and the
outcome (i.e., an odds ratio [OR] of 1.0), and the association within discor-
dant DZ pairs would be expected to be roughly midway between 1 and the
value for the entire sample.
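As a rough illustration of the pair-level comparison this design rests on, the sketch below (my own, with hypothetical column names; a matched-pair odds ratio is used as a simple stand-in for the published analyses) computes odds ratios within exposure-discordant MZ and DZ pairs, to be set against the odds ratio from the full, unpaired sample:

    import pandas as pd

    def discordant_pair_ors(pairs: pd.DataFrame) -> dict:
        """Matched-pair odds ratios for exposure-discordant twin pairs.

        Each row is one pair discordant for the risk-factor exposure, with
        hypothetical columns: 'zygosity' ('MZ' or 'DZ'), 'exposed_affected'
        and 'unexposed_affected' (1 if that twin developed the outcome).
        """
        ors = {}
        for zygosity, grp in pairs.groupby('zygosity'):
            exposed_only = ((grp['exposed_affected'] == 1) &
                            (grp['unexposed_affected'] == 0)).sum()
            unexposed_only = ((grp['exposed_affected'] == 0) &
                              (grp['unexposed_affected'] == 1)).sum()
            # Conditional (matched-pair) OR: ratio of outcome-discordant pair counts
            ors[zygosity] = exposed_only / unexposed_only
        return ors

Under a truly causal risk factor, the MZ and DZ values should stay close to the population odds ratio; under a purely genetic, noncausal explanation, the MZ value should approach 1.0, with the DZ value falling in between.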
Let me illustrate this method with a striking real-world example taken
from research conducted primarily with my long-term colleague, Dr. Carol
Prescott (Kendler & Prescott, 2006; Prescott & Kendler, 1999). In general
population samples, early age at drinking onset (the risk factor of interest)
has been consistently associated with an increased risk for developing
alcohol-use disorders (Grant & Dawson, 1997). The prevalence of alcohol-
use disorders among individuals who first try alcohol before age 15 is
as high as 50% in some studies. Several studies reporting this effect inter-
preted it to be a causal one—that early drinking directly produces an
increased risk for later alcohol problems. On the basis of this interpretation,
calls have been made to delay the age at first drink among early adolescents
as a means of decreasing risk for adult alcohol problems (Pedersen &
Skrondal, 1998).
This risk factor–outcome relationship, however, need not be a causal one.
For example, early drinking could be one manifestation of a broad liability to
deviance which might be evident in a host of problem behaviors, such as use
of illicit substances, antisocial behavior, and adult alcoholism (Jessor & Jessor,
1977). If this were the case, delaying the first exposure to alcohol use would
not alter the underlying liability to adolescent problem behavior or to adult


alcoholism.
Using data from the Virginia Adult Twin Study of Psychiatric and
Substance Use Disorders, we tested these two hypotheses about why early
drinking correlates with alcoholism. The results are depicted in Figure 4.2.
As in prior studies, we found a strong association between lifetime preva-
lence of alcoholism and age at first drink among both males and females
(Prescott & Kendler, 1999). As shown in Figure 4.2, males who began drink-
ing before age 15 were twice as likely (OR = 2.0) to develop Diagnostic and
Statistical Manual of Mental Disorders (fourth edition) alcohol dependence
(AD) as those who did not drink early. The association for females was
even more dramatic: Early drinkers were more than four times as likely to
develop AD as other women.
The data to test causality come from twin pairs who were discordant for
early drinking. Under the causal hypothesis, we would expect that the twins
with earlier drinking onset would have a higher risk for alcoholism than their
later-drinking co-twins and that the same pattern would hold for MZ and DZ
pairs. However, if early age at drinking is just an index of general deviance
which influences (among other things) risk of developing alcoholism, we
would expect that the prevalence would be similar for members of MZ dis-
cordant-onset pairs. The ‘‘unexposed’’ twins (with a later onset of drinking)
would share their co-twins’ risk for behavioral deviance and, thus, have a
higher risk for alcoholism than the pairs in which neither twin drank
early. The pattern observed in MZ vs. DZ discordant-onset pairs tells us to
what degree familial resemblance for behavioral deviance is due to shared
environmental vs. genetic factors. If it is due to shared environmental
factors, the risk for alcoholism among the unexposed twins from DZ discordant-
onset pairs would be expected to be the same as that in the MZ pairs.
However, if familial resemblance for deviance is due to genetic factors, the
risk for alcoholism in an unexposed individual would be lower among DZ
than MZ pairs.

[Figure 4.2 Odds Ratios from Cotwin-Control Analyses of the Association between Drinking Before Age 15 and Alcohol Dependence. Bar chart of odds ratios (OR) for all subjects, discordant DZ pairs, and discordant MZ pairs, shown separately for males and females.]
As shown in Figure 4.2, the twin pair resemblance was inconsistent with
the causal hypothesis. Instead, the results suggested that early drinking and
later alcoholism are both the result of a shared genetic liability. For example,
among the 213 male and 69 female MZ pairs who were discordant for early
drinking, there was only a slight difference in the prevalence of AD between
the twins who drank early and the co-twins who did not. The ORs were 1.1
for both sexes and were not statistically different from the 1.0 value predicted
by the noncausal model for MZ pairs. The ORs for the DZ pairs were
midway between those of the MZ pairs and the general sample, indicating
that the source of the familial liability is genetic rather than environmental.
I am not claiming that these results are definitive, and they certainly
require replication. It is frankly unlikely that early onset of alcohol consump-
tion has no impact on subsequent risk for problem drinking. Surely, however,
these results should give pause to those who want to stamp out alcohol
problems by restricting the access of adolescents to alcohol and suggest
that noncausal processes might explain at least some of the early drink-
ing–later alcoholism relationship. For those interested in other psychiatric
applications of the co-twin–control method, our group has applied it to clarify
the association between smoking and major depression (Kendler et al., 1993),
stressful life events and major depression (Kendler, Karkowski, & Prescott,
1999), and childhood sexual abuse and a range of psychiatric outcomes
(Kendler et al., 2000).
Lest you leap to the conclusion that this method is a panacea for our
problems of causal inference, I have some bad news. The co-twin–control
method is asymmetric with regard to the causal clarity of its results. Studies
in which MZ twins discordant for risk-factor exposure have equal rates of the
disease can, I think, permit the rather strong inference that the risk factor–
disease association is not causal. However, if in MZ twins discordant for risk-
factor exposure the exposed twin has a significantly higher risk of illness than
the unexposed twin, it is not possible to infer with such confidence that the
risk factor–disease association is causal. This is because in the typical design
it is not possible to rule out the potential that some unique environmental
event not shared with the co-twin produced both the risk factor and the
disease.
For example, imagine we are studying the relationship between early
conduct disorder and later drug dependence. Assume further that we find
many MZ twin pairs of the following type: the conduct-disordered twin
(twin A) develops drug dependence, while the nondisordered co-twin (twin B)


does not. We might wish to argue that this strongly proves the causal path
from conduct disorder to later drug dependence. Alas, it is not so simple. It
is perfectly possible that twin A had some prior environmental trauma not
shared with twin B (an obstetric complication or a fall off a bicycle with a
resulting head injury) that predisposed to both conduct disorder and drug
dependence. While the MZ co-twin–control design excludes the possibility
that a common set of genes or a class of shared environmental experiences
predisposes to both risk factor and outcome, it cannot exclude the possibility
that a ‘‘third factor’’ environmental experience unique to each twin plays a
confounding role.
Genetic strategies can occasionally provide some traction on issues of
causation in psychiatric epidemiology that might be otherwise difficult to
establish. In addition to the co-twin–control method, genetic randomization
is another potentially powerful natural experiment that relies on using genes
as instrumental variables (again taking advantage of the causal asymmetry
between genotype and phenotype) (Davey-Smith, 2006; also see Chapter 9).
Neither of these methods can entirely substitute for controlled trials, but for
many interesting questions in psychiatry such an approach is either imprac-
tical or unethical. These methods are far from panaceas, but they may
be underused by some in our field who are prone to slip too easily from
correlative to causal language.

Interventionism as an Approach to Causality


Well-Suited for Psychiatry

I have been reading for some years in the philosophy of science (and a bit in
metaphysics) about approaches to causation and explanation. For understand-
able reasons, this is an area often underutilized by psychiatric researchers. I
am particularly interested in the question of what general approach to caus-
ality is most appropriate for the science of psychiatry, which itself is a hybrid
of the biological, psychological, and sociological sciences.
First, I would argue that the deductive-nomological approach emerging
from the logical positivist movement is poorly suited to psychiatric research.
This position—which sees true explanation as being deduced from general
laws as applied to specific situations—may have its applications in physics.
However, psychiatry lacks the broad and deep laws that are seen at the core of
physics. Many, myself included, doubt that psychiatry will ever have laws of
the wide applicability of general relativity or quantum mechanics. It is simply
not, I suggest, in the nature of our discipline to have such powerful and
simple explanations. A further critical limitation of this approach for
psychiatry, much discussed in the literature, is that it does a poor job at the
critical discrimination between causation and correlation—which I consider a
central problem for our field. The famous example that is most commonly
used here is of flagpoles and shadows. Geometric laws can equally predict the
length of a shadow from the height of a flagpole or the height of the flagpole
from the length of the shadow. However, only one of these two relationships
is causally sensible.
Second, while a mechanistic approach to causation is initially intuitively
appealing, it is also ill-suited as a general approach for our field. By a ‘‘mechan-
istic approach,’’ I mean the idea that causation is best understood as the result
of some direct physical contact, a spatiotemporal process, which involves the
transfer of some process or energy from one object to another. One might
think of this as the billiard ball model of causality—that satisfying click we
hear when our cue ball knocks against another ball, sending it, we hope, into
the designated pocket. How might this idea apply to psychiatric phenomena?
Consider the empirical observation that the rate of suicide declined dra-
matically in England in the weeks after September 11, 2001 (9/11) (Salib,
2003). How would a mechanistic model approach this causal process? It
would search for the specific nature of the spatiotemporal processes that
connected the events of 9/11 in the United States to people in England.
For example, it would have to determine the extent to which information
about the events of 9/11 were conveyed to the English population through
radio, television, e-mail, word of mouth, and newspapers. Then, it would
have to trace the physical processes whereby this news influenced the
needed brain pathways, etc. I am not a Cartesian dualist, so do not misunder-
stand me here. I am not suggesting that in some ultimate way physical
processes were not needed to explain why the suicide rate declined in
England in September 2001. My point, rather, is that the physical means by
which news of 9/11 reached England may be the wrong level at which to
understand this process. Mechanistic models fail for psychiatry
for the same reasons that hard reductionist models fail. Critical causal pro-
cesses in psychiatric illnesses exist at multiple levels, only some of which are
best understood at a physical–mechanical level.
A third approach is the interventionist model (IM), which evolved out of
the counterfactual approach to causation. The two perspectives share the
fundamental idea that in thinking about causation we are ultimately asking
questions about what would have happened if things had been different.
While some counterfactual literature discusses issues around closest parallel
worlds, the IM approach is a good deal more general and can be considered
‘‘down to earth.’’
What is the essence of the IM? Consider a simple, idealized case.
Suppose we want to determine whether stress (S) increases the risk for
major depression (MD). The ‘‘ideal experiment’’ here would be the unethical
one in which, in a given population, we randomly intervene on indivi-
duals, exposing them to a stressful experience such as severe public humili-
ation (H). This experience increases their level of S, and we heartlessly
observe if they subsequently suffer from an increased incidence of MD.
Our design is

H intervenes on S → MD

Thus, we are assuming that an intervention on S will make a difference to
risk for MD. For this to work, according to the IM, the intervention must
meet several conditions (for more details, see Pearl, 2001; Woodward &
Hitchcock, 2003). We will illustrate these with our thought experiment as
follows (a minimal simulation sketch follows the list):

1. In individuals who are and are not exposed to our intervention, H must
   be the only systematic cause of S that is unequally distributed between
   the exposed and the unexposed (so that any average difference in the
   level of S between the exposed and unexposed cohorts results entirely
   from H).

2. H must not affect the risk for MD by any route that does not go
through S (e.g., by causing individuals to stop taking antidepressant
medication).

3. H is not itself influenced by any cause that affects MD via a route that
does not go through S, as might occur if individuals prone to depres-
sion were more likely to be selected for H.
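The toy simulation below (entirely hypothetical numbers, written only to make the three conditions tangible) constructs a world in which H is randomly assigned, raises S, and affects MD only through S; the difference in MD incidence between the two H groups then estimates what intervening on S does to the risk of MD.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # H is randomized, so conditions 1 and 3 hold by construction.
    H = rng.binomial(1, 0.5, n)
    # H raises stress S; no other systematic cause of S differs between groups.
    S = 0.2 + 0.5 * H + rng.normal(0.0, 0.1, n)
    # MD depends on S alone, so H affects MD only through S (condition 2).
    MD = rng.binomial(1, np.clip(0.05 + 0.3 * S, 0.0, 1.0))

    effect = MD[H == 1].mean() - MD[H == 0].mean()
    print(f"Change in MD incidence produced by the intervention: {effect:.3f}")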

In sum, the IM says that questions about whether X causes Y are ques-
tions about what would happen to Y if there were an intervention on X. One
great virtue of the IM is that it allows psychiatrists freedom to use whatever
family of variables seems appropriate to the characterization of a particular
problem. There is no assumption that the variables have to be capable of
figuring in quite general laws of nature, as with the deductive-nomological
approach, or that the variables have to relate to basic spatiotemporal pro-
cesses, as with the mechanistic approach. The fact is that the current evi-
dence points to causal roles for variables of many different types, and the
interventionist approach allows us to make explicit just what those roles are.
For all that, there is a sense in which the approach is completely rigorous. It
is particularly unforgiving in assuring that causation is distinguished from
correlation. Though our exposition here is highly informal, we are providing
an intuitive introduction to ideas whose formal development has been
vigorously pursued by others (e.g., Spirtes, Glymour, & Scheines, 1993; Pearl,
2001; Woodward, 2003).
If I were to try to put the essence of the IM of causality into a verbal
description, it would be as follows:

I know C to be a true cause of E if I can go into the world with its


complex web of causal interrelationships, hold all these background
relationships constant, and make a ‘‘surgical’’ intervention (or ‘‘twiddle’’)
on C. If E changes, then I know C causes E.

I see the nonreductive nature of the IM to be a critical strength for


psychiatry. Unlike the mechanistic model, it makes no a priori judgment
on the level of abstraction on which the causal processes can be best under-
stood. The IM requires only that, at whatever level it is conceived, the cause
makes a difference in the world. This is so important that it deserves repeat-
ing. The IM provides a single, clear empirical framework for the evaluation of
all causal claims of relevance to psychiatry, from molecules to neural systems
to psychological constructs to societies.

The IM and Mechanisms

Before closing, two points about the possible relationship between the IM
and mechanistic causal models are in order. First, it is in the nature of
science to want to move from findings of causality to a clarification of the
mechanisms involved—whether they are social, psychological, or molecular.
The IM can play a role in this process by helping scientists to focus on
the level at which causal mechanisms are most likely to be operative.
However, a word of caution is in order. Given the extraordinary complexity
of most psychiatric disorders, causal effects (and the mechanisms that
underlie them) may be occurring on several levels. For example, the fact
that cognitive behavioral therapy works for MD, and that psychological
mechanisms are surely the level at which this process is currently best
understood, does not mean that neurochemical interventions for MD
(via pharmacology) cannot also work. Conversely, although pharmacological
tools can affect the symptoms of eating disorders, cultural models of female
beauty, operating at a very different level, can also affect risk.
Second, we should briefly ponder the following weighty question: Should
the plausibility of a causal mechanism impact on our interpretation of IMs?
Purists will say ‘‘No!’’ If we design the right study and the results are clear,
then causal imputations follow. Pragmatists, whose position is well
represented by the influential criteria of Hill (1965), will disagree. The


conversation would go something like this:

Pragmatist: Surely you cannot be serious! Do you mean if you find


evidence for the efficacy of astrology or ESP, my interpretation of
these results should not be influenced by the fact that we have no
bloody idea of how such processes could work in the world as we
understand it?
Purist: I am entirely serious. Your comments about astrology and
ESP clearly illustrate the problem. You have said that you are quite
willing to impose your preconceptions of how you think the world
should work on your interpretation of the data. The whole point in
science is to get away from our biases, not embrace them. This is
extra important in psychiatry, where our biases are often strong and
our knowledge of how the world really works is typically nonexistent or
at best meager.

Personally, I am a bit on the pragmatist’s side, but the purists have a point
well worth remembering.

Summary of the IM

To summarize, the IM is attractive for psychiatry for four major reasons


(Kendler & Campbell, 2009). First, the IM is anchored in the practical and
reflects the fundamental goals of psychiatry, which are to intervene in the
world to prevent and cure psychiatric disorders. Second, the IM provides a
single, clear empirical framework for the evaluation of all causal claims in
psychiatry. It provides a way by which different theoretical orientations within
psychiatry can be judged by a common metric. Third, the framework pro-
vided by the IM can help us to find the optimal level for explanation and
ultimately for intervention. Finally, the IM is explicitly agnostic to issues of
the mind–body problem. Its application can help us replace the sterile meta-
physical arguments about mind and brain which have yielded little of prac-
tical benefit with productive empirical research followed by rigorous
conceptual and statistical analysis.

Acknowledgements

This work was supported in part by grant DA-011287 from the US National
Institutes of Health. Much of my thinking in this area has been stimulated
by and developed in collaboration with John Campbell, a philosopher now at


UC Berkeley (Kendler & Campbell, 2009).

References

Davey-Smith, G. (2006). Randomized by (your) god: Robust inference from a non-


experimental study design. Journal of Epidemiology and Community Health, 60,
382–388.
Grant, B. F., & Dawson, D. A. (1997). Age at onset of alcohol use and its association
with DSM-IV alcohol abuse and dependence: results from the National Longitudinal
Alcohol Epidemiologic Survey. Journal of Substance Abuse, 9, 103–110.
Hill, A. B. (1965). The environment and disease: Association or causation? Proceedings
of the Royal Society of Medicine, 58, 295–300.
Jessor, R., & Jessor, S. L. (1977). Problem behavior and psychosocial development: A
longitudinal study of youth. New York: Academic Press.
Kendler, K. S., & Baker, J. H. (2007). Genetic influences on measures of the environ-
ment: A systematic review. Psychological Medicine, 37, 615–626.
Kendler, K. S., Bulik, C. M., Silberg, J., Hettema, J. M., Myers, J., & Prescott, C. A.
(2000). Childhood sexual abuse and adult psychiatric and substance use disorders in
women: An epidemiological and cotwin control analysis. Archives of General
Psychiatry, 57, 953–959.
Kendler, K. S., & Campbell, J. (2009). Interventionist causal models in psychiatry:
Repositioning the mind–body problem. Psychological Medicine, 39, 881–887.
Kendler, K. S., Karkowski, L. M., & Prescott, C. A. (1999). Causal relationship between
stressful life events and the onset of major depression. American Journal of
Psychiatry, 156, 837–841.
Kendler, K. S., Neale, M. C., MacLean, C. J., Heath, A. C., Eaves, L. J., & Kessler, R. C.
(1993). Smoking and major depression. A causal analysis. Archives of General
Psychiatry, 50, 36–43.
Kendler, K. S., & Prescott, C. A. (2006). Genes, environment, and psychopathology:
Understanding the causes of psychiatric and substance use disorders. New York:
Guilford Press.
Pearl, J. (2001). Causality models, reasoning, and inference. Cambridge: Cambridge
University Press.
Pedersen, W., & Skrondal, A. (1998). Alcohol consumption debut: Predictors and
consequences. Journal of Studies on Alcohol, 59, 32–42.
Prescott, C. A., & Kendler, K. S. (1999). Age at first drink and risk for alcoholism: A
noncausal association. Alcoholism, Clinical and Experimental Research, 23, 101–107.
Salib, E. (2003). Effect of 11 September 2001 on suicide and homicide in England and
Wales. British Journal of Psychiatry, 183, 207–212.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction and search.
New York: Springer-Verlag.
van den Oord, E. J., & Snieder, H. (2002). Including measured genotypes in statistical
models to study the interplay of multiple factors affecting complex traits. Behavior
Genetics, 32, 1–22.
Woodward, J. (2003). Making things happen. New York: Oxford University Press.
Woodward, J., & Hitchcock, C. (2003). Explanatory generalizations, part I: A counter-
factual account. Nous, 37, 1–24.
5

Understanding the Effects of Menopausal Hormone Therapy

Using the Women’s Health Initiative Randomized Trials and Observational Study to Improve Inference

garnet l. anderson and ross l. prentice

Introduction

Over the last decade, several large-scale randomized trials have reported results
that disagreed substantially with the motivating observational studies on the
value of various chronic disease–prevention strategies. One high-profile exam-
ple of these discrepancies was related to postmenopausal hormone therapy
(HT) use and its effects on cardiovascular disease and cancer. The Women’s
Health Initiative (WHI), a National Heart, Lung, and Blood Institute–spon-
sored program, was designed to test three interventions for the prevention of
chronic diseases in postmenopausal women, each of which was motivated by a
decade or more of analytic epidemiology. Specifically, the trials were testing the
potential for HT to prevent coronary heart disease (CHD), a low-fat eating
pattern to reduce breast and colorectal cancer incidence, and calcium and
vitamin D supplements to prevent hip fractures. Over 68,000 postmenopausal
women were randomized to one, two, or all three randomized clinical trial
(CT) components between 1993 and 1998 at 40 U.S. clinical centers (Anderson
et al., 2003a). The HT component consisted of two parallel trials testing the
effects of conjugated equine estrogens alone (E-alone) among women with
prior hysterectomy and the effect of combined estrogen plus progestin therapy
(E+P), in this case conjugated equine estrogens plus medroxyprogesterone
acetate, among women with an intact uterus, on the incidence of CHD and
overall health.
In 2002, the randomized trial of E+P was stopped early, based on an assess-
ment of risks exceeding benefits for chronic disease prevention, raising con-
cerns among millions of menopausal women and their care providers about


their use of these medicines. The trial confirmed the benefit of HT for
fracture-risk reduction but the expected benefit for CHD, the primary study
end point, was not observed. Rather, the trial results documented increased
risks of CHD, stroke, venous thromboembolism (VTE), and breast cancer
with combined hormones (Writing Group for the Women’s Health Initiative
Investigators, 2002). Approximately 18 months later, the E-alone trial was also
stopped, based on the finding of an adverse effect on stroke rates and the like-
lihood that the study would not confirm the CHD-prevention hypothesis. The
results of this trial revealed a profile of risks and benefits that did not completely
coincide with either the E+P trial results or previous findings from observa-
tional studies (Women’s Health Initiative Steering Committee, 2004).
In conjunction with these trials, the WHI investigators conducted a par-
allel observational study (OS) of 93,676 women recruited from the same
population sources with similar data-collection protocols and follow-up. OS
enrollees were similar in many demographic and chronic disease risk factor
characteristics but were ineligible for or unwilling to be randomized into the
CT (Hays et al., 2003).
Because a substantial fraction of women in the OS were current or former
users of menopausal HT, joint analyses of the effects of HT use in the CT and OS
provide an opportunity to examine design and analysis methods that serve to
compare and contrast these two study designs, to identify some of the strengths
and weaknesses of each, and to determine the extent to which detailed data
analysis provisions could bring these results into agreement and thereby explain the
discrepancies between these randomized trials and observational studies.
This chapter reviews the motivation for the hormone trials and describes
the major findings for chronic disease effects, with particular attention to the
results that differed from what was hypothesized. Then, the series of joint
analyses of CT and corresponding OS is presented. Finally, some discussion
about the implications of these analyses for the design and analysis of future
studies is provided.

Hormone Therapy Trial Background

Since the 1940s, women have been offered exogenous estrogens to relieve
menopausal symptoms. The use of unopposed estrogens grew until evidence
of an increased risk of endometrial cancer arose in the 1970s and tempered
enthusiasm for these medicines, at least for the majority of women who had
not had a hysterectomy. With the subsequent information that progestin
effectively countered the carcinogenic effects of estrogen in the endome-
trium, HT prescriptions again climbed (Wysowski, Golden, & Burke, 1995).
Observational studies found that use of HT was associated with lower risks
of osteoporosis and fractures; subsequently, the U.S. Food and Drug
Administration approved HT for the treatment and prevention of osteoporo-


sis, leading to still further increases in the prevalence and duration of hor-
mone use (Hersh, Stefanick, & Stafford, 2004).
The pervasiveness of this exposure permitted a large number of observa-
tional studies of both case–control and prospective cohort designs to examine
the relationship between hormone use and a wide range of diseases among
postmenopausal women. Most of the more than 30 observational studies
available at the initiation of the WHI reported substantial reductions in
CHD rates, the major cause of morbidity and mortality among postmeno-
pausal women (Bush et al., 1987; Stampfer & Colditz, 1991; Grady et al.,
1992). Support for the estrogen and heart disease hypothesis, which origi-
nated partly from the male–female differences in CHD rates and the marked
increase in CHD rates after menopause, was further buttressed by mechan-
istic studies showing beneficial effects of HT on blood lipid profiles and
vascular motility in animal models (Pick, Stamler, Robard, & Katz, 1952;
Hough & Zilversmit, 1986; Adams et al., 1990; Clarkson, Anthony, &
Klein, 1996). The estimated effects were substantial, ranging from
30%–70% reductions, prompting considerable public-health interest in HT
as a preventive intervention for CHD in postmenopausal women.
One barrier to more widespread use was the reports of adverse effects,
most notably breast cancer. Numerous observational studies had reported a
modest increase in breast-cancer risk with longer-term exposure to estrogen
(Steinberg et al., 1991; Barrett-Conner & Grady, 1998). The effect of adding
progestin, however, was not clear. Evidence of increased risks of VTE and
biliary disease existed, but possible reductions in risk of colorectal cancer,
strokes, mortality, dementia, and many other conditions associated with
aging, in addition to menopausal symptom control, suggested that HT was
overall beneficial for menopausal women. The overall effects, while still
imprecisely estimated, suggested important benefits for prevention of chronic
disease (Grady et al., 1992). Increasingly, postmenopausal women were
encouraged to use HT to reduce their risks of osteoporosis, fractures, and
CHD; in fact, prescriptions reached approximately 90 million per year in the
United States alone (Hersh et al., 2004). The positive view of HT was so
widely held that the initiation of a long-term, placebo-controlled trial in the
WHI was considered highly controversial (Food and Nutrition Board and
Board on Health Sciences Policy, 1993).

WHI Trial Design

In 1993 and in this environment of considerable optimism regarding an


overall benefit to postmenopausal women, the WHI HT trials were launched.

The final design specified two parallel randomized, double-blind, placebo-


controlled trials testing E-alone in women with prior hysterectomy and E+P
in women with an intact uterus. The primary objective of each trial was to
determine whether HT would reduce the incidence of CHD and provide
overall health benefit with respect to chronic disease rates. Postmenopausal
women aged 50–79 years were eligible if they were free of any condition with
expected survival of less than 3 years and satisfied other criteria related to
ability to adhere to a randomized assignment, safety, and competing risk
considerations (Women’s Health Initiative Study Group, 1998). A total of
10,739 women were recruited into the trial of E-alone and 16,608 were
accrued into the E+P trial.
Although observational studies suggested that about a 45% reduction
in CHD risk could be achieved with HT, the trial design assumed a 21%
reduction, with 81% and 88% power for E-alone and E+P, respectively.
The conservatism in the specified effect size was intended to account
for the anticipated lack of adherence to study pills, lag time to full
intervention effect, loss to follow-up in the trial, and potential anticon-
servatism in the motivating observational studies results (Anderson et al.,
2003a).
Breast-cancer incidence was defined as the primary safety outcome of the
HT trials. The power to detect a 22% increase in breast cancer during
the planned duration of the trial was relatively low (46% for E-alone and
54% for E+P), so the protocol indicated that an additional 5 years of
follow-up without further intervention would be required to assure 79%
and 87% for E-alone and E+P, respectively. Pooling the two trials was also
an option, if the results were sufficiently similar. Additional design considera-
tions have been published (Women’s Health Initiative Study Group, 1998;
Anderson et al., 2003a).
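For readers curious about how such power figures arise, the sketch below uses the standard Schoenfeld approximation for a 1:1 two-arm log-rank test (my own illustration with a hypothetical event count; the actual WHI calculations additionally modeled nonadherence, lag to full effect, and loss to follow-up):

    from math import log, sqrt
    from scipy.stats import norm

    def logrank_power(n_events: int, hazard_ratio: float, alpha: float = 0.05) -> float:
        """Approximate power of a 1:1 two-arm log-rank test to detect `hazard_ratio`,
        given the total number of events observed (Schoenfeld approximation)."""
        z_alpha = norm.ppf(1 - alpha / 2)
        return float(norm.cdf(sqrt(n_events) * abs(log(hazard_ratio)) / 2 - z_alpha))

    # For example, the power to detect a 21% reduction (HR = 0.79) with 600 total events:
    # logrank_power(600, 0.79)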

Trial Findings

The independent Data and Safety Monitoring Board terminated the E+P trial
after a mean 5.2 years of follow-up, when the breast-cancer statistic exceeded
the monitoring boundary defined for establishing this adverse effect, a finding
supported by an overall assessment of harms exceeding benefits
for the designated outcomes. Reductions in hip-fracture and colorectal-cancer
incidence rates were observed, but these were outweighed by increases in the
risk of CHD, stroke, and VTE, particularly in the early follow-up period, in
addition to the adverse effect on breast cancer. A prespecified global index,
devised to assist in benefit versus risk monitoring and defined for each
woman as time to the first event for any of the designated clinical events
(CHD, stroke, pulmonary embolism, breast cancer, colorectal cancer, endo-


metrial cancer, hip fractures, or death from other causes), found a 15%
increase in the risk of women having one or more of these events (Writing
Group for the Women’s Health Initiative Investigators, 2002).
The final ‘‘intention-to-treat’’ trial results (mean follow-up 5.6 years,
Table 5.1) confirm the interim findings. The 24% increase in breast-cancer
incidence (Chlebowski et al., 2003), the 31% increase in risk of stroke

Table 5.1 Hypothesized Effects of HT at the Time the WHI Began and the Final Results of the Two HT Trials

                              Hypothesized    E+P                              E-Alone
                              effect          HR       95% CI      AR          HR       95% CI      AR
Coronary heart disease        ↓               1.24a    1.00–1.54   +6          0.95b    0.79–1.15   –3
Stroke                        ↔/↓             1.31c    1.02–1.68   +8          1.37d    1.09–1.73   +12
Pulmonary embolism            ↑               2.13e    1.45–3.11   +10         1.37f    0.90–2.07   +4
Venous thromboembolism        ↑               2.06e    1.57–2.70   +18         1.32f    0.99–1.75   +8
Breast cancer                 ↑               1.24g    1.02–1.50   +8          0.80h    0.62–1.04   –6
Colorectal cancer             ↓               0.56i    0.38–0.81   –7          1.12j    0.77–1.55   +1
Endometrial cancer                            0.81k    0.48–1.36   –1          NA
Hip fractures                 ↓               0.67l    0.47–0.96   –5          0.65m    0.45–0.94   –7
Total fractures               ↓               0.76l    0.69–0.83   –47         0.71m    0.64–0.80   –53
Total mortality               ↓               0.98n    0.82–1.18   –1          1.04o    0.91–1.12   +3
Global index p                                1.15n    1.03–1.28   +19         1.01o    1.09–1.12   +2

HR, hazard ratio; CI, confidence interval; AR, attributable risk. Arrows indicate the direction of effect hypothesized when the WHI began (↓ decrease, ↑ increase, ↔ no change).
a From Manson et al. (2003). b From Hsia et al. (2006). c From Wassertheil-Smoller et al. (2003). d From Hendrix et al. (2006). e From Cushman et al. (2004). f From Curb et al. (2006). g From Chlebowski et al. (2003). h From Stefanick et al. (2006). i From Chlebowski et al. (2004). j From Ritenbaugh et al. (2008). k From Anderson et al. (2003b). l From Cauley et al. (2003). m From Jackson et al. (2006). n From Writing Group for the Women’s Health Initiative Investigators (2002). o From Women’s Health Initiative Steering Committee (2004). p Global index defined as time to first event among coronary heart disease, stroke, pulmonary embolism, breast cancer, colorectal cancer, endometrial cancer (E+P only), hip fractures, and death from other causes.

(Wassertheil-Smoller et al., 2003), and the doubling of VTE rates (Cushman


et al., 2004) in the E+P group represented attributable risks of 8, 8, and 18
per 10,000 person-years, respectively, in this population. Benefits of seven
fewer colorectal cancers (44% reduction) (Chlebowski et al., 2004) and five
fewer hip fractures (33% reduction) (Cauley et al., 2003) per 10,000 person-
years were also reported.
It was the observed 24% increase in CHD risk, or six additional events per
10,000 person-years (Manson et al., 2003), however, that was the most sur-
prising and perhaps the most difficult finding to accept. Neither the usual
95% confidence intervals nor the protocol-defined weighted log-rank statistic
indicates that this is clearly statistically significant. Nevertheless, even the very
conservative adjusted confidence intervals, which controlled for the multiple
testing, ruled out the level of protection described by the previous observa-
tional studies as well as the conservative projection for CHD benefit used in
the trial design (Anderson et al., 2007).
The results of the E-alone trial, stopped by the National Institutes of
Health approximately 18 months later, provided a different profile of risks
and benefits (Women’s Health Initiative Steering Committee, 2004). The final
results (Table 5.1), based on an average of 7.1 years of follow-up, reveal an
increased risk of stroke with E-alone of similar magnitude to that observed
with E+P (Hendrix et al., 2006) but no effect on CHD rates (Hsia et al., 2006).
E-alone appeared to increase the risk of VTE events (Curb et al., 2006) but to
a lesser extent than was observed with E+P. The E-alone hazard ratios for hip,
vertebral, and other fractures were comparable to those for E+P (Jackson
et al., 2006).
Most surprising of the E-alone findings was the estimated 23% reduction
in breast-cancer rates, which narrowly missed being statistically significant
(Stefanick et al., 2006), in contrast to the increased risk seen in a large
number of observational studies and the E+P trial. E-alone had no effect
on colorectal-cancer risk (Ritenbaugh et al., 2008), another finding that dif-
fered from previous studies and the E+P trial. The hazard ratios for total
mortality and the global index were close to one, indicating an overall balance
in the number of women randomized to E-alone or to placebo who died or
experienced one or more of these designated health outcomes (Women’s
Health Initiative Steering Committee, 2004).

Contrasting the WHI CT and OS

To better understand the divergent findings and, if possible, to bring the two
types of studies into agreement, WHI investigators conducted a series of
analyses examining cardiovascular outcomes in the CT and OS data jointly

(Prentice et al., 2005, 2006). The parallel recruitment and follow-up proce-
dures in the OS and CT components of the WHI make this a particularly
interesting exercise since differences in data sources and collection protocols
are minimized.
For each HT trial, analogous user and nonuser groups were selected from
the OS. Specifically, for the E+P analyses, OS women with a uterus who were using an estrogen plus progestin combination or were not using any HT at baseline were defined as the exposed (n =
17,503) and unexposed (n = 35,551) groups, respectively (Prentice et al.,
2005). Similarly, for the E-alone analyses, the exposed and unexposed groups comprised 21,902 estrogen users and 21,902 nonusers of HT among OS participants who reported a prior hysterectomy at baseline
(Prentice et al., 2006). Failure times were defined as time since study enroll-
ment (OS) or randomization (CT). In the CT, follow-up was censored at the
time each intervention was stopped. In the OS, censoring was applied at a
time chosen to give a similar average follow-up time (5.5 years for OS/E+P
and 7.1 years for OS/E-alone). For CT participants, HT exposure was defined
by randomization and analyses were based on the intention-to-treat principle.
In parallel, OS participants’ HT exposure was defined by HT use at the time
of study enrollment.
In OS women, the ratio of age-adjusted event rates in E+P users to that in
nonusers was less than one for CHD (0.71) and stroke (0.77) and close to one
for VTE (1.06), but each was 40%–50% lower than the corresponding statis-
tics from the randomized trial (Table 5.2, upper panel) and therefore similar
to the motivating observational studies. For E-alone, the corresponding ratios
were all less than one (0.68 for CHD, 0.95 for stroke, and 0.78 for VTE) and
30%–40% lower than the CT estimates (Table 5.2, lower panel).
The cardiovascular risk profile (race/ethnicity, education, income, body
mass index [BMI], physical activity, current smoking status, history of cardi-
ovascular disease, and quality of life) among E+P users in the OS was some-
what better than that for OS nonusers (examples of these shown in
Figure 5.1). The distribution of these risk factors in the CT was balanced
across treatment arms but resembled that of the OS nonuser population
more than the corresponding HT user group. A similar pattern of healthy
user bias was observed for E-alone among OS participants.
Aspects of HT exposure also varied between the CT and OS. Among HT
users in the OS, the prevalence of long-term use, defined here as the pre-
enrollment exposure duration for the HT regimen reported at baseline, was
considerably higher than in the CT (Figure 5.2); but few were recent initia-
tors of HT in the OS. In the CT, most participants had never used HT before
or had used it only briefly. In terms of both duration and recency of each
regimen, the distributions in the CT more closely resembled those of the OS
nonusers (Prentice et al., 2005, 2006).

Table 5.2 Hormone Therapy Hazard Ratios (95% Confidence Intervals) for CHD, Stroke,
and VTE Estimated Separately in the WHI CT and OS and Jointly with a Ratio Measure
of Agreement (OS/CT) Between the Two Study Components

                               CHD                      Stroke                   VTE
                               CT    OS    OS/CT        CT    OS    OS/CT        CT    OS    OS/CT
Estrogen + progestina
  Age-adjusted                 1.21  0.71  0.59         1.33  0.77  0.58         2.10  1.06  0.50
  Multivariate adjustedb       1.27  0.87  0.69         1.21  0.86  0.71         2.13  1.31  0.62
  By time since initiation
    <2 years                   1.68  1.12  0.67         1.15  2.10  1.83         3.10  2.37  0.76
    2–5 years                  1.25  1.05  0.84         1.49  0.48  0.32         1.89  1.52  0.80
    5+ years                   0.66  0.83  1.26         0.74  0.89  1.20         1.31  1.24  0.95
  Combined OS/CT                           0.93                     0.76                     0.84
    <2 years                   1.58 (1.12–2.24)         1.41 (0.90–2.22)         3.02 (1.94–4.69)
    2–5 years                  1.19 (0.87–1.63)         1.14 (0.82–1.59)         1.85 (1.30–2.65)
    5+ years                   0.86 (0.59–1.26)         1.12 (0.73–1.72)         1.47 (0.96–2.24)

Estrogen alonec
  Age-adjusted                 0.96  0.68  0.71         1.37  0.95  0.69         1.33  0.78  0.59
  Multivariate adjusted        0.97  0.74  0.77         1.35  1.00  0.74         1.39  0.88  0.63
  By time since initiation
    <2 years                   1.07  1.20  1.12         1.69  0.37  0.22         2.36  1.48  0.63
    2–5 years                  1.13  1.09  0.96         1.14  0.89  0.78         1.31  0.91  0.69
    5+ years                   0.80  0.73  0.91         1.41  1.01  0.72         1.16  0.85  0.73
  Combined OS/CT                           0.89                     0.68                     0.82
    <2 years                   1.11 (0.73–1.69)         1.48 (0.89–2.44)         2.18 (1.15–4.13)
    2–5 years                  1.17 (0.88–1.56)         1.18 (0.83–1.67)         1.22 (0.80–1.85)
    5+ years                   0.81 (0.62–1.06)         1.48 (1.06–2.06)         1.06 (0.72–1.56)

CHD, coronary heart disease; VTE, venous thromboembolism; CT, clinical trial; OS, observational
study.
a From Prentice et al. (2005).
b Adjusted for age, race, body mass index, education, smoking status, age at menopause, and
physical functioning. Hazard ratios accompanied by 95% confidence intervals in combined OS
and CT analyses.
c From Prentice et al. (2006).

In the trials, the HT tested was conjugated equine estrogens (0.625 mg/day)
with or without medroxyprogesterone acetate (2.5 mg/day). OS women had
access to a broader range of regimens, including different formulations,
doses, and routes of administration; but the majority of HT use reported
[Figure 5.1: bar charts, shown separately for women with a uterus and women with a prior hysterectomy, of the distributions of race/ethnicity, body mass index category, education, and smoking status in four groups: OS hormone users, OS nonusers, the active CT arm, and the CT placebo arm.]
Figure 5.1 Distribution of selected cardiovascular risk factors by hysterectomy status, study component and hormone use at baseline in the Observational Study (OS) or randomization assignment in the Clinical Trial (CT). E+P, estrogen plus progestin; E-alone, estrogen alone. Derived from Prentice et al., 2005, 2006.
[Figure 5.2: bar charts, shown separately for women with a uterus and women with a prior hysterectomy, of the duration of prior E+P or E-alone use and the recency of use at baseline (never, current, within past 1–4 years, last use 5–10 years ago, last use more than 10 years ago) in OS hormone users, OS nonusers, and the two CT arms.]
Figure 5.2 Hormone therapy exposure history by hysterectomy status, study component and hormone use at baseline in the Observational Study (OS) or randomization assignment in the Clinical Trial (CT). E+P, estrogen plus progestin; E-alone, estrogen alone. Derived from Prentice et al., 2005, 2006.



was of the same dose of the same oral estrogens and progestin as used in
the CT, with another substantial fraction taking different doses of the same
preparation (Prentice et al., 2005, 2006), suggesting that differences in the
medications used is an unlikely source of the discrepant results.

Joint Analyses of the CT and OS

To determine the extent to which traditional confounders could explain the


differences in HT effects on CHD, stroke, and VTE between each trial and
the corresponding OS sample, a Cox regression model was fit for each HT
regimen and each outcome.
Because age is such a strong predictor of disease, the model incorporated
separate baseline disease incidence rates for each 5-year age group as well as
a linear predictor of age to control for any residual confounding. For similar
reasons, BMI was modeled in four levels as well as a linear term. Age at
menopause, education, current smoking status, and baseline physical func-
tioning were also included. These multivariate adjusted results moved the CT
and OS results somewhat closer together (Table 5.2), but the ratios of hazard
ratios in the OS and CT were still significantly less than one (range of OS/CT
ratios for these three disease end points was 0.62–0.71 for E+P and 0.63–0.77
for E-alone).
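As a rough illustration of the kind of multivariate Cox model described above (not the WHI investigators' actual code), the sketch below uses the Python lifelines package; the simulated data, variable names, and covariate coding are hypothetical placeholders chosen only to mirror the description in the text.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 5000

    # Hypothetical analysis file: one row per woman (all values simulated).
    df = pd.DataFrame({
        "ht_user": rng.integers(0, 2, n),        # HT exposure: CT arm or OS baseline use
        "age": rng.uniform(50, 79, n),
        "bmi": rng.normal(28, 5, n),
        "smoker": rng.integers(0, 2, n),
        "age_menopause": rng.normal(50, 4, n),
        "education": rng.integers(1, 6, n),
        "phys_func": rng.normal(80, 15, n),
        "time": np.minimum(rng.exponential(8, n), 8.0),  # follow-up in years (simulated)
        "event": rng.integers(0, 2, n),                  # 1 = cardiovascular event (simulated)
    })

    # Separate baseline incidence for each 5-year age group (strata), plus age as
    # a linear covariate to absorb residual confounding within strata.
    df["age_group"] = pd.cut(df["age"], bins=[49, 54, 59, 64, 69, 74, 79], labels=False)

    # BMI in four levels (indicator coded) as well as a linear BMI term.
    bmi_cat = pd.cut(df["bmi"], bins=[-np.inf, 25, 30, 35, np.inf], labels=False)
    bmi_dummies = pd.get_dummies(bmi_cat, prefix="bmi_cat", drop_first=True).astype(float)

    model_df = pd.concat([df, bmi_dummies], axis=1)

    cph = CoxPHFitter()
    cph.fit(model_df, duration_col="time", event_col="event", strata=["age_group"])
    cph.print_summary()  # the coefficient on ht_user is the log hazard ratio of interest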
These analyses rely on the common assumption of proportional hazards,
that is, that the effect of HT is to multiply the nonuser event rate by a
constant factor over time. Allowing the E+P hazard ratios to depend
on time since initiation of the current HT episode (defined by the intervals
<2 years, 2–5 years, and 5+ years since randomization in the CT and since
the woman started her current HT regimen in the OS) improved the com-
parison of the OS and CT hazard ratios for all three cardiovascular disease
outcomes, although a nearly statistically significant difference between the
studies remained for stroke (Prentice et al., 2005). In these analyses, the
pattern of early risk for CHD, stroke, and VTE that attenuates over time
is found in the OS as well as the CT (Table 5.2).
The relative lack of information on effects during early exposure in the OS
and a corresponding limitation in the CT on longer-term use produce rather
imprecise estimates in these intervals. A combined analysis that allows HT
effects to vary over time but assumes a proportional difference between
HT effects in the two study components (ratio of HT hazard ratios in the
OS to that in the CT is constant across time intervals) capitalizes on the
strength of both studies. For this purpose, a model was specified to estimate
a single HT hazard ratio for each specified time interval and a single OS/CT
ratio to describe the level of agreement between the two studies.
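One way to write such a combined model (a schematic reading of the description above, not necessarily the exact parameterization used by Prentice et al.) is, for study component $s \in \{\mathrm{CT}, \mathrm{OS}\}$ and HT exposure $x \in \{0, 1\}$,

\[
\lambda_s(t \mid x) \;=\; \lambda_{0s}(t)\,\exp\bigl\{\,x\,[\beta_j + \gamma\, 1(s = \mathrm{OS})]\,\bigr\}
\qquad \text{for } t \text{ in time-since-initiation interval } j,
\]

so that the HT hazard ratio in the CT for interval $j$ is $\exp(\beta_j)$, the corresponding OS hazard ratio is $\exp(\beta_j + \gamma)$, and the single factor $\exp(\gamma)$ is the OS/CT ratio measuring agreement between the two study components.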

In combined analyses for E+P (Table 5.2, upper panel), the estimates of
the OS/CT ratios were not significantly different from one for CHD and VTE
(0.93 and 0.84, respectively), suggesting that this model provided reasonable
agreement between the two studies. For stroke there is evidence of residual
bias (OS/CT = 0.76). These combined analyses describe a pattern of early
risks for all three of these cardiovascular disease outcomes and continuing
risk for VTE.
Adherence-adjusted versions of these analyses tended to yield hazard
ratios with somewhat greater departures from unity than those shown in
Tables 5.1 and 5.2 but had little effect on comparative hazard ratios between
the CT and OS. Those analyses involved a particularly simple form of adher-
ence adjustment, with follow-up time censored six months after a change in
status from HT user to nonuser or vice versa. Inverse adherence probability-
weighted estimating procedures (e.g., Robins & Finkelstein, 2000) can also
be recommended for this problem.
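A deliberately simplified, person-level sketch of such an inverse-probability-weighted analysis is given below (again in Python with lifelines). A full implementation in the spirit of Robins and Finkelstein (2000) would use time-varying weights in a counting-process Cox model; here all variable names and data are hypothetical, and the weights are computed once per woman only to illustrate the logic.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(1)
    n = 4000
    df = pd.DataFrame({
        "ht_user": rng.integers(0, 2, n),
        "age": rng.uniform(50, 79, n),
        "bmi": rng.normal(28, 5, n),
        "time": np.minimum(rng.exponential(6, n), 8.0),
        "event": rng.integers(0, 2, n),
        # 1 if follow-up was censored six months after a change in HT status
        "adherence_censored": rng.integers(0, 2, n),
    })

    # Step 1: model the probability of remaining adherent (not being
    # adherence-censored) as a function of measured covariates.
    X = df[["ht_user", "age", "bmi"]]
    stay = 1 - df["adherence_censored"]
    p_stay = LogisticRegression(max_iter=1000).fit(X, stay).predict_proba(X)[:, 1]

    # Step 2: weight each adherent woman by the inverse of that probability,
    # so the weighted adherent sample resembles the full cohort.
    df["ipcw"] = 1.0 / p_stay

    # Step 3: fit a weighted Cox model among those not adherence-censored.
    analysis = df.loc[stay == 1]
    cph = CoxPHFitter()
    cph.fit(analysis[["ht_user", "age", "bmi", "time", "event", "ipcw"]],
            duration_col="time", event_col="event",
            weights_col="ipcw", robust=True)
    cph.print_summary()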
For E-alone, the joint analysis produced excellent agreement for CHD (ratio
of E-alone hazard ratios in the OS to that in the CT was OS/CT = 0.89) and some
improvement in the alignment for VTE (OS/CT = 0.82) (Table 5.2, lower
panel). The effects of E-alone on stroke risk also appeared to differ between
the OS and CT (OS/CT = 0.68). These OS/CT ratios were very similar to
those for E+P, but the pattern of E-alone effects was somewhat different.
These combined analyses provide evidence of an early increased risk for both
stroke and VTE, with the adverse effect on stroke rates continuing beyond
year 5 but no significant effects on CHD.
The sources of the discrepancies between the estimated hormone effects on
breast cancer and colorectal cancer in the two trials and the motivating
literature can likewise be examined through similar comparisons of the OS
and CT cohorts, which have recently been published.
In the E+P trial, the elevated risk of breast cancer was generally expected but
the estimated effect size was considerably smaller than other investigators
have reported from observational studies (e.g., Million Women Study
Collaborators, 2003). The fact that the E-alone hazard ratio for breast
cancer was less than one, though not statistically significant, presents another
puzzle in that estrogen has long been considered to have a carcinogenic
effect in the breast. In joint OS/CT analyses, the discrepancy between the
trials and the OS regarding HT effects on breast-cancer risk could be
accounted for only by modeling the effects of the time between menopause
and first HT use as well as the time since initiation of current HT episode,
mammography use, and traditional confounding factors (Prentice et al., 2008a,
2008b). The contrasts for colorectal cancer are similarly interesting in
that a protective effect was observed with E+P but not with E-alone, even
though the literature had identified a potential beneficial role for estrogen.

Joint OS/CT analyses suggest no clear effect of either HT preparation (Prentice


et al., 2009).

Extending the CT Results

The relative success in identifying the statistical adjustments that helped


to reconcile the OS and CT results within the WHI allows one to consider
additional analyses that would be impossible without a new trial or would
be severely underpowered in the CT alone. For example, if biases in the
OS are similar across all HTs, one could apply the models described
above to data for other hormone regimens observed in the OS and thereby
obtain more reliable estimates of their effects. To test this within the
WHI, Prentice et al. (2006) used the CT and OS data for one HT regimen
(e.g., E+P) to adjust the OS data for the alternate therapy (E-alone) and
vice versa. The first adjustment simply applies the OS/CT ratio from one
HT to the OS data for the other HT preparation. A second adjustment
incorporates an additional proportional effect between the two HT prepara-
tions. In Figure 5.3, the simple age-adjusted as well as the two levels of
adjusted results from the OS are displayed in addition to the parallel CT
findings, showing that these adjustments generally tend to improve the OS
estimates.
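Schematically (a hedged reading of the description above rather than the authors' exact notation), the first adjustment rescales the OS hazard ratio for one regimen by the bias factor estimated from the other:

\[
\widehat{\mathrm{HR}}^{\,\mathrm{adj}\,1}_{\mathrm{OS},\ \text{E-alone}}
\;=\;
\widehat{\mathrm{HR}}_{\mathrm{OS},\ \text{E-alone}}
\;\Big/\;
\widehat{(\mathrm{OS/CT})}_{\text{E+P}},
\]

and symmetrically for E+P using the E-alone ratio; the second adjustment additionally estimates a proportional factor relating the effects of the two regimens and applies it along with the OS/CT ratio.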
A second possible use of the joint analyses is to improve the precision of
subgroup analyses. In this regard, interest has focused on the youngest
women (aged 50–59) because they are more likely to seek treatment for
vasomotor symptoms, and hence, the observational studies that motivated
the trials primarily involved women who usually initiated therapy during
this time of life. Further, some argue that early intervention in atherosclerosis
may be helpful whereas later maneuvers could be detrimental. For E-alone,
the CT data suggest that women aged 50–59 may experience some reduction
in CHD risk and in the risk of any of the designated events in the global
index (Women’s Health Initiative Steering Committee, 2004; Hsia et al.,
2006). There was no evidence of a reduction in CHD risk for the younger
women in the E+P trial, however (Manson et al., 2003). A recent exploratory
analysis that pooled the data from the two trials suggests that the adverse
CHD effects may be a function of time since menopause, with women
beginning HT soon after menopause having a reduced risk of CHD
(Rossouw et al., 2007). To examine hormone effects on CHD in the younger
women with greater precision, Prentice et al. (2006) conducted similar joint
OS/CT analyses for CHD among women aged 50–59 years at baseline. For
E+P, these analyses reflect the same general pattern of an early increased
risk with E+P that diminished over time. For E-alone, however, the combined

[Figure 5.3: bar charts of hazard ratios for CHD, stroke, and VTE by time since initiation (<2 years, 2–5 years, >5 years), shown separately for E+P and E-alone, comparing the OS estimate, Adjustment 1, Adjustment 2, and the CT estimate.]
Figure 5.3 HT hazard ratios in the Observational Study based on a simple multivariate
model (OS), with adjustment for the OS/CT hazard ratio estimated from the alternate
trial (Adjustment 1), and assuming proportional hazards for E-alone to E+P
(Adjustment 2), compared to the corresponding Clinical Trial hazard ratios (CT).
Derived from Prentice et al., 2005, 2006.

OS/CT analyses suggest a reduced risk of CHD among these younger women
with prior hysterectomy.

Discussion

The stark contrasts between the results from a large number of observational
studies and the WHI randomized trials of menopausal HT provide impetus
for reflection on the role of observational studies in evaluating therapies.

Despite the usual efforts to control for potential confounders in most previous
observational studies, the replication of findings of CHD benefit and breast-
cancer risk with HT across different study populations and designs, and
support from mechanistic studies, clinically relevant aspects of the relation-
ship between HT and risk for several chronic diseases were not appreciated
until the WHI randomized trial results were published. The reliance on
this lower-level evidence may have exposed millions of women to small increases
in the risk of several serious adverse effects.
Randomized trials have their own limitations. In this example, the WHI
HT trials tested two specific regimens in a population considered appropriate
for CHD prevention. As many have claimed, the trial design did not fully
reflect the way HT had been used in practice—prescribed near the time of
menopause, with possible tailoring of regimen to the individual. Also, while
the WHI tested HT in the largest number of women in the 50–59 year age
range ever studied, using the same agents and dosages used by the vast
majority of U.S. women, estimates of HT effects within this subgroup
remain imprecise because of the very low event rate.
This example raises many questions with regard to the public-health
research enterprise. When is it reasonable to rely on second-tier evidence
to test a hypothesis? Are there better methods to test these hypotheses?
Can we learn more from our trials, and can we use this to make observa-
tional studies more reliable?
There are insufficient resources to conduct full-scale randomized trials of
the numerous hypotheses of interest in public health and clinical medicine.
Observational studies will remain a mainstay of our research portfolio but
methods to increase the reliability of observational study results, through
better designs and analytic tools, are clearly needed. Nevertheless, when
assessing an intervention of public-health significance, the WHI
experience suggests that the evaluation needs to be anchored in a rando-
mized trial. It seems highly unlikely that the importance of the time-
dependent effect of HT on cardiovascular disease would have been recognized
without the Heart Estrogen–Progestin Replacement Study (Hulley et al.,
1998) and the WHI randomized trials. Neither observational studies con-
ducted before WHI nor the WHI OS itself would have observed these early
adverse cardiovascular disease effects without the direction from the trials to
look for them.
The statistical alignment of the OS and CT results relied on several other
factors. Detailed information on the history of HT use, an extensive database
of potential confounders, and meticulous modeling of these factors were
critical. For an exposure that is more complex, such as dietary intake or
physical activity, the measurement problems are likely too great to permit
such an approach. Less obvious but probably at least as important was the

uncommon feature of WHI in having both the randomized trials and an OS


conducted in parallel, minimizing methodologic differences in outcome
ascertainment, data collection, and some aspects of the study population.
Such a study design has rarely been used but could be particularly advanta-
geous if there are multiple related therapies already in use but the resources
are available to test only one in a full-scale design.
The work of Prentice and colleagues (2005, 2006, 2008a, 2008b, 2009)
provides important examples of methods to leverage the information from
clinical trials in the presence of a parallel OS. The exercises in which adjust-
ments derived by the joint OS and CT analysis of one HT were applied to OS
results for a related therapy suggest that it may be possible to evaluate one
intervention in a rigorous trial setting and expand the inference to similar
interventions in OS data. The joint analyses of CT and OS data to strengthen
subgroup analyses would have almost universal appeal. Additional effort to
define the requirements and assumptions in these designs and analyses
would be helpful.
In summary, the WHI provides an important example of the weakness of
observational study data, some limitations of randomized trials, and an
approach to combining the two to produce more reliable inference.

References

Adams, M. R., Kaplan, J. R., Manuck, S. B., Koritinik, D. R., Parks, J. S., Wolfe, M. S.,
et al. (1990). Inhibition of coronary artery atherosclerosis by 17-β estradiol in
ovariectomized monkeys: Lack of an effect of added progesterone. Arteriosclerosis,
10, 1051–1057.
Anderson, G. L., Manson, J. E., Wallace, R., Lund, B., Hall, D., Davis, S., et al. (2003a).
Implementation of the WHI design. Annals of Epidemiology, 13, S5–S17.
Anderson, G. L., Judd, H. L., Kaunitz, A. M., Barad, D. H., Beresford, S. A. A.,
Pettinger, M., et al. (2003b). Effects of estrogen plus progestin on gynecologic
cancers and associated diagnostic procedures: The Women’s Health Initiative ran-
domized trial. Journal of the American Medical Association, 290(13), 1739–1748.
Anderson, G. L., Kooperberg, C., Geller, N., Rossouw, J. E., Pettinger, M., & Prentice,
R. L. (2007). Monitoring and reporting of the Women’s Health Initiative randomized
hormone therapy trials. Clinical Trials, 4, 207–217.
Barrett-Connor, E., & Grady, D. (1998). Hormone replacement therapy, heart disease
and other considerations. Annual Review of Public Health, 19, 55–72.
Bush, T. L., Barrett-Connor, E., Cowan, L. D., Criqui, M. H., Wallace, R. B.,
Suchindran, C. M., et al. (1987). Cardiovascular mortality and noncontraceptive
use of estrogen in women: Results from the Lipid Research Clinics Program
Follow-up Study. Circulation, 75, 1102–1109.
Cauley, J. A., Robbins, J., Chen, Z., Cummings, S. R., Jackson, R. D., LaCroix, A. Z.,
et al. (2003). Effects of estrogen plus progestin on risk of fracture and bone mineral
density: The Women’s Health Initiative randomized trial. Journal of the American
Medical Association, 290, 1729–1738.

Chlebowski, R. T., Hendrix, S. L., Langer, R. D., Stefanick, M. L., Gass, M., Lane, D.,
et al. (2003). Influence of estrogen plus progestin on breast cancer and mammo-
graphy in healthy postmenopausal women: The Women’s Health Initiative rando-
mized trial. Journal of the American Medical Association, 289, 3243–3253.
Chlebowski, R. T., Wactawski-Wende, J., Ritenbaugh, C., Hubbell, F. A., Ascensao, J.,
Rodabough, R. J., et al. (2004). Estrogen plus progestin and colorectal cancer in
postmenopausal women. New England Journal of Medicine, 350, 991–1004.
Clarkson, T. B., Anthony, M. S., & Klein, K. P. (1996). Hormone replacement therapy
and coronary artery atherosclerosis: The monkey model. British Journal of Obstetrics
and Gynaecology, 103(Suppl. 13), 53–58.
Curb, J. D., Prentice, R. L., Bray, P. F., Langer, R. D., Van Horn, L., Barnabei, V. M.,
et al. (2006). Venous thrombosis and conjugated equine estrogen in women without
a uterus. Archives of Internal Medicine, 166, 772–780.
Cushman, M., Kuller, L. H., Prentice, R., Rodabough, R. J., Psaty, B. M., Stafford, R. S.,
et al. (2004). Estrogen plus progestin and risk of venous thrombosis. Journal of the
American Medical Association, 292, 1573–1580.
Food and Nutrition Board and Board on Health Sciences Policy (1993). An assessment
of the NIH Women’s Health Initiative. S. Thaul and D. Hotra (Eds.). Washington, DC:
National Academy Press.
Grady, D., Rubin, S. M., Petitti, D. B., Fox, C. S., Black, D., Ettinger, B., et al. (1992).
Hormone therapy to prevent disease and prolong life in postmenopausal women.
Annals of Internal Medicine, 117, 1016–1036.
Hays, J., Hunt, J. R., Hubbell, F. A., Anderson, G. L., Limacher, M., Allen, C., et al.
(2003). The Women’s Health Initiative recruitment methods and results. Annals of
Epidemiology, 13, S18–S77.
Hendrix, S. L., Wassertheil-Smoller, S., Johnson, K. C., Howard, B. V., Kooperberg, C.,
Rossouw, J. E., et al. (2006). Effects of conjugated equine estrogen on stroke in the
Women’s Health Initiative. Circulation, 113, 2425–2434.
Hersh, A. L., Stefanick, M., & Stafford, R. S. (2004). National use of postmenopausal
hormone therapy. Journal of the American Medical Association, 291, 47–53.
Hough, J. L., & Zilversmit, D. B. (1986). Effect of 17-β estradiol on aortic cholesterol
content and metabolism in cholesterol-fed rabbits. Arteriosclerosis, 6, 57–64.
Hsia, J., Langer, R. D., Manson, J. E., Kuller, L., Johnson, K. C., Hendrix, S. L., et al.
(2006). Conjugated equine estrogens and coronary heart disease: The Women’s
Health Initiative. Archives of Internal Medicine, 166, 357–365.
Hulley, S., Grady, D., Bush, T., Furberg, C., Herrington, D., Riggs, B., et al. (1998).
Randomized trial of estrogen plus progestin for secondary prevention of coronary
heart disease in postmenopausal women. Journal of the American Medical Association,
280, 605–613.
Jackson, R. D., Wactawski-Wende, J., LaCroix, A. Z., Pettinger, M., Yood, R. A., Watts,
N. B., et al. (2006). Effects of conjugated equine estrogen on risk of fractures and
BMD in postmenopausal women with hysterectomy: Results from the Women’s
Health Initiative randomized trial. Journal of Bone and Mineral Research, 21,
817–828.
Manson, J. E., Hsia, J., Johnson, K. C., Rossouw, J. E., Assaf, A. R., Lasser, N. L., et al.
(2003). Estrogen plus progestin and the risk of coronary heart disease. New England
Journal of Medicine, 349, 523–534.
Million Women Study Collaborators (2003). Breast cancer and hormone replacement
therapy in the Million Women Study. Lancet, 362, 419–427.

Pick, R., Stamler, J., Rodbard, S., & Katz, L. N. (1952). The inhibition of coronary
atherosclerosis by estrogens in cholesterol-fed chicks. Circulation, 6, 276–280.
Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G.,
et al. (2005). Combined postmenopausal hormone therapy and cardiovascular
disease: Toward resolving the discrepancy between Women’s Health Initiative
clinical trial and observational study results. American Journal of Epidemiology, 162,
404–414.
Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G.,
et al. (2006). Combined analysis of Women’s Health Initiative observational and
clinical trial data on postmenopausal hormone treatment and cardiovascular disease.
American Journal of Epidemiology, 163, 589–599.
Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Pettinger, M.,
Hendrix, S. L., et al. (2008a). Estrogen plus progestin therapy and breast cancer
in recently postmenopausal women. American Journal of Epidemiology, 167,
1207–1216.
Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Langer, R. D.,
Pettinger, M., et al. (2008b). Conjugated equine estrogens and breast cancer risk
in the Women’s Health Initiative clinical trial and observational study. American
Journal of Epidemiology, 167, 1407–1415.
Prentice, R. L., Pettinger, M., Beresford, S. A., Wactawski-Wende, J., Hubbell, F. A.,
Stefanick, M. L., et al. (2009). Colorectal cancer in relation to postmenopausal
estrogen and estrogen plus progestin in the Women’s Health Initiative clinical
trial and observational study. Cancer Epidemiology, Biomarkers and Prevention, 18,
1531–1537.
Ritenbaugh, C., Stanford, J. L., Wu, L., Shikany, J. M., Schoen, R. E., Stefanick, M. L.,
et al. (2008). Conjugated equine estrogens and colorectal cancer incidence
and survival: The Women’s Health Initiative randomized clinical trial. Cancer
Epidemiology, Biomarkers and Prevention, 17, 2609–2618.
Robins, J. M., & Finkelstein, D. M. (2000). Correcting for non-compliance and dependent
censoring in an AIDS clinical trial with inverse probability of censoring weighted
(IPCW) log-rank tests. Biometrics, 56, 779–788.
Rossouw, J. E., Prentice, R. L., Manson, J. E., Wu, L., Barad, D., Barnabei, V. M., et al.
(2007). Postmenopausal hormone therapy and risk of cardiovascular disease by
age and years since menopause. Journal of the American Medical Association, 297,
1465–1477.
Stampfer, M. J., & Colditz, G. A. (1991). Estrogen replacement therapy and coronary
heart disease: A quantitative assessment of the epidemiologic evidence. Preventive
Medicine, 20, 47–63.
Stefanick, M. L., Anderson, G. L., Margolis, K. L., Hendrix, S. L., Rodabough, R. J.,
Paskett, E. D., et al. (2006). Effects of conjugated equine estrogens on breast cancer
and mammography screening in postmenopausal women with hysterectomy.
Journal of the American Medical Association, 295, 1647–1657.
Steinberg, K. K., Thacker, S. B., Smith, S. J., Stroup, D. F., Zack, M. M., Flanders, W. D.,
et al. (1991). A meta-analysis of the effect of estrogen replacement therapy on the
risk of breast cancer. Journal of the American Medical Association, 265, 1985–1990.
Wassertheil-Smoller, S., Hendrix, S. L., Limacher, M., Heiss, G., Kooperberg, C., Baird,
A., et al. (2003). Effect of estrogen plus progestin on stroke in postmenopausal
women: The Women’s Health Initiative: a randomized trial. Journal of the
American Medical Association, 289, 2673–2684.

Women’s Health Initiative Steering Committee (2004). Effects of conjugated equine


estrogen in postmenopausal women with hysterectomy: The Women’s Health
Initiative randomized controlled trial. Journal of the American Medical Association,
291, 1701–1712.
Women’s Health Initiative Study Group (1998). Design of the Women’s Health
Initiative clinical trial and observational study. Controlled Clinical Trials, 19, 61–109.
Writing Group for the Women’s Health Initiative Investigators (2002). Risks and
benefits of estrogen plus progestin in healthy postmenopausal women: Principal
results from the Women’s Health Initiative randomized controlled trial. Journal of
the American Medical Association, 288, 321–333.
Wysowski, D. K., Golden, L., & Burke, L. (1995). Use of menopausal estrogens
and medroxyprogesterone in the United States, 1982–1992. Obstetrics and
Gynecology, 85, 6–10.
part ii

Innovations in Methods
6

Alternative Graphical Causal Models and the Identification of Direct Effects

james m. robins and thomas s. richardson

Introduction

The subject-specific data from either an observational or experimental study


consist of a string of numbers. These numbers represent a series of empirical
measurements. Calculations are performed on these strings and causal infer-
ences are drawn. For example, an investigator might conclude that the ana-
lysis provides strong evidence for ‘‘both an indirect effect of cigarette
smoking on coronary artery disease through its effect on blood pressure
and a direct effect not mediated by blood pressure.’’ The nature of the
relationship between the sentence expressing these causal conclusions and
the statistical computer calculations performed on the strings of numbers has
been obscure. Since the computer algorithms are well-defined mathematical
objects, it is crucial to provide formal causal models for the English sentences
expressing the investigator’s causal inferences. In this chapter we restrict
ourselves to causal models that can be represented by a directed acyclic
graph.
There are two common approaches to the construction of causal models.
The first approach posits unobserved fixed ‘potential’ or ‘counterfactual’ out-
comes for each unit under different possible joint treatments or exposures.
The second approach posits relationships between the population distribution
of outcomes under experimental interventions (with full compliance) to the
set of (conditional) distributions that would be observed under passive obser-
vation (i.e., from observational data). We will refer to the former as ‘counter-
factual’ causal models and the latter as ‘agnostic’ causal models (Spirtes,
Glymour, & Scheines, 1993) as the second approach is agnostic as to whether
unit-specific counterfactual outcomes exist, be they fixed or stochastic.
The primary difference between the two approaches is ontological: The
counterfactual approach assumes that counterfactual variables exist, while the


agnostic approach does not require this. In fact, the counterfactual theory
logically subsumes the agnostic theory in the sense that the counterfactual
approach is logically an extension of the latter approach. In particular, for a
given graph the causal contrasts (i.e. parameters) that are well-defined under
the agnostic approach are also well-defined under the counterfactual
approach. This set of contrasts corresponds to the set of contrasts between
treatment regimes (strategies) which could be implemented in an experiment
with sequential treatment assignments (ideal interventions), wherein the
treatment given at stage m is a (possibly random) function of past covariates
on the graph. We refer to such contrasts or parameters as ‘manipulable with
respect to a given graph’. As discussed further in Section 1.8, the set of
manipulable contrasts for a given graph are identified under the associated
agnostic causal model from observational data with a positive joint distribu-
tion and no hidden (i.e. unmeasured) variables. A parameter is said to be
identified if it can be expressed as a known function of the distribution of the
observed data. A discrete joint distribution is positive if the probability of a
joint event is nonzero whenever the marginal probability of each individual
component of the event is nonzero.
Although the agnostic theory is contained within the counterfactual
theory, the reverse does not hold. There are causal contrasts that are well-
defined within the counterfactual approach that have no direct analog within
the agnostic approach. An example that we shall discuss in detail is the pure
direct effect (also known as a natural direct effect) introduced in Robins and
Greenland (1992). The pure direct effect (PDE) of a binary treatment X on Y
relative to an intermediate variable Z is the effect the treatment X would have
had on Y had (contrary to fact) the effect of X on Z been blocked. The PDE is
non-manipulable relative to X, Y and Z in the sense that, in the absence of
additional assumptions, the PDE does not correspond to a contrast between
treatment regimes of any randomized experiment performed via interven-
tions on X, Y and Z.
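In counterfactual notation (one standard formalization, consistent with Robins and Greenland, 1992), the PDE of a binary X on Y relative to Z, on the mean-difference scale, is

\[
\mathrm{PDE} \;=\; E\bigl[\,Y\bigl(x{=}1,\, Z(x{=}0)\bigr)\,\bigr] \;-\; E\bigl[\,Y\bigl(x{=}0,\, Z(x{=}0)\bigr)\,\bigr]
\;=\; E\bigl[\,Y\bigl(1, Z(0)\bigr)\,\bigr] \;-\; E\bigl[\,Y(0)\,\bigr],
\]

that is, the mean outcome had everyone been treated but had each subject's intermediate variable been held at the value it would have taken without treatment, minus the mean outcome under no treatment.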
In this chapter, we discuss three counterfactual models, all of which
agree in two important respects: first they agree on the set of well-defined
causal contrasts; second they make the consistency assumption that the effect
of a (possibly joint) treatment on a given subject depends neither on whether
the treatment was freely chosen by, versus forced on, the subject nor on
the treatments received by other subjects. However the counterfactual
models do not agree as to the subset of these contrasts that can be identified
from observational data with a positive joint distribution and no hidden
variables. Identifiability of causal contrasts in counterfactual models is
obtained by assuming that (possibly conditional on prior history) the treat-
ment received at a given time is independent of some set of counter-
factual outcomes. Different versions of this independence assumption are

possible: The stronger the assumption (i.e., the more counterfactuals


assumed independent of treatment), the more causal contrasts that are iden-
tified. For a given graph, G, all the counterfactual models we shall consider
identify the set of contrasts identified under the agnostic model for G. We
refer to this set of contrasts as the manipulable contrasts relative to G.
Among the counterfactual models we discuss, the model derived from the
non-parametric structural equation model (NPSEM) of Pearl (2000) makes
the strongest independence assumption; indeed, the assumption is suffi-
ciently strong that the PDE may be identified (Pearl, 2001). In contrast,
under the weaker independence assumption of the Finest Fully
Randomized Causally Interpretable Structured Tree Graph (FFRCISTG) coun-
terfactual model of Robins (1986) or the Minimal Counterfactual Model
(MCM) introduced in this chapter, the PDE is not identified. The MCM is
the weakest counterfactual model (i.e., contains the largest set of distribu-
tions over counterfactuals) that satisfies the consistency assumption and
identifies the set of manipulable contrasts based on observational data with
a positive joint distribution and no hidden variables. The MCM is equivalent
to the FFRCISTG model when all variables are binary. Otherwise the MCM is
obtained by a mild further weakening of the FFRCISTG independence
assumption.
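For orientation, in the simple setting of Figure 6.1 below (a randomized binary treatment X, a single intermediate Z, and outcome Y, with no unmeasured confounding), Pearl's (2001) identifying expression for the PDE, often called the mediation formula, is

\[
\mathrm{PDE} \;=\; \sum_{z}\bigl\{\,E[\,Y \mid X{=}1, Z{=}z\,] - E[\,Y \mid X{=}0, Z{=}z\,]\,\bigr\}\,P(Z{=}z \mid X{=}0).
\]

Under the weaker FFRCISTG or MCM assumptions this expression is not, in general, a valid formula for the PDE, which is the sense in which the PDE is identified under the NPSEM but not under the other models.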
The identification of the non-manipulable PDE parameter under an
NPSEM appears to violate the slogan ‘‘no causation without manipulation.’’
Indeed, Pearl (2010) has recently advocated the alternative slogan ‘‘causation
before manipulation’’ in arguing for the ontological primacy of causation
relative to manipulation. Such an ontological primacy follows, for instance,
from the philosophical position that all dependence between counterfactuals
associated with different variables is due to the effects of common causes
(that are to be included as variables in the model and on the associated
graph, G), thus privileging the NPSEM over other counterfactual models.
Pearl (2010) privileges the NPSEM over other models but presents different
philosophical arguments for his position.
Pearl’s view is anathema to those with a refutationist view of causality
(e.g., Dawid (2000)) who argue that a theory that allows identification of non-
manipulable parameters (relative to a graph G) is not a scientific theory
because some of its predictions (e.g., that the PDE is a particular function
of the distribution of the observed data) are not experimentally testable and,
thus, are non-refutable. Indeed, in Appendix B, we give an example of a data
generating process that satisfies the assumptions of an FFRCISTG model but
not those of an NPSEM such that (i) the NPSEM prediction for the PDE is
false but (ii) the predictions made by all four causal models for the manipul-
able parameters relative to the associated graph G are correct. In this setting,
anyone who assumed an NPSEM would falsely believe he or she was able to

consistently estimate the PDE parameter from observational data on the


variables on G and no possible experimental intervention on these variables
could refute either their belief in the correctness of the NPSEM or their belief
in the validity (i.e., consistency) of their estimator of the PDE. In Appendix C,
we derive sharp bounds for the PDE under the assumption that the
FFRCISTG model associated with graph G holds. We find that these
bounds may be quite informative, even though the PDE is not (point)
identified.
This strict refutationist view of causality relies on the belief that there is a
sharp separation between the manipulable and non-manipulable causal con-
trasts (relative to graph G) because every prediction made concerning a
manipulable contrast based on observational data can be checked by an
experiment involving interventions on variables in G. However, this view
ignores the facts that (i) such experiments may be infeasible or unethical;
(ii) such empirical experimental tests will typically require an auxiliary
assumption of exchangeability between the experimental and observational
population and the ability to measure all the variables included in the causal
model, neither of which may hold in practice; and (iii) such tests are them-
selves based upon the untestable assumption that experimental interventions
are ideal. Thus, many philosophers of science do not agree with the strict
refutationist’s sharp separation between manipulable and non-manipulable
causal contrasts.
However, Pearl does not rely on this argument in responding to the
refutationist critique of the NPSEM that it can identify a contrast, the
PDE, that is not subject to experimental test. Rather, he has responded by
describing a scenario in which the PDE associated with a particular NPSEM
is identifiable, scientifically meaningful, of substantive interest, and corre-
sponds precisely to the intent-to-treat parameter of a certain randomized
controlled experiment. Pearl’s account appears paradoxical in light of the
results described above, since it suggests that the PDE may be identified
via intervention. Resolving this apparent contradiction is the primary subject
of this chapter.
We will show that implicit within Pearl’s account is a causal model asso-
ciated with an expanded graph (G0) containing more variables than Pearl’s
original graph (G). Furthermore, although the PDE of the original NPSEM
counterfactual model is not a manipulable parameter relative to G, it is
manipulable relative to the expanded graph G0. Consequently, the PDE is
identified by all four of the causal models (agnostic, MCM, FFRCISTG
and NPSEM) associated with G0. The causal models associated with
graph G0 formalize Pearl’s verbal, informal account and constitute the ‘‘addi-
tional assumptions’’ required to make the original NPSEM’s pure direct
effect contrast equal to a contrast between treatments in a randomized

experiment—a randomized experiment whose treatments correspond


to variables on the expanded graph G0 that are absent from the original
graph G.
However, the distribution of the variables of the expanded graph G0 is not
positive. Furthermore, the available data are restricted to the variables of the
original graph G. Hence, it is not at all obvious prima facie that the expanded
causal model’s treatment contrasts will be identified.
Remarkably, we prove that, under all four causal models associated with
the larger graph G0, the manipulable causal contrast of the expanded causal
model that equals the PDE of Pearl’s original NPSEM G is identified from
observational data on the original variables. This identification crucially relies
on certain deterministic relationships between variables in the expanded
model. Our proof thus resolves the apparent contradiction; furthermore, it
shows that the ontological primacy of manipulation reflected in the slogan
‘‘no causation without manipulation’’ can be maintained by interpreting the
PDE parameter of a given counterfactual causal model as a manipulable
causal parameter in an appropriate expanded model.
Having said this, although in Pearl’s scenario the intervention associated
with the expanded causal model was scientifically plausible, we discuss a
modification of Pearl’s scenario in which the intervention required to inter-
pret the PDE contrast of the original graph G as a manipulable contrast of an
expanded graph G0 is more controversial. Our discussion reveals the scientific
benefits that flow from the exercise of trying to provide an interventionist
interpretation for a non-manipulable causal parameter identified under an
NPSEM associated with a graph G. Specifically, the exercise often helps one
devise explicit, and sometimes even practical, interventions, corresponding to
manipulable causal parameters of an expanded graph G0. The exercise also
helps one recognize when such interventions are quite a stretch.
In this chapter, our goal is not to advocate for the primacy of manipulation
or of causation. Rather, our goal is to contribute both to the philosophy and
to the mathematics of causation by demonstrating that the apparent conflict
between these paradigms is often not a real one.
The reduction of an identified non-manipulable causal contrast of an
NPSEM to a manipulable causal contrast of an expanded model that is
then identified via deterministic relationships under the expanded agnostic
model is achieved here for the PDE. A similar reduction for the effect of
treatment on the treated (i.e. compliers) in a randomized trial with full
compliance in the placebo arm was given by Robins, VanderWeele, and
Richardson (2007); see also Geneletti and Dawid (2007) and Appendix A
herein.
This chapter extends and revises previous discussions by Robins and
colleagues (Robins & Greenland, 1992; Robins, 2003; Robins, Rotnitzky,

& Vansteelandt, 2007) of direct and indirect effects. We restrict consideration


to causal models, such as the agnostic, FFRCISTG, MCM, and NPSEM, that
can be represented by a directed acyclic graph (DAG). See Robins,
Richardson, and Spirtes (2009) for a discussion of alternative points of
view due to Hafeman and VanderWeele (2010), Imai, Keele, and Yamamoto
(2009) and Petersen, Sinisi, and Laan (2006) that are not based on DAGs.
The chapter is organized as follows: Section 1 introduces the four types
of causal model associated with a graph; Section 2 defines direct effects;
Section 3 analyzes the conditions required for the PDE to be identified;
Section 4 considers, via examples, the extent to which the PDE may be
interpreted in terms of interventions; Section 5 relates our results to the
work of Avin, Shpitser, and Pearl (2005) on path-specific causal effects; finally
Section 6 concludes.

1 Graphical Causal Models

Define a DAG G to be a graph with nodes (vertices) representing the ele-


ments of a vector of random variables $V = (V_1, \ldots, V_M)$ with directed edges
(arrows) and no directed cycles. To avoid technicalities, we assume all variables $V_m$ are discrete. We let $f(v) \equiv f_V(v) \equiv P(V = v)$ all denote the probability
density of $V$, where, for simplicity, we assume $v \in \mathcal{V} \equiv \bar{\mathcal{V}}_M$,
$\bar{\mathcal{V}}_m \equiv \mathcal{V}_1 \times \cdots \times \mathcal{V}_m$, $\mathcal{V}_m$ denotes the assumed known space of possible
values $v_m$ of $V_m$, and for any $z_1, \ldots, z_m$, we define $\bar{z}_m = (z_1, \ldots, z_m)$.
By convention, for any $\bar{z}_m$, we define $\bar{z}_0 \equiv z_0 \equiv 0$. Note $\bar{\mathcal{V}}_m \equiv \mathcal{V}_1 \times \cdots \times \mathcal{V}_m$
is the product space of the $\mathcal{V}_j$, $j \le m$. We do not necessarily assume that $f(v)$ is
strictly positive for all $v \in \mathcal{V}$.
As a simple example, consider a randomized trial of smoking cessation,
represented by the DAG $G$ with node set $V = (X, Z, Y)$ in Figure 6.1. Thus,
$M = 3$, $V_1 = X$, $V_2 = Z$, $V_3 = Y$. Here, $X$ is the randomization indicator, with
$X = 0$ denoting smoking cessation and $X = 1$ active smoking; $Z$ is an indicator
variable for hypertensive status 1 month post randomization; $Y$ is an

[Figure 6.1: a DAG with arrows $X \to Z$, $Z \to Y$, and $X \to Y$.]

Figure 6.1 A simple DAG containing a treatment X, an intermediate Z and a response Y.



indicator variable for having a myocardial infarction (MI) by the end of


follow-up at 3 months. For simplicity, assume complete compliance with
the assigned treatment and assume no subject had an MI prior to 1
month. We refer to the variables V as factual variables as they are variables
that could potentially be recorded on the subjects participating in the study.
Because in this chapter our focus is on identification, we assume the
study population is sufficiently large that sampling variability can be
ignored. Then, the density $f(v) = f(x, z, y)$ of the factual variables can be
taken to be the proportion of our study population with $X = x$, $Z = z$,
$Y = y$. Our ultimate goal is to try to determine whether X has a direct
effect on Y not through Z.
We use either $PA_{V_m}$ or $PA_m$ to denote the parents of $V_m$, that is, the set
of nodes from which there is a direct arrow into $V_m$. For example, in
Figure 6.1, $PA_Y = \{X, Z\}$. A variable $V_j$ is a descendant of $V_m$ if there is a
sequence of nodes connected by edges between $V_m$ and $V_j$ such that, following the direction indicated by the arrows, one can reach $V_j$ by starting at
$V_m$, that is, $V_m \to \cdots \to V_j$. Thus, in Figure 6.1, Z is a descendant of X but
not of Y.
We suppose that, as in Figure 6.1, the $V = (V_1, \ldots, V_M)$ are numbered so
that $V_j$ is not a descendant of $V_m$ for $m > j$.
Let $R = (R_1, \ldots, R_K)$ denote any subset of $V$ and let $r = (r_1, \ldots, r_K)$ be a
value of $R$. We write $R_j = V_m$, $\mathcal{R}_j = \mathcal{V}_m$ if the $j$th variable in $R$ corresponds
to the $m$th variable in $V$. The NPSEM, MCM and FFRCISTG models all
assume the existence of the counterfactual random variable $V_m(r)$ encoding
the value the variable $V_m$ would have if, possibly contrary to fact, $R$ were set
to $r$, $r \in \mathcal{R} \equiv \mathcal{R}_1 \times \cdots \times \mathcal{R}_K$, where $V_m(r)$ is assumed to be well-defined in the
sense that there is reasonable agreement as to the hypothetical intervention
(i.e., closest possible world) which sets $R$ to $r$ (Robins & Greenland, 2000).
For example, in Figure 6.1, $Z(x{=}1)$ and $Y(x{=}1)$ are a subject's $Z$ and $Y$ had,
possibly contrary to fact, the subject been a smoker. By assumption, if $R_j \in R$
is the $m$th variable $V_m$, then $V_m(r)$ equals the value $r_j$ to which the variable
$V_m = R_j$ was set. For example, in Figure 6.1, the counterfactual $X(x{=}1)$ is
equal to 1. Note we assume $V_m(r)$ is well-defined even when the factual
probability $P(R = r)$ is zero. We recognize that under certain circumstances
such an assumption might be ‘metaphysically suspect’ because the counterfactuals could be ‘radically’ ill-defined, since no one was observed to receive
the treatment in question. However, in our opinion, in a number of the
examples that we consider in this chapter these counterfactuals do not
appear to be much less well-defined than those corresponding to treatments
that have positive factual probability.
We often write the density $f_{V(r)}(v)$ of $V(r)$ as $f^{\mathrm{int}}_{r}(v)$, with ‘int’ being short
for intervene, to emphasize the fact that $f_{V(r)}(v) = f^{\mathrm{int}}_{r}(v)$ represents the
density of $V$ in the counterfactual world where we intervened and set each
subject's $R$ to $r$. We say that $f^{\mathrm{int}}_{r}(v)$ is the density of $V$ had, contrary to fact,
each subject followed the treatment regime $r$. In contrast, $f(v)$ is the density
of the factual variables $V$.
With this background, we specify our four causal models.

1.1 FFRCISTG Causal Models


Given a DAG G with node set V, an FFRCISTG model associated with G
makes four assumptions.

(i) All one-step-ahead counterfactuals $V_m(\bar{v}_{m-1})$ exist for any setting
$\bar{v}_{m-1} \in \bar{\mathcal{V}}_{m-1}$ of their predecessors.
For example, in Figure 6.1, a subject's hypertensive status
$Z(x) = V_2(v_1)$ at smoking level $x$ for $x = 0$ and for $x = 1$ exists and a
subject's MI status $Y(x, z) = V_3(\bar{v}_2)$ at each joint level of smoking and
hypertension exists. Because $V_1 = X$ has no predecessor, $V_1 = X$ exists
only as a factual variable.

(ii) $V_m(\bar{v}_{m-1}) \equiv V_m(pa_m)$ is a function of $\bar{v}_{m-1}$ only through the values $pa_m$
of $V_m$'s parents on $G$.
For example, were the edge $X \to Y$ missing in Figure 6.1, this assumption would imply that $Y(x, z) = Y(z)$ for every subject and every $z$. That
is, the absence of the edge would imply that smoking $X$ has no effect
on $Y$ other than through its effect on $Z$.

(iii) Both the factual variables $V_m$ and the counterfactuals $V_m(r)$ for any
$R \subseteq V$ are obtained recursively from the one-step-ahead counterfactuals $V_j(\bar{v}_{j-1})$, for $j \le m$. For example, $V_3 = V_3(V_1, V_2(V_1))$ and
$V_3(v_1) = V_3(v_1, V_2(v_1))$.
Thus, in Figure 6.1, with the treatment $R$ being smoking $X$, a subject's possibly counterfactual MI status $Y(x{=}1) = V_3(v_1{=}1)$ had he
been forced to smoke is $Y(x{=}1, Z(x{=}1))$ and, thus, is completely
determined by the one-step-ahead counterfactuals $Z(x)$ and $Y(x, z)$.
That is, $Y(x{=}1)$ is obtained by evaluating the one-step-ahead counterfactual $Y(x{=}1, z)$ at $z = Z(x{=}1)$. Similarly, a subject's factual $X$ and
one-step-ahead counterfactuals determine the subject's factual hypertensive status $Z$ and MI status $Y$ as $Z(X)$ and $Y(X, Z(X))$ where $Z(X)$ is
the counterfactual $Z(x)$ evaluated at $x = X$ and $Y(X, Z(X))$ is the counterfactual $Y(x, z)$ evaluated at $(x, z) = (X, Z(X))$.

(iv) The following independence holds:
\[
\{V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1})\} \;\perp\!\!\!\perp\; V_m(\bar{v}_{m-1}) \mid \bar{V}_{m-1} = \bar{v}_{m-1},
\quad \text{for all } m \text{ and all } \bar{v}_{M-1} \in \bar{\mathcal{V}}_{M-1},
\tag{6.1}
\]
where for a fixed $\bar{v}_{M-1}$, $\bar{v}_k = (v_1, \ldots, v_k)$, $k < M - 1$, denotes the initial
subvector of $\bar{v}_{M-1}$.

Assumption (iv) is equivalent to the statement that for each $m$, conditional on
the factual past $\bar{V}_{m-1} = \bar{v}_{m-1}$, the factual variable $V_m$ is independent of any
possible evolution from $m+1$ of one-step-ahead counterfactuals (consistent
with $\bar{V}_{m-1} = \bar{v}_{m-1}$), i.e. $\{V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1})\}$, for some $\bar{v}_{M-1}$ of which
$\bar{v}_{m-1}$ is a sub-vector. This follows since by (iii), $V_m \equiv V_m(\bar{V}_{m-1}) = V_m(\bar{v}_{m-1})$
when $\bar{V}_{m-1} = \bar{v}_{m-1}$.
Note that by (iii) above, the counterfactual $V_{m+1}(\bar{v}_m)$ for a given subject,
say subject $i$, depends on the treatment $\bar{v}_m$ received by the subject but does
not depend on the treatment received by any other subject. Further, $V_{m+1}(\bar{v}_m)$
takes the same value whether the treatment $\bar{v}_m$ is counter to fact (i.e.,
$\bar{V}_m \neq \bar{v}_m$) or factual (i.e., $\bar{V}_m = \bar{v}_m$ and thus $V_{m+1}(\bar{v}_m) = V_{m+1}$). That is, the
FFRCISTG model satisfies the consistency assumption described in the
Introduction. Indeed, we shall henceforth refer to (iii) as the ‘consistency
assumption’.
The following example will play a central role in the chapter.

Example 1 Consider the FFRCISTG model associated with the graph in


Figure 6.1, then, for all z,

\[
\{Y(x{=}1, z),\, Z(x{=}1)\} \;\perp\!\!\!\perp\; X; \qquad
\{Y(x{=}0, z),\, Z(x{=}0)\} \;\perp\!\!\!\perp\; X
\tag{6.2}
\]

and

\[
Y(x{=}1, z) \;\perp\!\!\!\perp\; Z(x{=}1) \mid X = 1; \qquad
Y(x{=}0, z) \;\perp\!\!\!\perp\; Z(x{=}0) \mid X = 0
\tag{6.3}
\]

are true statements by assumption (iv). However, the model makes no claim
as to whether

\[
Y(x{=}1, z) \;\perp\!\!\!\perp\; Z(x{=}0) \mid X = 0
\]
and
\[
Y(x{=}1, z) \;\perp\!\!\!\perp\; Z(x{=}0) \mid X = 1
\]

are true because, for example, the value of $x$ in $Y(x{=}1, z)$ differs from the
value $x = 0$ in $Z(x{=}0)$. We shall see that all four of the above independence
statements are true by assumption under the NPSEM associated with the
graph in Figure 6.1.

1.2 Minimal Counterfactual Models (MCMs)


An MCM differs from an FFRCISTG model only in that (iv) is replaced
by:

(iv*) For all $m$ and all $\bar{v}_{M-1} \in \bar{\mathcal{V}}_{M-1}$,
\[
f\bigl(V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1}) \mid \bar{V}_{m-1} = \bar{v}_{m-1},\, V_m = v_m\bigr)
= f\bigl(V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1}) \mid \bar{V}_{m-1} = \bar{v}_{m-1}\bigr).
\tag{6.4}
\]

Since (iv) can be written as the condition that:

For all $m$, all $\bar{v}_{M-1} \in \bar{\mathcal{V}}_{M-1}$, and all $v_m^{*} \in \mathcal{V}_m$,
\[
f\bigl(V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1}) \mid \bar{V}_{m-1} = \bar{v}_{m-1},\, V_m = v_m^{*}\bigr)
= f\bigl(V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1}) \mid \bar{V}_{m-1} = \bar{v}_{m-1}\bigr),
\]

condition (iv) for an FFRCISTG implies condition (iv*) for an MCM.
However, the reverse does not hold.
An MCM requires only that the last display holds for the unique value $v_m$
of $V_m$ that occurs in the given $\bar{v}_{M-1}$. Thus, Equation (6.4) states that, conditional on the factual past $\bar{V}_{m-1} = \bar{v}_{m-1}$ through $m - 1$, any possible evolution
from $m + 1$ of one-step-ahead counterfactuals, $\{V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1})\}$,
consistent with the past $\bar{V}_m = \bar{v}_m$ through $m$, is independent of the event
$V_m = v_m$. In other words,
\[
\{V_{m+1}(\bar{v}_m), \ldots, V_M(\bar{v}_{M-1})\} \;\perp\!\!\!\perp\; I\bigl(V_m(\bar{v}_{m-1}) = v_m\bigr) \mid \bar{V}_{m-1} = \bar{v}_{m-1},
\quad \text{for all } m \text{ and all } \bar{v}_{M-1} \in \bar{\mathcal{V}}_{M-1},
\tag{6.5}
\]
where $I(V_m = v_m)$ is the Bernoulli indicator random variable.

It follows that in the special case where all the $V_m$ are binary, an MCM
and an FFRCISTG model are equivalent because, for $V_m$ binary, the random
variables $V_m$ and $I(V_m = v_m)$ are the same (possibly up to a recoding).
Physical randomization of $X$ and/or $Z$ implies counterfactual independencies beyond those of an FFRCISTG or MCM model, see Robins et al. (2009).
However, these extra independencies fail to imply $Z(0) \perp\!\!\!\perp Y(1, z)$ for the
graph in Figure 6.1 and hence do not affect our results.

1.3 A Representation of MCMs and FFRCISTG Models that Does Not Condition on the Past
In this section we derive alternative characterizations of the counterfactual
independence conditions for the FFRCISTG model and the MCM that will
facilitate the comparison of these models with the NPSEM.
Theorem 1. Given an FFRCISTG model associated with a graph $G$:

(a) The set of independences in condition (6.1) is satisfied if and only if, for each $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$, the random variables
$$V_{m+1}(\bar v_m), \quad m = 0, \ldots, M-1, \text{ are mutually independent.} \qquad (6.6)$$

(b) Furthermore, the set of independences (6.6) is the same for any ordering of the variables compatible with the descendant relationships in $G$.

Proof of (a): ($\Rightarrow$) Given $\bar v_{M-1}$ and $m \in \{1, \ldots, M-1\}$, we define $\mathcal{I}_m = \{\mathcal{I}_{m,m}, \ldots, \mathcal{I}_{M,m}\}$ to be a set of conditional independence statements:

i) $\mathcal{I}_{m,m}$: $\{V_M(\bar v_{M-1}), \ldots, V_{m+1}(\bar v_m)\} \perp\!\!\!\perp V_m(\bar v_{m-1})$; and

ii) for $j = 1$ to $j = M - m$,
$$\mathcal{I}_{m+j,m}: \{V_M(\bar v_{M-1}), \ldots, V_{m+j+1}(\bar v_{m+j})\} \perp\!\!\!\perp V_{m+j}(\bar v_{m+j-1}) \mid V_{m+j-1}(\bar v_{m+j-2}) = v_{m+j-1}, \ldots, V_m(\bar v_{m-1}) = v_m.$$

First, note that the set of independences in condition (6.1) is precisely $\mathcal{I}_1$. Now, if the collection $\mathcal{I}_m$ holds (for $m < M - 2$) then $\mathcal{I}_{m+1}$ holds, since (I) the set $\mathcal{I}_{m+1}$ is precisely the set $\{\mathcal{I}_{m+1,m}, \ldots, \mathcal{I}_{M,m}\}$ except with $V_m(\bar v_{m-1})$ removed from all conditioning events and (II) $\mathcal{I}_{m,m}$ licenses such removal. Thus, beginning with $\mathcal{I}_1$, we recursively obtain that $\mathcal{I}_m$, and thus $\mathcal{I}_{m,m}$, holds for $m = 1, \ldots, M-1$. The latter immediately implies that the variables $V_{m+1}(\bar v_m)$, $m = 0, \ldots, M-1$, are mutually independent.

($\Leftarrow$) The reverse implication is immediate upon noting that the conditioning event $\bar V_{m-1} = \bar v_{m-1}$ in Equation (6.1) is the event $V_0 = v_0$, $V_1(v_0) = v_1$, $\ldots$, $V_{m-1}(\bar v_{m-2}) = v_{m-1}$.

Proof of (b): This follows immediately from the assumption that $V_m(\bar v_{m-1}) = V_m(pa_m)$. ∎

Theorem 2. Given an MCM associated with a graph $G$:

(a) The set of independences in condition (6.5) is satisfied if and only if, for each $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$ and each $m \in \{1, \ldots, M-1\}$,
$$\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\} \perp\!\!\!\perp I\bigl(V_m(\bar v_{m-1}) = v_m\bigr). \qquad (6.7)$$

(b) Furthermore, the set of independences (6.7) is the same for any ordering of the variables compatible with the descendant relationships in $G$.

An immediate corollary is the following.
Corollary 3. An MCM implies that for all $\bar v_{M-1} \in \bar{\mathcal V}_{M-1}$, the random variables $I\bigl(V_{m+1}(\bar v_m) = v_{m+1}\bigr)$, $m = 0, \ldots, M-1$, are mutually independent.

Proof of Theorem 2(a): ($\Rightarrow$) Given $\bar v_{M-1}$, the proof exactly follows that of the previous theorem when we redefine:

i) $\mathcal{I}_{m,m}$: $\{V_M(\bar v_{M-1}), \ldots, V_{m+1}(\bar v_m)\} \perp\!\!\!\perp I\bigl(V_m(\bar v_{m-1}) = v_m\bigr)$; and

ii) for $j = 1$ to $j = M - m$,
$$\mathcal{I}_{m+j,m}: \{V_M(\bar v_{M-1}), \ldots, V_{m+j+1}(\bar v_{m+j})\} \perp\!\!\!\perp I\bigl(V_{m+j}(\bar v_{m+j-1}) = v_{m+j}\bigr) \mid V_{m+j-1}(\bar v_{m+j-2}) = v_{m+j-1}, \ldots, V_m(\bar v_{m-1}) = v_m.$$

The reverse implication and (b) follow as in the proof of the previous theorem. ∎
1.4 Non-Parametric Structural Equation Models (NPSEMs)

Given a DAG $G$ with node set $V$, an NPSEM associated with $G$ assumes that there exist mutually independent random variables $\varepsilon_m$ and deterministic unknown functions $f_m$ such that the counterfactual $V_m(\bar v_{m-1}) \equiv V_m(pa_m)$ is given by $f_m(pa_m, \varepsilon_m)$, and that both the factual variables $V_m$ and the counterfactuals $V_m(x)$ for any $X \subseteq V$ are obtained recursively from the $V_m(\bar v_{m-1})$ as in (iii) in Section 1.1.

Under an NPSEM both the FFRCISTG condition (6.1) and the MCM condition (6.5) hold. However, an FFRCISTG model or MCM associated with $G$ will not, in general, be an NPSEM for $G$. Indeed, an NPSEM implies
$$\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\} \perp\!\!\!\perp V_m(\bar v^{*}_{m-1}) \mid \bar V_{m-1} = \bar v^{**}_{m-1}, \quad \text{for all } m, \text{ all } \bar v_{M-1} \in \bar{\mathcal V}_{M-1}, \text{ and all } \bar v^{*}_{m-1}, \bar v^{**}_{m-1} \in \bar{\mathcal V}_{m-1}. \qquad (6.8)$$

That is, conditional on the factual past $\bar V_{m-1} = \bar v^{**}_{m-1}$, the counterfactual $V_m(\bar v^{*}_{m-1})$ is statistically independent of all future one-step-ahead counterfactuals. This implies that all four statements in Example 1 are true under an NPSEM; see also Pearl (2000, Section 3.6.3).

Hence, in an MCM or FFRCISTG model, in contrast to an NPSEM, the defining independences are those for which the value of $\bar v_{m-1}$ in (a) the conditioning event, (b) the counterfactual $V_m$ at $m$, and (c) the set of future one-step-ahead counterfactuals $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$ are equal. Thus, an FFRCISTG model assumes independence of $\{V_{m+1}(\bar v_m), \ldots, V_M(\bar v_{M-1})\}$ and $V_m(\bar v^{*}_{m-1})$ given $\bar V_{m-1} = \bar v^{**}_{m-1}$ only when $\bar v^{*}_{m-1} = \bar v^{**}_{m-1} = \bar v_{m-1}$. As mentioned above, the MCM further weakens the independence by replacing $V_m$ with $I(V_m = v_m)$.
In Appendix B we describe a data-generating process leading to a counter-
factual model that is an MCM/FFRCISTG model associated with Figure 6.1,
but not an NPSEM for this figure.
Understanding the implications of these additional counterfactual inde-
pendences assumed by an NPSEM compared to an MCM or FFRCISTG
model is one of the central themes of this chapter.

1.5 The g-Functional

Before defining our third causal model, the agnostic causal model, we need to define the g-functional density. The next Lemma shows that the assumptions of an MCM, and thus a fortiori those of the NPSEMs and FFRCISTG models, restrict the joint distribution of the factual variables (when there are missing edges in the DAG).

Lemma 4. In an MCM associated with DAG $G$, for all $v$ such that $f(v) > 0$, the density $f(v) \equiv P(V = v)$ of the factuals $V$ satisfies the Markov factorization
$$f(v) = \prod_{j=1}^{M} f(v_j \mid pa_j). \qquad (6.9)$$

Robins (1986) proved Lemma 4 for an FFRCISTG model; the proof applies equally to an MCM. Equation (6.9) is equivalent to the statement that each variable $V_m$ is conditionally independent of its non-descendants given its parents (Pearl, 1988).
Example 2. In Figure 6.1, $f(x, z, y) = f(y \mid x, z)\, f(z \mid x)\, f(x)$. If the arrow from $X$ to $Y$ were missing, we would then have $f(x, z, y) = f(y \mid z)\, f(z \mid x)\, f(x)$, since $Z$ would be the only parent of $Y$.
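The restriction imposed by a missing edge can be checked empirically. The following sketch (a toy simulation under assumed parameter values, not data from the chapter) generates data in which the $X$ to $Y$ arrow is absent and verifies the implied conditional independence $f(y \mid x, z) = f(y \mid z)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
# Simulated data (assumed mechanism) for the graph with the X -> Y arrow removed:
# Y depends on Z only, so f(x, z, y) should factor as f(y | z) f(z | x) f(x).
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.2 + 0.5 * X)
Y = rng.binomial(1, 0.1 + 0.6 * Z)          # no direct dependence on X

def cond_p(event, given):
    """Empirical conditional probability P(event | given)."""
    return (event & given).mean() / given.mean()

for z in (0, 1):
    # Y independent of X given Z: f(y=1 | x, z) is about the same for x = 0 and x = 1.
    p0 = cond_p(Y == 1, (X == 0) & (Z == z))
    p1 = cond_p(Y == 1, (X == 1) & (Z == z))
    print(f"z={z}: P(Y=1|X=0,Z={z})={p0:.3f}  P(Y=1|X=1,Z={z})={p1:.3f}")
```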

Definition 5. Given a DAG $G$, a set of variables $R \subseteq V$, and a value $r$ of $R$, define the g-functional density
$$P_r(V = v) \equiv f_r(v) \equiv \begin{cases} \prod_{j:\, V_j \notin R} f(v_j \mid pa_j) & \text{if } v = (u, r), \\ 0 & \text{if } v = (u, r^*) \text{ with } r^* \neq r. \end{cases}$$

In words, $f_r(v)$ is the density obtained by modifying the product on the right-hand side of Equation (6.9) by removing the term $f(v_j \mid pa_j)$ for every $V_j \in R$, while for $V_j \notin R$, setting each $R_m \in R$ that appears in $PA_j$ to the value $r_m$ in the term $f(v_j \mid pa_j)$. Note that the probability that $R$ equals $r$ is 1 under the density $f_r(v)$; that is, $P_r(R = r) \equiv f_r(r) = 1$.

The density $f_r(z)$ may not be a well-defined function of the density $f(v)$ of the factual data $V$ when the factual distribution of $V$ is non-positive, because setting $R_m \in PA_j$ to the value $r_m$ in $f(v_j \mid pa_j)$ may result in conditioning on an event that has probability zero of occurring in the factual distribution.

Example 3. In Figure 6.1 with $R = (X, Z)$ and $r = (x=1, z=0)$, $f_r(v) \equiv f_{x=1,z=0}(x^*, z^*, y) = f(y \mid x=1, z=0)$ if $(x^*, z^*) = (1, 0)$. On the other hand, $f_{x=1,z=0}(x^*, z^*, y) = 0$ if $(x^*, z^*) \neq (1, 0)$ since, under $f_{x=1,z=0}(x^*, z^*, y)$, $X$ is always 1 and $Z$ is always 0. It follows that $f_{x=1,z=0}(y) = f(y \mid x=1, z=0)$. If the event $(X, Z) = (1, 0)$ has probability zero under $f(v)$, then $f_{x=1,z=0}(y)$ is not a function of $f(v)$ and is thus not uniquely defined.
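A small sketch of Example 3 (again with a hypothetical simulated data set): for $R = (X, Z)$ and $r = (1, 0)$ in Figure 6.1, the g-functional density of $Y$ reduces to the factual conditional $f(y \mid X=1, Z=0)$, and it ceases to be a function of $f(v)$ when the conditioning event has probability zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.2 + 0.4 * X)
Y = rng.binomial(1, 0.1 + 0.3 * X + 0.4 * Z)

def g_density_y(x, z):
    """g-functional density f_{x,z}(y=1) for Figure 6.1: f(y=1 | X=x, Z=z)."""
    sel = (X == x) & (Z == z)
    if sel.sum() == 0:
        raise ValueError("f_{x,z}(y) is not a function of f(v): positivity fails")
    return Y[sel].mean()

print(g_density_y(1, 0))   # f_{x=1,z=0}(y=1) = f(y=1 | X=1, Z=0)
```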
The following Lemma connects the g-functional density $f_r(v)$ to the intervention density $f_r^{int}(v)$.

Lemma 6. Given an MCM associated with a DAG $G$, sets of variables $R, Z \subseteq V$, and a treatment regime $r$, if the g-functional density $f_r(z)$ is a well-defined function of $f(v)$, then $f_r(z) = f_r^{int}(z)$.

In words, whenever the g-functional density $f_r(z)$ is a well-defined function of $f(v)$, it is equal to the intervention density for $Z$ that would be observed had, contrary to fact, all subjects followed the regime $r$.

This result can be extended from so-called static treatment regimes $r$ to general treatment regimes, where treatment is a (possibly random) function of the observed history, as follows. Suppose we are given a set of variables $R$ and, for each $V_j = R_m \in R$, a density $p_j(v_j \mid \bar v_{j-1})$. Then we define $p_R$ to be the general treatment regime corresponding to an intervention in which, for each $V_j = R_m \in R$, a subject's treatment level $v_j$ is randomly assigned with randomization probabilities $p_j(v_j \mid \bar v_{j-1})$ that are a function of the values of the subset of the variables $\bar V_{j-1}$ that temporally precede $V_j$. We let $f_{p_R}^{int}(v)$ be the distribution of $V$ that would be observed if, contrary to fact, all subjects had been randomly assigned treatment with probabilities $p_R$. Further, we define the g-functional density $f_{p_R}(v)$ to be the density
$$f_{p_R}(v) \equiv \prod_{j:\, V_j \notin R} f(v_j \mid pa_j) \prod_{j:\, V_j \in R} p_j(v_j \mid \bar v_{j-1})$$
and, for $Z \subseteq V$, $f_{p_R}(z) \equiv \sum_{v \setminus z} f_{p_R}(v)$. Thus the marginal $f_{p_R}(z)$ is obtained from $f_{p_R}(v)$ by summation in the usual way. Then we have the following extension of Lemma 6.
Extended Lemma 6: Given an MCM associated with a DAG $G$, sets of variables $R, Z \subseteq V$, and a treatment regime $p_R$, if the g-functional density $f_{p_R}(z)$ is a well-defined function of $f(v)$, then $f_{p_R}(z) = f_{p_R}^{int}(z)$.

In words, whenever the g-functional density $f_{p_R}(z)$ is a well-defined function of $f(v)$, it is equal to the intervention density for $Z$ that would be observed had, contrary to fact, all subjects followed the general regime $p_R$. Robins (1986) proved Extended Lemma 6 for an FFRCISTG model; the proof applies equally to an MCM. Extended Lemma 6 actually subsumes Lemma 6, as $f_r^{int}(z)$ is $f_{p_R}^{int}(z)$ for the $p_R$ such that, for $V_j = R_m \in R$, $p_j(v_j \mid \bar v_{j-1}) = 1$ if $v_j = r_m$ and is zero if $v_j \neq r_m$.
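To illustrate the g-functional for a general regime, the sketch below (an illustrative simulation; the regime, which assigns the intermediate Z at random with probability 0.3 irrespective of history, is our own example) computes the g-functional mean of $Y$ in Figure 6.1 as $\sum_{x,z} E[Y \mid X=x, Z=z]\, p(z)\, f(x)$, which Extended Lemma 6 equates with the interventional mean under that regime.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.2 + 0.4 * X)
Y = rng.binomial(1, 0.1 + 0.3 * X + 0.4 * Z)

def cond_mean_y(x, z):
    """Empirical E[Y | X=x, Z=z]."""
    return Y[(X == x) & (Z == z)].mean()

# Random regime p_R on R = {Z}: assign Z = 1 with probability 0.3, ignoring history.
p_z = {1: 0.3, 0: 0.7}
E_pR_Y = sum(cond_mean_y(x, z) * p_z[z] * (X == x).mean()
             for x in (0, 1) for z in (0, 1))
print(E_pR_Y)   # g-functional mean of Y under the regime, per Extended Lemma 6
```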

Corollary to Extended Lemma 6: Given an MCM associated with a DAG $G$, sets of variables $R, Z \subseteq V$, and a treatment regime $p_R$, $f_{p_R}(z) = f_{p_R}^{int}(z)$ whenever $p_R$ satisfies the following positivity condition: for all $V_j \in R$, $f(\bar V_{j-1}) > 0$ and $p_j(v_j \mid \bar V_{j-1}) > 0$ imply $f(v_j \mid \bar V_{j-1}) > 0$ with probability one under $f(v)$.

This follows directly from Extended Lemma 6, as the positivity condition implies that $f_{p_R}(z)$ is a well-defined function of $f(v)$. In the literature, one often sees only the Corollary stated and proved. However, as noted by Gill and Robins (2001), these proofs use the 'positivity condition' only to show that $f_{p_R}(z)$ is a well-defined (i.e., unique) function of $f(v)$. Thus, these proofs actually establish the general version of Extended Lemma 6. In this chapter we study models in which $f_{p_R}(z)$ is a well-defined function of $f(v)$ even though the positivity assumption fails; as a consequence, we require the general version of the Lemma.
1.6 Agnostic Causal Models

We are now ready to define the agnostic causal model (Spirtes et al., 1993). Given a DAG $G$ with node set $V$, the agnostic causal model represented by $G$ assumes that the joint distribution of the factual variables $V$ factors as in Equation (6.9) and that the interventional density of $Z \subseteq V$, again denoted by $f_{p_R}^{int}(z)$ or $f_r^{int}(z)$, under treatment regime $p_R$ or regime $r$, is given by the g-functional density $f_{p_R}(z)$ or $f_r(z)$, whenever $f_{p_R}(z)$ or $f_r(z)$ is a well-defined function of $f(v)$.

Although this model assumes that the densities $f_{p_R}^{int}(v)$ and $f_r^{int}(v)$ of $V$ under these interventions exist, the model makes no reference to counterfactual variables and is agnostic as to their existence. Thus the agnostic causal model does not impose any version of a consistency assumption.
1.7 Interventions Restricted to a Subset of Variables

In this chapter we restrict consideration to graphical causal models in which we assume that interventions on every possible subset of the variables are possible and indeed well-defined. The constraint that only a subset $V^*$ of $V$ can be intervened on may be incorporated into the agnostic causal model by marking the elements of $V^*$ and requiring that, for any intervention $p_R$, $R \subseteq V^*$. For the FFRCISTG model, Robins (1986, 1987) constructs a counterfactual model, the fully randomized causally interpreted structured tree graph (FRCISTG) model, that imposes the constraint and reduces to the FFRCISTG model when $V^* = V$. We briefly review his approach and its extension to the MCM in Appendix D. See also the decision-theoretic models of Dawid (2000) and Heckerman and Shachter (1995).
1.8 Manipulable Contrasts and Parameters

In the Introduction we defined the set of manipulable contrasts relative to a graph $G$ to be the set of causal contrasts that are well-defined under the agnostic causal model, i.e., the set of contrasts that are functions of the causal effects $f_{p_R}^{int}(z)$. The set consists of all contrasts between treatment regimes in an experiment with sequential treatment assignments, wherein the treatment given at stage $m$ is a function of past covariates on the graph.

Definition 7. We say a causal effect in a particular causal model associated with a DAG $G$ with node set $V$ is non-parametrically identified from data on $V$ (or, equivalently, in the absence of hidden variables) if it is a function of the density $f(v)$ of the factuals $V$.

Thus, in all four causal models, the causal effects $f_{p_R}^{int}(z)$ for which the g-functional $f_{p_R}(z)$ is a well-defined function of $f(v)$ are non-parametrically identified from data on $V$. It follows that the manipulable contrasts are non-parametrically identified under an agnostic causal model from observational data with a positive joint distribution and no hidden (i.e., unmeasured) variables. (Recall that a discrete joint distribution is positive if the probability of a joint event is nonzero whenever the marginal probability of each individual component of the event is nonzero.)

In contrast, the effect of treatment on the treated,
$$ETT(x) \equiv E[Y(x) - Y(0) \mid X = x],$$
is not a manipulable parameter relative to the graph $G: X \rightarrow Y$, since it is not well-defined under the corresponding agnostic causal model. However, $ETT(x)$ is identified under both MCMs and FFRCISTG models. Robins (2003) stated that an FFRCISTG model identified only "manipulable parameters." However, in that work, unlike here, no explicit definition of manipulable was used; in particular, it was not specified which class of interventions was being considered. In Appendix A we show that the MCMs and FFRCISTG models identify $ETT(x)$, which is not a manipulable parameter relative to the graph $X \rightarrow Y$. However, $ETT(x)$ is a manipulable parameter relative to an expanded graph $G'$ with deterministic relations; see also Robins, VanderWeele, and Richardson (2007) and Geneletti and Dawid (2007).

For expositional simplicity, we will henceforth restrict our discussion to static deterministic regime effects $f_r^{int}(z)$, except when non-static (i.e., dynamic and/or random) regimes $p_R$ are being explicitly discussed.
2 Direct Effects

Consider the following query: Do cigarettes (X) have a causal effect on MI (Y) through a pathway that does not involve hypertension (Z)? This query is often rephrased as whether X has a direct causal effect on Y not through the intermediate variable Z. The concept of direct effect has been formalized in three different ways in the literature. For notational simplicity we always take X to be binary, except where noted in Appendix A.
2.1 Controlled Direct Effects (CDEs)

Consider a causal model associated with a DAG $G$ with node set $V$ containing $(X, Y, Z)$. In a counterfactual causal model, the individual and average controlled direct effect (CDE) of $X$ on $Y$ when $Z$ is set to $z$ are, respectively, defined as $Y(x=1, z) - Y(x=0, z)$ and $CDE(z) = E[Y(x=1, z) - Y(x=0, z)]$. In our previous notation, $E[Y(x=1, z) - Y(x=0, z)]$ is the difference in means $E^{int}_{x=1,z}[Y] - E^{int}_{x=0,z}[Y]$ of $Y$ under the intervention distributions $f^{int}_{x=1,z}(v)$ and $f^{int}_{x=0,z}(v)$. Under the associated agnostic causal model, counterfactuals do not exist but the $CDE(z)$ can still be defined as $E^{int}_{x=1,z}[Y] - E^{int}_{x=0,z}[Y]$. Under all four causal models, $E^{int}_{x=1,z}[Y] - E^{int}_{x=0,z}[Y]$ is identified from data on $V$ by $E_{x=1,z}[Y] - E_{x=0,z}[Y]$ under the g-formula densities $f_{x=1,z}(v)$ and $f_{x=0,z}(v)$, if these are well-defined functions of $f(v)$. In the case of Figure 6.1, $E_{x,z}[Y]$ is just the mean $E[Y \mid X = x, Z = z]$ of the factual $Y$ given $X = x$ and $Z = z$ since, by the definition of the g-formula, $f_{x,z}(y) = f(y \mid X = x, Z = z)$.
When $Z$ is binary there exist two different controlled direct effects, corresponding to $z = 1$ and $z = 0$. For example, $CDE(1)$ is the average effect of $X$ on $Y$ in the study population were, contrary to fact, all subjects to have $Z$ set to 1. It is possible for $CDE(1)$ to be zero and $CDE(0)$ to be nonzero, or vice versa. Whenever $CDE(z)$ is nonzero for some level of $Z$, there will exist a directed path from $X$ to $Y$ not through $Z$ on the causal graph $G$, regardless of the causal model.
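For Figure 6.1 the controlled direct effects are therefore estimable directly from the factual conditional means. A minimal sketch (hypothetical simulated data; all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.2 + 0.4 * X)
Y = rng.binomial(1, 0.1 + 0.3 * X + 0.4 * Z)

def cde(z):
    """Controlled direct effect CDE(z) identified via the g-formula for Figure 6.1."""
    treated = Y[(X == 1) & (Z == z)].mean()    # E[Y | X=1, Z=z]
    untreated = Y[(X == 0) & (Z == z)].mean()  # E[Y | X=0, Z=z]
    return treated - untreated

print(cde(0), cde(1))   # with this assumed mechanism both are about 0.3
```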

2.2 Pure Direct Effects (PDEs)

In a counterfactual model, Robins and Greenland (1992) (hereafter R&G) defined the individual pure direct effect (PDE) of a (dichotomous) exposure $X$ on $Y$ relative to an intermediate variable $Z$ to be $Y(x=1, Z(x=0)) - Y(x=0)$. That is, the individual PDE is the subject's value of $Y$ under exposure to $X$ had, possibly contrary to fact, $X$'s effect on the intermediate $Z$ been blocked (i.e., had $Z$ remained at its value under non-exposure) minus the value of $Y$ under non-exposure to $X$. The individual PDE can also be written as $Y(x=1, Z(x=0)) - Y(x=0, Z(x=0))$, since $Y(x=0) = Y(x=0, Z(x=0))$. Thus the PDE contrast measures the direct effect of $X$ on $Y$ when $Z$ is set to its value $Z(x=0)$ under non-exposure to $X$. The average PDE is given by
$$PDE = E[Y(x=1, Z(x=0))] - E[Y(x=0)] = E[Y(x=1, Z(x=0)) - Y(x=0, Z(x=0))]. \qquad (6.10)$$

Pearl (2001) adopted R&G's definition but changed nomenclature: he referred to the pure direct effect as a 'natural' direct effect. Since the intervention mean $E[Y(x=0)] = E^{int}_{x=0}[Y]$ is identified from data on $V$ under any of the associated causal models, the PDE is identified if and only if $E[Y(x=1, Z(x=0))]$ is identified. The data-generating process given in Appendix B shows that $E[Y(x=1, Z(x=0))]$ is not a manipulable effect relative to the graph in Figure 6.1. Further, we show that $E[Y(x=1, Z(x=0))]$ is not identified under an MCM or FFRCISTG model from data on $V$ in the absence of further untestable assumptions. However, we shall see that $E[Y(x=1, Z(x=0))]$ is identified under the NPSEM associated with the graph in Figure 6.1.

Under the agnostic causal model, the concept of pure direct effect is not defined, since the counterfactual $Y(x=1, Z(x=0))$ is not assumed to exist.
2.3 Principal Stratum Direct Effects (PSDEs)

In contrast to the controlled direct effect and pure direct effect, the individual principal stratum direct effect (PSDE) is defined only for subjects for whom $X$ has no causal effect on $Z$, so that $Z(x=1) = Z(x=0)$. For a subject with $Z(x=1) = Z(x=0) = z$, the individual principal stratum direct effect is defined to be
$$Y(x=1, z) - Y(x=0, z)$$
(here, $X$ is assumed to be binary). The average PSDE in principal stratum $z$ is defined to be
$$PSDE(z) \equiv E[Y(1, z) - Y(0, z) \mid Z(1) = Z(0) = z].$$

Robins (1986, Sec. 12.2) first proposed using $PSDE(z)$ to define causal effects. In his article, $Y = 1$ denoted the indicator of death from a cause of interest (subsequent to a time $t$), $Z = 0$ denoted the indicator of survival until $t$ from competing causes, and the contrast $PSDE(z)$ was used to solve the problem of censoring by competing causes of death in defining the causal effect of the treatment $X$ on the cause $Y$. Rubin (1998) and Frangakis and Rubin (1999, 2002) later used this same contrast to solve precisely the same problem of "censoring by death." Finally, the analysis of Rubin (2004) was also based on this contrast, except that $Z$ and $Y$ were no longer assumed to be failure-time indicators.

The argument given below in Section 4 to prove that $E[Y(x=1, Z(x=0))]$ is not a manipulable effect relative to the graph in Figure 6.1 also proves that $PSDE(z)$ is not a manipulable effect relative to this graph. Furthermore, $PSDE(z)$ represents a causal contrast on a non-identifiable subset of the study population, namely the subset with $Z(1) = Z(0) = z$. An even greater potential problem with the PSDE is that if $X$ has an effect on every subject's $Z$, then $PSDE(z)$ is undefined for every possible $z$. If $Z$ is continuous and/or multivariate, it would not be unusual for $X$ to have an effect on every subject's $Z$. Thus, $Z$ is generally chosen to be univariate and discrete with few levels, often binary, when $PSDE(z)$ is the causal contrast.

However, principal stratum direct effects have the potential advantage of remaining well-defined even when controlled direct effects or pure direct effects are ill-defined. Note that for a subject with $Z(x=1) = Z(x=0) = z$, we have $Y(x=1, z) = Y(x=1, Z(x=1)) \equiv Y(x=1)$ and $Y(x=0, z) = Y(0, Z(0)) \equiv Y(x=0)$, so the individual PSDE for this subject is $Y(x=1) - Y(x=0)$. The average PSDE is given by
$$PSDE(z) = E[Y(x=1) - Y(x=0) \mid Z(1) = Z(0) = z].$$
Thus, PSDEs can be defined in terms of the counterfactuals $Y(x)$ and $Z(x)$.
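Because the principal stratum $\{Z(1) = Z(0) = z\}$ is defined by counterfactuals, $PSDE(z)$ can be evaluated directly only in a world where the counterfactuals are known, e.g., in a simulation. The following sketch (all counterfactual distributions are invented purely for illustration) computes $PSDE(z)$ and the size of each stratum.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Simulated counterfactual world (illustrative): Z(x) and Y(x) = Y(x, Z(x)) for binary X.
Z0 = rng.binomial(1, 0.3, n)
Z1 = rng.binomial(1, 0.6, n)
Y0 = rng.binomial(1, 0.2 + 0.3 * Z0)
Y1 = rng.binomial(1, 0.3 + 0.3 * Z1)

for z in (0, 1):
    stratum = (Z1 == Z0) & (Z0 == z)        # subjects on whom X has no effect on Z
    psde = Y1[stratum].mean() - Y0[stratum].mean()
    print(f"PSDE({z}) = {psde:.3f} on {stratum.mean():.1%} of the population")
```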
Now, in a trial where $X$ is randomly assigned but the intermediate $Z$ is not, there will generally be reasonable agreement as to the hypothetical intervention (i.e., closest possible world) which sets $X$ to $x$, so $Y(x)$ and $Z(x)$ are well-defined; however, there may not be reasonable agreement as to the hypothetical intervention which sets $X$ to $x$ and $Z$ to $z$, in which case $Y(x, z)$ will be ill-defined. In that event, controlled and pure direct effects are ill-defined, but one can still define $PSDE(z)$ by the previous display.

However, when $Y(x, z)$, and thus CDEs and PDEs, are ill-defined, and therefore use of $PSDE(z)$ is proposed, it is often the case that (i) the intermediate variable that is truly of scientific and policy relevance, say $Z^*$, is many-leveled, even continuous and/or multivariate, so $PSDE(z^*)$ may not exist for any $z^*$, and (ii) $Z$ is a coarsening (i.e., a function) of $Z^*$, chosen to ensure that $PSDE(z)$ exists. In such settings, the counterfactual $Y(x, z^*)$ is frequently meaningful because the hypothetical intervention which sets $X$ to $x$ and $Z^*$ to $z^*$ (unlike the intervention that sets $X$ to $x$ and $Z$ to $z$) is well-defined. Furthermore, the $CDE(z^*)$ and the PDE based on $Z^*$, in contrast to the $PSDE(z)$, provide knowledge of the pathways or mechanisms by which $X$ causes $Y$ and represent the effects of interventions of public-health importance. In such a setting, the direct effect contrasts of primary interest are the $CDE(z^*)$ and the PDE based on $Z^*$ rather than the $PSDE(z)$ based on a binary coarsening $Z$ of $Z^*$. See Robins, Rotnitzky, and Vansteelandt (2007) and Robins et al. (2009) for an example and further discussion.
3 Identification of the Pure Direct Effect

We have seen that the $CDE(z)$, as a manipulable parameter relative to the graph in Figure 6.1, is generally identified from data on $V$ under all four of the causal models associated with this graph. We next consider identification of $E[Y(x=1, Z(x=0))]$ and, thus, identification of the PDE in three important examples. The first two illustrate that the PDE may be identified in the NPSEM associated with a DAG but not by the associated MCMs or FFRCISTG models. In the third example the PDE is not identified under any of the four causal models associated with the DAG. We will elaborate these examples in subsequent sections.
3.1 Identification of the PDE in the DAG in Figure 6.1

Pearl (2001) proved that, under the NPSEM associated with the causal DAG in Figure 6.1, $E[Y(x=1, Z(x=0))]$ is identified. To see why, note that if
$$Y(x=1, z) \perp\!\!\!\perp Z(x=0) \quad \text{for all } z, \qquad (6.11)$$
then
$$E[Y(x=1, Z(x=0))] = \sum_z E^{int}_{x=1,z}[Y]\, f^{int}_{x=0}(z), \qquad (6.12)$$
because
$$E[Y(x=1, Z(x=0))] = \sum_z E[Y(x=1, z) \mid Z(x=0) = z]\, P[Z(x=0) = z] = \sum_z E[Y(x=1, z)]\, P[Z(x=0) = z],$$
where the first equality is by the laws of probability and the second by (6.11). Now, the right side of Equation (6.12) is non-parametrically identified from $f(v)$ under all four causal models, since the intervention parameters $E^{int}_{x,z}[Y]$ and $f^{int}_x(z)$ are identified by the g-functional. In particular, with Figure 6.1 as the causal DAG,
$$\sum_z E^{int}_{x=1,z}[Y]\, f^{int}_{x=0}(z) = \sum_z E[Y \mid X=1, Z=z]\, f(z \mid X=0). \qquad (6.13)$$

Hence, it remains only to show that (6.11) holds for an NPSEM corresponding to the graph in Figure 6.1. Now, we noted in Example 1 that $Y(x=1, z) \perp\!\!\!\perp Z(x=0) \mid X = j$ held for $j = 0$ and $j = 1$ for the NPSEM (but not for the FFRCISTG model) associated with the DAG in Figure 6.1. Further, for this NPSEM, $\{Y(x=1, z), Z(x=0)\} \perp\!\!\!\perp X$. Combining, we conclude that (6.11) holds. In contrast, for an FFRCISTG model or MCM corresponding to Figure 6.1, $E[Y(x=1, Z(x=0))]$ is not identified, because condition (6.11) need not hold. In Appendix C we derive sharp bounds for the PDE under the assumption that the FFRCISTG model or the MCM associated with graph $G$ holds. We find that these bounds may be quite informative, even though the PDE is not (point) identified under this model.
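The right-hand side of (6.13) depends only on the factual law and is easy to compute. A sketch (hypothetical simulated data; under the NPSEM this number equals $E[Y(x=1, Z(x=0))]$, while under the FFRCISTG model or MCM it is only one point in the bounds mentioned above):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.2 + 0.4 * X)
Y = rng.binomial(1, 0.1 + 0.3 * X + 0.4 * Z)

# Right-hand side of (6.13): sum over z of E[Y | X=1, Z=z] * f(z | X=0).
rhs = sum(Y[(X == 1) & (Z == z)].mean() * (Z[X == 0] == z).mean()
          for z in (0, 1))
e_y_x0 = Y[X == 0].mean()
print("Formula (6.13):", rhs, "  difference from E[Y | X=0]:", rhs - e_y_x0)
```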

3.2 The 'Natural Direct Effect' of Didelez, Dawid & Geneletti

Didelez, Dawid, and Geneletti (2006) (hereafter referred to as DDG) discuss an effect that they refer to as the 'natural direct effect' and prove it is identified under the agnostic causal model associated with the DAG in Figure 6.1, the difference between Equation (6.13) and $E[Y \mid X = 0]$ being the identifying formula. Since the parameter we have referred to as the natural or pure direct effect is not even defined under the agnostic model, it is clear they are giving the same name to a different parameter. Thus, DDG's results have no relevance to the identification of the PDE.

To clarify, we discuss DDG's results in greater detail. To define DDG's parameter, let $R = (X, Z)$ and consider a regime $p_R \equiv p_{(X=j,\,Z)}$ with $p(x) = 1$ if and only if $x = j$ and with a given $p(z \mid x) = p^*(z)$ that does not depend on $X$. Then $f^{int}_{p_{(X=j,Z)}}(v) = f^{int}_{p_{(X=j,Z)}}(x, y, z)$ is the density in a hypothetical study where each subject receives $X = j$ and then is randomly assigned $Z$ based on the density $p^*(z)$. DDG define the natural direct effect to be $E^{int}_{p_{(X=1,Z)}}[Y] - E^{int}_{p_{(X=0,Z)}}[Y]$ with $p^*(z)$ equal to the density $f^{int}_{x=0}(z)$ of $Z$ when $X$ is set to 0, provided $E^{int}_{p_{(X=0,Z)}}[Y]$ is equal to $E^{int}_{x=0}[Y]$, the mean of $Y$ when all subjects are untreated. When $E^{int}_{p_{(X=0,Z)}}[Y] \neq E^{int}_{x=0}[Y]$, they say their natural direct effect is undefined. Now, under the agnostic causal model associated with the DAG in Figure 6.1, it follows from Extended Lemma 6 that $E^{int}_{p_{(X=0,Z)}}[Y] = E^{int}_{x=0}[Y] = E[Y \mid X=0]$ and $E^{int}_{p_{(X=1,Z)}}[Y]$ is given by the right side of Equation (6.13), confirming DDG's claim about their parameter $E^{int}_{p_{(X=1,Z)}}[Y] - E^{int}_{p_{(X=0,Z)}}[Y]$. In contrast, our PDE parameter is given by the difference between Equation (6.13) and $E[Y \mid X = 0]$ only when $E[Y(x=1, Z(x=0))]$ equals Equation (6.13), which cannot be the case under an agnostic causal DAG model as $E[Y(x=1, Z(x=0))]$ is then undefined. Note that $E[Y(x=1, Z(x=0))]$ does equal Equation (6.13) under the NPSEM associated with Figure 6.1 but not under the MCM or FFRCISTG model associated with this Figure.
3.3 Identification of the PDE with a Measured Common Cause of Z and Y that Is Not Directly Affected by X

Consider the causal DAG in Figure 6.2(a), which differs from the DAG in Figure 6.1 in that it assumes (in the context of our smoking study example) there is a measured common cause $L$ of hypertension $Z$ and MI $Y$ that is not caused by $X$. Suppose we assume an NPSEM with $V = (X, L, Z, Y)$ and our goal remains estimation of $E[Y(x=1, Z(x=0))]$. Then $E[Y(x=1, Z(x=0))]$ remains identified under the NPSEM associated with the DAG in Figure 6.2(a), with the identifying formula now
$$\sum_{z,l} E[Y \mid X=1, Z=z, L=l]\, f(z \mid X=0, L=l)\, f(l).$$

Figure 6.2 An elaboration of the DAG in Figure 6.1 in which L is a (measured) common cause of Z and Y (panels (a) and (b)).

This follows from the fact that, under an NPSEM associated with the DAG in Figure 6.2(a),
$$Y(x=1, z) \perp\!\!\!\perp Z(x=0) \mid L \quad \text{for all } z, \qquad (6.14)$$
which in turn implies
$$E[Y(x=1, Z(x=0))] = \sum_{z,l} E^{int}_{x=1,z}[Y \mid L=l]\, f^{int}_{x=0}(z \mid L=l)\, f(l). \qquad (6.15)$$
The right side of (6.15) remains identified under all four causal models via
$$\sum_{z,l} E[Y \mid X=1, Z=z, L=l]\, f(z \mid X=0, L=l)\, f(l). \qquad (6.16)$$
In contrast, for an MCM or FFRCISTG model associated with the graph in Figure 6.2(a), $E[Y(x=1, Z(x=0))]$ is not identified, because (6.14) need not hold.
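Formula (6.16) simply adds stratification on the measured common cause $L$ to the sum in (6.13). A sketch under an assumed simulated mechanism for Figure 6.2(a) (L not affected by X; parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
L = rng.binomial(1, 0.4, n)                      # common cause, not affected by X
X = rng.binomial(1, 0.5, n)
Z = rng.binomial(1, 0.2 + 0.3 * X + 0.2 * L)
Y = rng.binomial(1, 0.1 + 0.2 * X + 0.3 * Z + 0.2 * L)

total = 0.0
for l in (0, 1):
    f_l = (L == l).mean()                                    # f(l)
    for z in (0, 1):
        e_y = Y[(X == 1) & (Z == z) & (L == l)].mean()       # E[Y | X=1, Z=z, L=l]
        f_z = (Z[(X == 0) & (L == l)] == z).mean()           # f(z | X=0, L=l)
        total += e_y * f_z * f_l
print("Formula (6.16):", total)
```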

3.4 Failure of Identification of the PDE in an NPSEM with a Measured Common Cause of Z and Y that Is Directly Affected by X

Consider the causal DAG shown in Figure 6.2(b), which differs from that in Figure 6.2(a) only in that $X$ now causes $L$, so that there exists an arrow from $X$ to $L$. The right side of Equation (6.15) remains identified under all four causal models via
$$\sum_{z,l} E[Y \mid X=1, Z=z, L=l]\, f(z \mid X=0, L=l)\, f(l \mid X=0).$$
Under an NPSEM, MCM, or FFRCISTG model associated with this causal DAG, $Y(x=1, Z(x=0))$ is by definition
$$Y(x=1, L(x=1), Z(x=0)) = Y(x=1, L(x=1), Z(x=0, L(x=0))). \qquad (6.17)$$
Avin et al. (2005) prove that Equation (6.14) does not hold for this NPSEM. Thus, even under an NPSEM, we cannot conclude that Equation (6.15) holds. In fact, Avin et al. (2005) prove that for this NPSEM $E[Y(x=1, Z(x=0))]$ is not identified from data on $V$. This is because the expression on the right-hand side of Equation (6.17) involves both $L(x=1)$ and $L(x=0)$, and there is no way to eliminate either.
Additional Assumptions Identifying the PDE in the NPSEM Associated with the DAG in Figure 6.2(b)

However, if we were to consider a counterfactual model that imposes even more counterfactual independence assumptions than the NPSEM, then the PDE may still be identified, though by a different formula. For example, if, in addition to the usual NPSEM independence assumptions, we assume that
$$L(x=0) \perp\!\!\!\perp L(x=1), \qquad (6.18)$$
then we have
$$\begin{aligned}
E[Y(x=1, Z(x=0))] &= \sum_{l^*, l, z} E[Y(x=1, l, z) \mid L(x=1)=l,\, Z(x=0, l^*)=z,\, L(x=0)=l^*]\; f\bigl(L(x=1)=l,\, Z(x=0, l^*)=z,\, L(x=0)=l^*\bigr) \\
&= \sum_{l^*, l, z} E[Y(x=1, l, z)]\; f\bigl(L(x=0)=l^*,\, L(x=1)=l\bigr)\; f\bigl(Z(x=0, l^*)=z\bigr) \\
&= \sum_{l^*, l, z} E[Y(x=1, l, z)]\; f\bigl(L(x=0)=l^*\bigr)\; f\bigl(L(x=1)=l\bigr)\; f\bigl(Z(x=0, l^*)=z\bigr) \\
&= \sum_{l^*, l, z} E[Y \mid X=1, L=l, Z=z]\; f(L=l^* \mid X=0)\; f(L=l \mid X=1)\; f(Z=z \mid X=0, L=l^*). \qquad (6.19)
\end{aligned}$$
Here, the second and fourth equalities follow from the usual NPSEM independence restrictions, but the third requires condition (6.18).

One setting under which (6.18) holds is that in which the counterfactual variables $L(0)$ and $L(1)$ result from a restrictive 'minimal sufficient cause model' (Rothman, 1976) such as
$$L(x) = (1-x)A_0 + xA_1, \qquad (6.20)$$
where $A_0$ and $A_1$ are independent both of one another and of all other counterfactuals. Note that (6.18) would not hold if the right-hand side of Equation (6.20) were $(1-x)A_0 + xA_1 + A_2$, even if the $A_i$'s were again assumed to be independent (VanderWeele & Robins, 2007).
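For completeness, here is a sketch of the identifying formula (6.19) computed from factual conditionals (hypothetical simulated data for Figure 6.2(b); the simulation itself does not guarantee that assumption (6.18) holds, so the output is simply the value of the formula, not necessarily the PDE):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1_000_000
X = rng.binomial(1, 0.5, n)
L = rng.binomial(1, 0.3 + 0.3 * X)               # L now affected by X (Figure 6.2(b))
Z = rng.binomial(1, 0.2 + 0.3 * X + 0.2 * L)
Y = rng.binomial(1, 0.1 + 0.2 * X + 0.3 * Z + 0.2 * L)

total = 0.0
for l_star in (0, 1):
    f_lstar = (L[X == 0] == l_star).mean()                   # f(L=l* | X=0)
    for l in (0, 1):
        f_l = (L[X == 1] == l).mean()                        # f(L=l | X=1)
        for z in (0, 1):
            f_z = (Z[(X == 0) & (L == l_star)] == z).mean()  # f(Z=z | X=0, L=l*)
            e_y = Y[(X == 1) & (L == l) & (Z == z)].mean()   # E[Y | X=1, L=l, Z=z]
            total += e_y * f_lstar * f_l * f_z
print("Formula (6.19):", total)
```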
An alternative further assumption, sufficient to identify the PDE in the context of the NPSEM associated with Figure 6.2(b), is that $L(1)$ is a deterministic function of $L(0)$, i.e., $L(1) = g(L(0))$ for some function $g(\cdot)$. In this case, we have
$$f\bigl(L(x=0)=l^*,\, L(x=1)=l\bigr) = f\bigl(L(x=0)=l^*\bigr)\, I\bigl(l = g(l^*)\bigr) = f(L=l^* \mid X=0)\, I\bigl(l = g(l^*)\bigr),$$
where $I(\cdot)$ is the indicator function. Hence
$$E[Y(x=1, Z(x=0))] = \sum_{l^*, l, z} E[Y \mid X=1, L=l, Z=z]\; f(L=l^* \mid X=0)\; I\bigl(l = g(l^*)\bigr)\; f(Z=z \mid X=0, L=l^*). \qquad (6.21)$$

For a scalar $L$ taking values in a continuous state-space there will exist a function $g(\cdot)$ such that $L(1) = g(L(0))$ under the condition of rank preservation, that is, if
$$L_i(0) < L_j(0) \;\Rightarrow\; L_i(1) < L_j(1)$$
for all individuals $i, j$. In this case $g$ is simply the quantile-quantile function:
$$g(l) \equiv F^{-1}_{L(1)}\bigl(F_{L(0)}(l)\bigr) = F^{-1}_{L \mid X=1}\bigl(F_{L \mid X=0}(l)\bigr), \qquad (6.22)$$
where $F(\cdot)$ and $F^{-1}(\cdot)$ denote the cumulative distribution function (CDF) and its inverse; the equality follows from the NPSEM assumptions; this expression shows that $g(\cdot)$ is identified. (Since $L$ is continuous, the sums over $l, l^*$ in Equation (6.21) are replaced by integrals.) A special case of this example is a linear structural equation system, where it was already known that the PDE is identified in the graph in Figure 6.2(b). Our analysis shows that identification of the PDE in this graph merely requires rank preservation and not linearity. Note that a linear structural equation model implies both rank preservation and linearity.
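Under rank preservation, the quantile-quantile function in (6.22) is estimable from the factual data, for example by matching empirical quantiles of $L$ among the unexposed to those among the exposed. A sketch (illustrative location-shift mechanism; rank preservation is assumed, not tested):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
X = rng.binomial(1, 0.5, n)
# Illustrative continuous L with a location shift under exposure (rank preserving).
L = rng.normal(loc=1.0 + 0.5 * X, scale=1.0)

L0, L1 = np.sort(L[X == 0]), np.sort(L[X == 1])

def g(l):
    """Estimate g(l) = F_{L|X=1}^{-1}(F_{L|X=0}(l)) by matching empirical quantiles."""
    u = np.searchsorted(L0, l) / L0.size          # F_{L|X=0}(l)
    idx = np.clip(int(u * L1.size), 0, L1.size - 1)
    return L1[idx]

print(g(1.0))   # roughly 1.5 under this assumed location-shift mechanism
```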
We note that the identifying formula in Equation (6.21) differs from
Equation (6.19). Since neither identifying assumption imposes any restriction
on the distribution of the factual variables in the DAG in Figure 6.2(b), there
is no empirical basis for deciding which, if either, of the assumptions is true.
Consequently, we do not advocate blithely adopting such assumptions in
order to preserve identification of the PDE in contexts such as the DAG in
Figure 6.2(b).
4 Models in which the PDE Is Manipulable

We now turn to the question of whether $E[Y(x=1, Z(x=0))]$ can be identified by intervening on the variables $V$ on $G$ in Figure 6.1. Now, as noted by R&G (1992), we could observe $E[Y(x=1, Z(x=0))]$ if we could intervene and set $X$ to 0, observe $Z(0)$, then "return each subject to their pre-intervention state," intervene to set $X$ to 1 and $Z$ to $Z(0)$, and finally observe $Y(1, Z(0))$. However, such an intervention strategy will usually not exist because such a return to a pre-intervention state is usually not possible in a real-world intervention (e.g., suppose the outcome $Y$ were death). As a result, because we cannot observe the same subject under both $X = 1$ and $X = 0$, we are unable to directly observe the distribution of mixed counterfactuals such as $Y(x=1, Z(x=0))$. It follows that we cannot observe $E[Y(x=1, Z(x=0))]$ by any intervention on the variables $X$ and $Z$. Pearl (2001) argues similarly. That is, although we can verify through intervention the prediction made by all four causal models that the right-hand side of Equation (6.13) is equal to the expression on the right-hand side of Equation (6.12), we cannot verify, by intervention on $X$ and $Z$, the NPSEM prediction that Equation (6.12) holds.

Thus $E[Y(x=1, Z(x=0))]$ is not manipulable with respect to the graph in Figure 6.1, and hence neither is the PDE with respect to this graph. Yet both of these parameters are identified in the NPSEM associated with this graph. This would be less problematic if these parameters were of little or no substantive interest. However, as shown in the next section, Pearl convincingly argues that such parameters can be of substantive importance.
4.1 Pearl's Substantive Motivation for the PDE

Pearl argues that the PDE and the associated quantity $E[Y(x=1, Z(x=0))]$ are often causal contrasts of substantive and public-health importance, by offering examples along the following lines. Suppose a new process can completely remove the nicotine from tobacco, allowing the production of a nicotine-free cigarette to begin next year. The substantive goal is to use already collected data on smoking status $X$, hypertensive status $Z$, and MI status $Y$ from a randomized smoking-cessation trial to estimate the incidence of MI in smokers were all smokers to change to nicotine-free cigarettes. Suppose it is (somehow?) known that the entire effect of nicotine on MI is through its effect on hypertensive status, while the non-nicotine toxins in cigarettes have no effect on hypertension. Then, under the further assumption that there do not exist unmeasured confounders for the effect of hypertension on MI, the causal DAG in Figure 6.1 can be used to represent the study. Under these assumptions, the MI incidence in smokers of cigarettes free of nicotine would be $E[Y(x=1, Z(x=0))]$ under all three counterfactual causal models, since the hypertensive status of smokers of nicotine-free cigarettes will equal their hypertensive status under non-exposure to cigarettes. Pearl then assumes an NPSEM and concludes that $E[Y(x=1, Z(x=0))]$ equals $\sum_z E[Y \mid X=1, Z=z]\, f(z \mid X=0)$, and the latter quantity can be estimated from the already available data.

What is interesting about Pearl's example is that to argue for the substantive importance of the non-manipulable parameter $E[Y(x=1, Z(x=0))]$, he tells a story about the effect of a manipulation: a manipulation that makes no reference to $Z$ at all. Rather, the manipulation is to intervene to eliminate the nicotine component of cigarettes.

Indeed, the most direct representation of his story is provided by the extended DAG in Figure 6.3 with $V = (X, N, O, Z, Y)$, where $N$ is a binary variable representing nicotine exposure, $O$ is a binary variable representing exposure to the non-nicotine components of a cigarette, and $(X, Z, Y)$ are as defined previously. The bolded arrows from $X$ to $N$ and $O$ indicate a deterministic relationship. Specifically, in the factual data, with probability one under $f(v)$, either one smokes normal cigarettes, so $X = N = O = 1$, or one is a nonsmoker (i.e., ex-smoker) and $X = N = O = 0$. In this representation the parameter of interest is the mean $E^{int}_{n=0,o=1}[Y]$ of $Y$ had, contrary to fact, all subjects only been exposed to the non-nicotine components. As $E^{int}_{n=0,o=1}[Y]$ is a function of $f^{int}_{n=0,o=1}(v)$, we conclude that $E^{int}_{n=0,o=1}[Y]$ is a manipulable causal effect relative to the DAG in Figure 6.3. Further, Pearl's story gives no reason to believe that there is any confounding for estimating this effect. In Appendix B we present a scenario that differs from Pearl's in which $E^{int}_{n=0,o=1}[Y]$ is confounded, and thus none of the four causal models associated with Figure 6.3 can be true (even though the FFRCISTG and agnostic causal models associated with Figure 6.1 are true).

Figure 6.3 An elaboration of the DAG in Figure 6.1; N and O are, respectively, the nicotine and non-nicotine components of tobacco; thicker edges indicate deterministic relations.

In contrast, under Pearl's scenario it is reasonable to take any of the four causal models, including the agnostic model, associated with Figure 6.3
as true. Under such a supposition, $E^{int}_{n=0,o=1}[Y]$ is identified if $E_{n=0,o=1}[Y]$ is a well-defined function of $f(v)$. Note that, under $f(v)$, data on $(X, Z, Y)$ are equivalent to data on $V = (X, N, O, Z, Y)$, since $X$ completely determines $O$ and $N$ in the factual data. We now show that, with Figure 6.3 as the causal DAG and $V = (X, N, O, Z, Y)$, under all four causal models, $E^{int}_{n=0,o=1}[Y]$ is identified simply by applying the g-formula density in standard fashion. This result may seem surprising at first, since no subject in the actual study data followed the regime $(n=0, o=1)$, so the standard positivity assumption $P[N=0, O=1] > 0$ usually needed to make the g-formula density $f_{n=0,o=1}(v)$ a function of $f(v)$ (and thus identifiable) fails.

However, as we now demonstrate, even without positivity, the conditional independences implied by the assumptions of no direct effect of $N$ on $Y$ and no effect of $O$ on $Z$, encoded in the missing arrows from $N$ to $Y$ and from $O$ to $Z$ in Figure 6.3, along with the deterministic relationship between $O$, $N$, and $X$ under $f(v)$, allow one to obtain identification. Specifically, under the DAG in Figure 6.3,
$$\begin{aligned}
f_{n=0,o=1}(y, z) &= f(y \mid O=1, z)\, f(z \mid N=0) \\
&= f(y \mid O=1, N=1, z)\, f(z \mid N=0, O=0) \\
&= f(y \mid X=1, z)\, f(z \mid X=0),
\end{aligned}$$
where the first equality is by definition of the g-formula density $f_{n=0,o=1}(y, z)$, the second by the conditional independence relations encoded in the DAG in Figure 6.3, and the last by the deterministic relationships between $O$, $N$, and $X$ under $f(v)$ with $V = (X, N, O, Z, Y)$. Thus
$$E_{n=0,o=1}[Y] \equiv \sum_{y,z} y\, f_{n=0,o=1}(y, z) = \sum_{y,z} y\, f(y \mid X=1, z)\, f(z \mid X=0) \equiv \sum_z E[Y \mid X=1, Z=z]\, f(z \mid X=0),$$
which is a function of $f(v)$ with $V = (X, N, O, Z, Y)$. Note that this argument goes through even if $Z$ and/or $Y$ are non-binary, continuous variables.
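The key point is that each g-formula factor conditions on $N$ or $O$ separately, and determinism ($N = O = X$ in the factual data) turns those factors into functions of the observed $(X, Z, Y)$ distribution even though $P[N=0, O=1] = 0$. A sketch (hypothetical simulation with an assumed mechanism):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 500_000
X = rng.binomial(1, 0.5, n)
N, O = X.copy(), X.copy()                      # deterministic: N = O = X in the factual data
Z = rng.binomial(1, 0.2 + 0.4 * N)             # Z depends on nicotine only
Y = rng.binomial(1, 0.1 + 0.3 * O + 0.4 * Z)   # Y depends on non-nicotine toxins and Z

# g-formula for the regime (n=0, o=1): each factor is estimable despite P[N=0, O=1] = 0.
E_Y = sum(Y[(O == 1) & (Z == z)].mean()        # f(y | O=1, z)  = f(y | X=1, z)
          * (Z[N == 0] == z).mean()            # f(z | N=0)     = f(z | X=0)
          for z in (0, 1))
print("E_{n=0,o=1}[Y] =", E_Y)
```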

The Role of the Extended Causal Model in Figure 6.3

The identifying formula under all four causal models associated with the DAG in Figure 6.3 is the identifying formula Pearl obtained when representing the problem as the estimation of $E[Y(x=1, Z(x=0))]$ under the NPSEM associated with the DAG in Figure 6.1.

For Pearl, having at the outset assumed an NPSEM associated with the DAG in Figure 6.1, the story did not contribute to identification; rather, it served only to show that the non-manipulable parameter $E[Y(x=1, Z(x=0))]$ of the NPSEM associated with the DAG in Figure 6.1 could, under the scenario of our story, encode a substantively important parameter: the manipulable causal effect of setting $N$ to 0 and $O$ to 1 on the extended causal model associated with the DAG in Figure 6.3. However, from the refutationist point of view, it is the story itself that makes Pearl's claim that $E[Y(x=1, Z(x=0))] = \sum_z E[Y \mid X=1, z]\, f(z \mid X=0)$ refutable and, thus, scientifically meaningful. Specifically, when nicotine-free cigarettes become available, Pearl's claim can be tested by an intervention that forces a random sample of the population to smoke nicotine-free cigarettes.

For someone willing to entertain only an agnostic causal model, the information necessary to identify the effect of nicotine-free cigarettes was contained in the story, as the parameter $E[Y(x=1, Z(x=0))]$ is undefined without the story. [Someone, such as Dawid (2000), opposed to counterfactuals and thus wedded to the agnostic causal model, might then reasonably and appropriately choose to define $E^{int}_{n=0,o=1}[Y] - E^{int}_{n=0,o=0}[Y] = E^{int}_{n=0,o=1}[Y] - E^{int}_{x=0}[Y]$ to be the natural or pure direct effect of $X$ not through $Z$. This definition differs from, and in our view is preferable to, the definition of DDG (2006) discussed previously: the definition of DDG fails to correspond to the concept of the PDE as used in the literature since its introduction in Robins and Greenland (1992).]

For an analyst who had assumed that the MCM, but not necessarily the NPSEM, associated with the DAG in Figure 6.1 was true, the information contained in the above story licenses the assumption that the MCM associated with Figure 6.3 holds. This latter assumption can be used in two alternative ways, both leading to the same identifying formula. First, it leads via Lemma 6 to the above g-functional analysis also used by the agnostic model advocate. Second, as we next show, it can be used to prove that (6.11) holds, allowing identification to proceed à la Pearl (2001).
The Role of Determinism

Consider an MCM associated with the DAG in Figure 6.3 with node set $V = (X, N, O, Z, Y)$. It follows from the fact that $X = N = O$ with probability (w.p.) 1 that the condition that $N(x) = O(x) = x$ w.p. 1 also holds. However, for pedagogic purposes, suppose for the moment that the condition $N(x) = O(x) = x$ w.p. 1 does not hold.

For expositional simplicity we assume all variables are binary, so our model is also an FFRCISTG model. Then $V_0 = X$, $V_1(v_0) = N(x)$, $V_2(v_0, v_1) = V_2(v_1) = O(x)$, $V_3(\bar v_2) = V_3(v_1) = Z(n)$, and $V_4(\bar v_3) = V_4(v_2, v_3)$. By Theorem 1, $\{Y(o, z), Z(n), O(x), N(x)\}$ are mutually independent. However, because we are assuming an FFRCISTG model and not an NPSEM, we cannot conclude that $O(x) \perp\!\!\!\perp N(x^*)$ for $x \neq x^*$.

Consider the induced counterfactual models for the variables $(X, Z, Y)$ obtained from our FFRCISTG model by marginalizing over $(N, O)$. Because $N$ and $O$ each has only a single child on the graph in Figure 6.3, the counterfactual model over $(X, Z, Y)$ is the FFRCISTG model associated with the complete graph of Figure 6.1, where the one-step-ahead counterfactuals $Z^{(1)}(x)$, $Y^{(1)}(x, z)$ associated with Figure 6.1 are obtained from the counterfactuals $\{Y(o, z), Z(n), O(x), N(x)\}$ associated with Figure 6.3 by $Z^{(1)}(x) = Z(N(x))$ and $Y^{(1)}(x, z) = Y(O(x), z)$. Here, we have used the superscript '(1)' to emphasize the graph with respect to which $Z^{(1)}(x)$ and $Y^{(1)}(x, z)$ are one-step-ahead counterfactuals. We cannot conclude that $Z^{(1)}(0) = Z(N(0))$ and $Y^{(1)}(1, z) = Y(O(1), z)$ are independent, even though $Z(n)$ and $Y(o, z)$ are independent, because, as noted above, the FFRCISTG model associated with Figure 6.3 does not imply independence of $O(1)$ and $N(0)$.

Suppose now we re-instate the deterministic constraint that $N(x) = O(x) = x$ w.p. 1. Then we conclude that $O(x)$ is independent of $N(x^*)$, since both variables are constants. It then follows that $Z^{(1)}(0)$ and $Y^{(1)}(1, z)$ are independent and, thus, that (6.11) holds and $E[Y^{(1)}(1, Z^{(1)}(0))]$ is identified.

The Need for Conditioning on Events of Probability Zero

In our argument that, under the deterministic constraint that $N(x) = O(x) = x$ w.p. 1, the FFRCISTG model associated with the DAG in Figure 6.3 implied condition (6.11), the crucial step was the following: by Theorem 1, the independences in condition (6.1) that define an FFRCISTG model imply that $Y(o, z)$ and $Z(n)$ are independent for $n = 0$ and $o = 1$. In this section, we show that had we modified (6.1), and thus our definition of an FFRCISTG model, by restricting to conditioning events $\bar V_{m-1} = \bar v_{m-1}$ that have positive probability under $f(v)$, then Theorem 1 would not hold for non-positive densities $f(v)$. Specifically, if $f(v)$ is not positive, the modified version of condition (6.1) does not imply condition (6.6); furthermore, the set of independences implied by a modified FFRCISTG model associated with a graph $G$ could differ for different orderings of the variables consistent with the descendant relationships on the graph. Specifically, we now show that for the modified FFRCISTG model associated with Figure 6.3 and the ordering $(X, N, O, Z, Y)$, we cannot conclude that $Y(x, n, o, z) = Y(o, z)$ and $Z(x, n, o) = Z(n)$ are independent for $n = 0$ and $o = 1$ and, thus, that condition (6.11) holds. However, the modified FFRCISTG model with the alternative ordering $(X, N, Z, O, Y)$ does imply $Y(o, z) \perp\!\!\!\perp Z(n)$. First, consider the modified FFRCISTG model associated with Figure 6.3 and ordering
$(X, N, O, Z, Y)$ under the deterministic constraint $N(x) = O(x) = x$ w.p. 1. The unmodified condition (6.1) implies the set of independences
$$Y(n, o, z) \perp\!\!\!\perp Z(n, o) \mid X = x,\, N(x) = n,\, O(x) = o, \quad \text{for } z, x, n, o \in \{0, 1\}.$$
The modified condition (6.1) implies only the subset corresponding to $\{x, z \in \{0, 1\};\ n = o = x\}$, since the event $\{N(x) = j,\, O(x) = 1-j,\ j \in \{0, 1\}\}$ has probability 0. As a consequence, we can only conclude that $Y(n, o, z) = Y(o, z) \perp\!\!\!\perp Z(n)$ for $o = n$.

In contrast, for the modified FFRCISTG model associated with Figure 6.3 and the ordering $V = (X, N, Z, O, Y)$, the deterministic constraint $N(x) = O(x) = x$ w.p. 1 implies $Y(o, z) \perp\!\!\!\perp Z(n)$ for $n = 0$ and $o = 1$, as follows. By Equation (6.1) and the fact that $Y(x, n, z, o) = Y(o, z)$ and $Z(x, n) = Z(n)$, we have, without having to condition on an event of probability 0, that
$$\{Y(o, z),\, Z(n)\} \perp\!\!\!\perp X \quad \text{for } z, o, n \in \{0, 1\}, \qquad (6.23)$$
$$Y(o, z) \perp\!\!\!\perp Z(n) \mid X = x,\, N(x) = n \quad \text{for } x, z, o \in \{0, 1\} \text{ and } n = x. \qquad (6.24)$$
Now, (6.24) implies $Y(o, z) \perp\!\!\!\perp Z(n=x) \mid X = x$ for $x, z, o \in \{0, 1\}$, as $X = x$ is the same event as $X = N(x) = x$. Thus, $Y(o, z) \perp\!\!\!\perp Z(n)$ for $n, z, o \in \{0, 1\}$ by (6.23).

The heuristic reason that, for the ordering $V = (X, N, O, Z, Y)$, we must condition on events of probability zero in condition (6.1) in order to prove (6.11) is that such conditioning is needed to instantiate the assumption that $O$ has no effect on $Z$; if we do not allow conditioning on events of probability zero, the FFRCISTG model with this ordering does not instantiate this assumption, because $O$ and $N$ are equal with probability one and, thus, we can substitute $O$ for $N$ as the cause of $Z$. Under the ordering $V = (X, N, Z, O, Y)$, in which $O$ is subsequent to $Z$, it was not necessary to condition on events of probability zero in (6.1) to instantiate this assumption, as the model precludes later variables in the ordering from being causes of earlier variables; thus, $O$ cannot be a cause of $Z$.

The above example demonstrates that the assumption that Equation (6.1) holds even when we condition on events of probability zero can place independence restrictions on the distribution of the counterfactuals over and above those implied by the assumption that Equation (6.1) holds when the conditioning events have positive probability. One might wonder how this could be so; it is usually thought that different choices for probabilities conditional on events of probability zero have no distributional implications. The following simple canonical example, which makes no reference to causality
or counterfactuals, clarifies how multiple distributional assumptions conditional on events of probability zero can place substantive restrictions on a distribution.

Example 4. Suppose we have random variables $(X, Y, R)$ where $R = 1$ w.p. 1. Suppose we assume both that (i) $f(x, y \mid R=0) = f(x, y)$ and (ii) $f(x, y \mid R=0) = f(x \mid R=0)\, f(y \mid R=0)$. Then we can conclude that $f(x, y) = f(x \mid R=0)\, f(y \mid R=0)$ and, thus, that $X$ and $Y$ are independent, since the joint density $f(x, y)$ factors as a function of $x$ times a function of $y$. The point is that although neither assumption (i) nor assumption (ii) alone restricts the joint distribution of $(X, Y)$, together they impose the restriction that $X$ and $Y$ are independent.
Inclusion of a Measured Common Cause of Z and Y

A similar elaboration may be given for the causal DAG in Figure 6.2(a). The extended causal DAG represented by our story would then be the DAG in Figure 6.4. Under any of our four causal models,
$$\begin{aligned}
f_{n=0,o=1}(y, z, l) &= f(y \mid O=1, z, l)\, f(z \mid N=0, l)\, f(l) \\
&= f(y \mid O=1, N=1, z, l)\, f(z \mid N=0, O=0, l)\, f(l) \\
&= f(y \mid X=1, z, l)\, f(z \mid X=0, l)\, f(l).
\end{aligned}$$
Hence,
$$E_{n=0,o=1}[Y] = \sum_{z,l} E[Y \mid X=1, Z=z, L=l]\, f(z \mid X=0, L=l)\, f(l),$$
which is the identifying formula Pearl obtained when representing the problem as the estimation of $E[Y(x=1, Z(x=0))]$ under an NPSEM associated with the DAG in Figure 6.2(a).

Figure 6.4 The graph from Figure 6.3 with, in addition, a measured common cause (L) of the intermediate Z and the final response Y.
Summary

We believe, in some generality, that whenever a particular causal effect (a) is identified from data on $V$ under an NPSEM associated with a DAG $G$ with node set $V$ (but is not identified under the associated MCM, FFRCISTG model, or agnostic causal model) and (b) can be expressed as the effect of an intervention on certain variables (which may not be elements of $V$) in an identifiable sub-population, then that causal effect is also identified under the agnostic causal DAG model based on a DAG $G'$ with node set $V'$, a superset of $V$. To find such an identifying causal DAG model $G'$, it is generally necessary to make the variables in $V' \setminus V$ deterministic functions of the variables in $V$. The above examples based on the extended DAGs in Figures 6.3 and 6.4 are cases in point; see Robins, VanderWeele, and Richardson (2007), Geneletti and Dawid (2007), and Appendix A for such a construction for the effect of treatment on the treated.
4.2 An Example in which an Interventional Interpretation of the PDE Is More Controversial

The following example shows that the construction of a scientifically plausible story under which the PDE can be regarded as a manipulable contrast relative to an expanded graph $G'$ may be more controversial than our previous example would suggest. After presenting the example, we briefly discuss its philosophical implications.

Suppose nicotine $X$ was the only chemical found in cigarettes that had an effect on MI but that nicotine produced its effects by two different mechanisms. First, it increased blood pressure $Z$ by directly interacting with a membrane receptor on blood pressure control cells located in the carotid artery in the neck. Second, it directly caused atherosclerotic plaque formation and, thus, an MI by directly interacting with a membrane receptor of the same type located on the endothelial cells of the coronary arteries of the heart. Suppose the natural endogenous ligand produced by the body that binds to these receptors was nicotine itself. Finally, assume that exogenous nicotine from cigarettes had no causal effect on the levels of endogenous nicotine (say, because the time-scale under study is too short for homeostatic feedback mechanisms to kick in) and that we had precisely measured levels of endogenous nicotine $L$ before randomizing to smoking or not smoking ($X$). Suppose that, based on this story, an analyst posits that the NPSEM associated with the graph in Figure 6.2(a) with $V = (X, Z, Y, L)$ is true. As noted in Section 3.3, under this supposition $E[Y(x=1, Z(x=0))]$ is identified via $\sum_{z,l} E[Y \mid X=1, Z=z, L=l]\, f(z \mid X=0, L=l)\, f(l)$.
Can we express $E[Y(x=1, Z(x=0))]$ as the effect of a scientifically plausible intervention? To do so, we must devise an intervention that (i) blocks the effect of exogenous nicotine on the receptors in the neck without blocking the effect of exogenous nicotine on the receptors in the heart but (ii) does not block the effect of endogenous nicotine on the receptors in either the neck or the heart. To accomplish (i), one could leverage the physical separation of the heart and the neck to build a "nano-cage" around the blood pressure control cells in the neck that prevents exogenous nicotine from reaching the receptors on these cells. However, because endogenous and exogenous nicotine are chemically and physically identical, the cage would also block the effect of endogenous nicotine on receptors in the neck, in violation of (ii). Thus, a critic might conclude that $E[Y(x=1, Z(x=0))]$ could not be expressed as the effect of an intervention. If the critic adhered to the slogan "no causation without manipulation" (i.e., causal contrasts are best thought of in terms of explicit interventions that, at least in principle, could be performed; Robins & Greenland, 2000), he or she would then reject the PDE as a meaningful causal contrast in this context. In contrast, if the critic believed in the ontological primacy of causation, he or she would take the example as evidence for their slogan "causation before manipulation."
Alternatively, one can argue that the critic’s conclusion that E[Y(x ¼ 1,
Z(x ¼ 0))] could not be expressed as the effect of an intervention indicates
only a lack of imagination and an intervention satisfying (i) and (ii) may
someday exist. Specifically, someday it may be possible to chemically attach
a side group to the exogenous nicotine in cigarettes in such a way that (a) the
effect of the (exogenous) chemically-modified nicotine and the effect of the
unmodified nicotine on the receptors in the heart and neck are identical,
while (b) allowing the placement of a ‘‘nano-cage’’ in the neck that success-
fully binds the side group attached to the exogenous nicotine, thereby pre-
venting it from reaching the receptors in the neck. In that case, E[Y(x = 1,
Z(x = 0))] equals a manipulable contrast of the extended deterministic causal
DAG of Figure 6.5.

Figure 6.5 An example in which an interventional interpretation of the PDE is hard to conceive; thicker edges indicate deterministic relations. [Diagram omitted: a DAG on X, D, M, L, Rn, Rh, Z, Y.]

In the figure, C = 1 denotes that the ‘‘nano-cage’’ is present. We allow X to take three values: as before, X = 0 indicates no cigarette exposure, X = 1 indicates exposure to cigarettes with unmodified nicotine, and X = 2 indicates exposure to cigarettes with modified nicotine.
Rn is the fraction of the receptors in the neck that are bound to a nicotine
molecule (exogenous or endogenous) and Rh is the fraction of the receptors
in the heart that are bound to a nicotine molecule. M is a variable that is 1 if
and only if X ≠ 0; D is a variable that takes the value 1 if and only if either
X = 1 or (X = 2 and C = 0). Then, E[Y(x = 1, Z(x = 0))] is the parameter
E^int_{x=2,c=1}[Y] corresponding to the intervention described in (a) and (b).
Under all four causal models associated with the graph in Figure 6.5,

$$
\begin{aligned}
f_{x=2,c=1}(y,z,l) &\equiv \sum_{m,d,r_h,r_n} f(y \mid r_h, z)\, f(r_h \mid m, l)\, f(z \mid r_n)\, f(m \mid x=2)\, f(r_n \mid d, l)\, f(d \mid c=1, x=2)\, f(l) \\
&= f(y \mid M=1, l, z)\, f(z \mid D=0, l)\, f(l) \\
&= f(y \mid X=1, z, l)\, f(z \mid X=0, l)\, f(l),
\end{aligned}
$$

where the first equality uses the fact that D = 0 and M = 1 when x = 2 and
c = 1, and the second uses the fact that, since in the observed data C = 0
w.p. 1, D = 0 if and only if X = 0, and M = 1 if and only if X = 1 (since
X ≠ 2 w.p. 1). Thus,

$$
E^{\mathrm{int}}_{x=2,c=1}[Y] = \sum_{z,l} E[Y \mid X=1, Z=z, L=l]\, f(z \mid X=0, L=l)\, f(l),
$$

which is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under an NPSEM based on the DAG in Figure 6.2(a).
As noted in the Introduction, the exercise of trying to construct a story to
provide an interventionist interpretation for a non-manipulable causal para-
meter of an NPSEM often helps one devise explicit, and sometimes even
practical, interventions which can then be represented as a manipulable
causal effect relative to an extended deterministic causal DAG model such
as Figure 6.3.

5 Path-Specific Effects

In this section we extend our results to path-specific effects. We begin with a particular motivating example.
5.1 A Specific Example
Suppose our underlying causal DAG was the causal DAG of Figure 6.2(b) in
which there is an arrow from X to L. We noted above that Pearl proved E[Y(x = 1,
Z(x = 0))] was not identified from data (X, L, Z, Y) on the causal DAG in
Figure 6.2(b) even under the associated NPSEM. There exist exactly three
possible extensions of Pearl’s original story that are consistent with the causal
DAG in Figure 6.2(b), as shown in Figure 6.6: (a) nicotine N causes L but O does
not; (b) O causes L but N does not; (c) both N and O cause L. We consider as
before the causal effect E^int_{n=0,o=1}[Y]. Under all four causal models associated with
the graph in Figure 6.6(a), E^int_{n=0,o=1}[Y] is identified from factual data on
V = (X, L, Z, Y). Specifically, on the DAG in Figure 6.6(a), we have

$$
\begin{aligned}
f_{n=0,o=1}(y,z,l) &= f(y \mid O=1, z, l)\, f(z \mid N=0, l)\, f(l \mid N=0) \\
&= f(y \mid O=1, N=1, z, l)\, f(z \mid N=0, O=0, l)\, f(l \mid N=0, O=0) \\
&= f(y \mid X=1, z, l)\, f(z \mid X=0, l)\, f(l \mid X=0),
\end{aligned}
$$
so
$$
E^{\mathrm{int}}_{n=0,o=1}[Y] = \sum_{l,z} E(Y \mid X=1, z, l)\, f(z \mid X=0, l)\, f(l \mid X=0). \tag{6.25}
$$

Figure 6.6 Elaborations of the graph in Figure 6.2(b), with additional variables as described in the text; thicker edges indicate deterministic relations. [Diagram omitted: four panels (a)–(d), each a DAG on X, N, O, Z, L, Y.]
Similarly, under all four causal models associated with the graph in Figure 6.6(b),
E^int_{n=0,o=1}[Y] is identified from factual data on V = (X, L, Z, Y): On the DAG in
Figure 6.6(b) we have
$$
\begin{aligned}
f_{n=0,o=1}(y,z,l) &= f(y \mid O=1, z, l)\, f(z \mid N=0, l)\, f(l \mid O=1) \\
&= f(y \mid X=1, z, l)\, f(z \mid X=0, l)\, f(l \mid X=1),
\end{aligned}
$$
so
$$
E^{\mathrm{int}}_{n=0,o=1}[Y] = \sum_{l,z} E(Y \mid X=1, z, l)\, f(z \mid X=0, l)\, f(l \mid X=1). \tag{6.26}
$$
However, E^int_{n=0,o=1}[Y] is not identified from factual data on V = (X, L, Z, Y)
under any of the four causal models associated with the graph in Figure 6.6(c).
In this graph f_{n=0,o=1}(y, z, l) = f(y | O = 1, z, l) f(z | N = 0, l) f(l | O = 1, N = 0).
However, f(l | O = 1, N = 0) is not identified from the factual data since
the event {O = 1, N = 0} has probability 0 under f(v). Note that the
identifying formulae for E^int_{n=0,o=1}[Y] for the graphs in Figure 6.6(a) and (b)
are different.
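The two identifying formulae differ only in whether the marginal for L is taken from the untreated arm (Equation (6.25)) or the treated arm (Equation (6.26)). A hedged sketch of a plug-in estimator that makes the switch explicit — the data frame df and its column names X, Z, L, Y are hypothetical, and every required stratum is assumed nonempty — is:

```python
def path_specific_plugin(df, l_arm):
    """Plug-in estimate of sum_{l,z} E[Y|X=1,Z=z,L=l] f(z|X=0,L=l) f(l|X=l_arm)
    from a pandas DataFrame of discrete columns X, Z, L, Y.
    l_arm=0 gives Equation (6.25); l_arm=1 gives Equation (6.26)."""
    total = 0.0
    d_arm = df[df["X"] == l_arm]
    for l in sorted(d_arm["L"].unique()):
        f_l = (d_arm["L"] == l).mean()                      # f(l | X = l_arm)
        d0 = df[(df["X"] == 0) & (df["L"] == l)]
        for z in sorted(d0["Z"].unique()):
            f_z = (d0["Z"] == z).mean()                     # f(z | X = 0, L = l)
            d1 = df[(df["X"] == 1) & (df["Z"] == z) & (df["L"] == l)]
            total += d1["Y"].mean() * f_z * f_l             # E[Y | X=1, Z=z, L=l]
    return total
```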

Relation to Counterfactuals Associated with the DAG in Figure 6.2(b)

Let Y(x, l, z), Z(x, l ) and L(x) denote the one-step-ahead counterfactuals
associated with the graph in Figure 6.2(b). Then, it is clear from the assumed
deterministic counterfactual relation N(x) = O(x) = x that the parameter
$$
E^{\mathrm{int}}_{n=0,o=1}[Y] = E[Y(o=1, L(n=0), Z(n=0, L(n=0)))]
$$
associated with the graph in Figure 6.6(a) can be written in terms of the
counterfactuals associated with the graph in Figure 6.2(b) as
$$
E[Y(x=1, L(x=0), Z(x=0))] = E[Y(x=1, L(x=0), Z(x=0, L(x=0)))].
$$

Likewise, we have that the parameter
$$
E^{\mathrm{int}}_{n=0,o=1}[Y] = E[Y(o=1, L(o=1), Z(n=0, L(o=1)))]
$$
associated with the graph in Figure 6.6(b) equals
$$
E[Y(x=1, L(x=1), Z(x=0, L(x=1)))]
$$

in terms of the counterfactuals associated with the graph in Figure 6.2(b). In
contrast, E^int_{n=0,o=1}[Y] associated with the graph in Figure 6.6(c) is not the
mean of any counterfactual defined from Y(x, l, z), Z(x, l), and L(x) under
the graph in Figure 6.2(b), since L, after intervening to set n = 0, o = 1, is
neither L(x = 1) nor L(x = 0), as both imply a counterfactual for L under which
n = o.
Furthermore, the parameter

$$
E[Y(x=1, Z(x=0))] = E[Y(x=1, L(x=1), Z(x=0, L(x=0)))]
$$

associated with the graph in Figure 6.2(b) is not identified under any of the
four causal models associated with any of the three graphs in Figure 6.6(a),
(b) and (c); see Section 3.4.
Thus, in summary, under an MCM or FFRCISTG model associated with
the DAG in Figure 6.6(a), the extension of Pearl’s original story encoded in
that DAG allows the identification of the causal effect E[Y(x = 1, L(x = 0),
Z(x = 0))] associated with the DAG in Figure 6.2(b). Similarly, under an
MCM or FFRCISTG model associated with the DAG in Figure 6.6(b), the
extension of Pearl’s original story encoded in that DAG allows the identification of the causal effect E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] associated with
the DAG in Figure 6.2(b).

Contrast with the NPSEM for the DAG in Figure 6.2(b)

We now compare these results to those obtained under the assumption that
the NPSEM associated with the DAG in Figure 6.2(b) held. Under this model
Avin et al. (2005) proved, using their theory of path-specific effects, that while
E[Y(x = 1, Z(x = 0))] is unidentified, both
$$
E[Y(x=1, L(x=0), Z(x=0))] \quad \text{and} \quad E[Y(x=1, L(x=1), Z(x=0, L(x=1)))] \tag{6.27}
$$
are identified (without requiring any additional story) by Equations (6.25) and
(6.26) respectively.
From the perspective of the FFRCISTG models associated with the graphs
in Figure 6.6(a) and (b), if N and O represent, as we have been assuming, the
substantive variables Nicotine and Other components of cigarettes (rather
than merely formal mathematical constructions), these graphs will generally
represent mutually exclusive causal hypotheses. As a consequence, at most
one of the two FFRCISTG models will be true; thus, from this perspective,
only one of the two parameters in (6.27) will be identified.

Simultaneous Identification of both Parameters in (6.27) by an Expanded Graph

We next describe an alternative scenario associated with the expanded graph
in Figure 6.6(d) whose substantive assumptions imply (i) the FFRCISTG
model associated with Figure 6.6(d) holds and (ii) the two parameters
of (6.27) are manipulable parameters of that FFRCISTG which are identi-
fied by Equations (6.25) and (6.26), respectively. Thus, this alternative
scenario provides a (simultaneous) manipulative interpretation for the non-manipulable (relative to (X, Z, Y)) parameters (6.27) that are simultaneously
identified by the NPSEM associated with the DAG in Figure 6.2(b).
Suppose it was (somehow?) known that, as encoded in the DAG in Figure
6.6(d), the Nicotine (N′) component of cigarettes was the only (cigarette-related)
direct cause of Z not through L, the Tar (T) component was the
only (cigarette-related) direct cause of L, the Other components (O′) contained
all the (cigarette-related) direct causes of Y not through Z and L, and there
are no further confounding variables, so that the FFRCISTG model associated
with Figure 6.6(d) can be assumed to be true. Then, the parameter
E_{n′=0, t=0, o′=1}[Y] associated with Figure 6.6(d) equals both the parameter
E[Y(x = 1, L(x = 0), Z(x = 0))] associated with Figure 6.2(b) and the parameter
E_{n=0, o=1}[Y] associated with Figure 6.6(a) (where n = 0 is now defined to be the
intervention that sets nicotine n′ = 0 and tar t = 0, while o = 1 is the intervention o′ = 1; N and O are redefined by 1 − N = (1 − N′)(1 − T) and O = O′).
Furthermore, E_{n′=0, t=0, o′=1}[Y] is identified by Equation (6.25). Similarly, the
parameter E_{n′=0, t=1, o′=1}[Y] associated with Figure 6.6(d) is equal to both the
parameter E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] associated with Figure 6.2(b)
and the parameter E_{n=0, o=1}[Y] associated with Figure 6.6(b) (where n = 0 is
now the intervention that sets nicotine n′ = 0, while o = 1 denotes the intervention that sets tar t = 1 and o′ = 1; N and O are redefined by N = N′ and
O = T·O′). Furthermore, the parameter E_{n′=0, t=1, o′=1}[Y] is identified by
Equation (6.26).
Note that under this alternative scenario, and in contrast to our previous
scenarios, the substantive meanings of the intervention that sets n = 0 and
o = 1 and of the variables N and O for Figure 6.6(a) differ from the substantive meaning of this intervention and these variables for Figure 6.6(b),
allowing the two parameters E_{n=0, o=1}[Y] to be identified simultaneously, each
by a different formula, under the single FFRCISTG model associated with
Figure 6.6(d).

Connection to Path-Specific Effects

Avin et al. (2005) refer to E[Y(x = 1, L(x = 0), Z(x = 0))] as the effect of X = 1
on Y when the paths from X to L and from X to Z are both blocked (inactivated) and to E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))] as the effect of X = 1 on
Y when the paths from X to Z are blocked. They refer to
$$
E[Y(x=1, Z(x=0))] = E[Y(x=1, L(x=1), Z(x=0, L(x=0)))]
$$
as the effect of X = 1 on Y when both the path from X to Z and (X’s effect on)
the path from L to Z are blocked.
5.2 The General Case
We now generalize the above results. Specifically, given any DAG G, with a
variable X, construct a deterministic extended DAG Gex that differs from G
only in that the only arrows out of X on Gex are deterministic arrows from X
to new variables N and O and the origin of each arrow out of X on G is from
either N or O (but never both) on Gex. Then, with V\X being the set of
variables on G other than X, the marginal g-formula density f_{n=0,o=1}(v\x) is
identified from the distribution of the variables V on G whenever f(v) is a
positive distribution by

$$
f_{n=0,o=1}(v \backslash x) \;=\; \prod_{\{j:\, V_j \text{ is not a child of } X \text{ on } G\}} f(v_j \mid pa_j)
\;\prod_{\{j:\, V_j \text{ is a child of } O \text{ on } G^{ex}\}} f(v_j \mid pa_j \backslash x,\; X=1)
\;\prod_{\{j:\, V_j \text{ is a child of } N \text{ on } G^{ex}\}} f(v_j \mid pa_j \backslash x,\; X=0).
$$

Note that if X has p children on G, there exist 2^p different graphs Gex. The
identifying formula for f_{n=0,o=1}(v\x) in terms of f(v) depends on the graph
Gex. It follows that, under the assumption that a particular Gex is associated
with one of our four causal models, the intervention distribution f^int_{n=0,o=1}(v\x)
corresponding to that Gex is identified under any of the four associated
models.
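To make the product formula concrete, the following is a hedged sketch of a plug-in evaluation of f_{n=0,o=1}(v\x) for discrete data. The graph is passed as a dictionary of parent sets, and the partition of X's children into children of N and children of O is supplied by the analyst; the data frame, variable names, and positivity of all required strata are hypothetical assumptions.

```python
def g_formula_density(df, parents, x_name, o_children, n_children, v_values):
    """Evaluate the product formula for f_{n=0,o=1}(v \\ x) by empirical frequencies.

    df         : pandas DataFrame of discrete variables
    parents    : dict mapping each variable name to its list of parents on G
    o_children : children of X assigned to O on Gex (conditioned on X = 1)
    n_children : children of X assigned to N on Gex (conditioned on X = 0)
    v_values   : dict giving a value for every variable other than x_name
    """
    dens = 1.0
    for v, pa in parents.items():
        if v == x_name:
            continue
        cond = dict(v_values)
        if v in o_children:
            cond[x_name] = 1          # f(v | pa \ x, X = 1)
        elif v in n_children:
            cond[x_name] = 0          # f(v | pa \ x, X = 0)
        sub = df
        for p in pa:
            sub = sub[sub[p] == cond[p]]
        dens *= (sub[v] == v_values[v]).mean()
    return dens
```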
We now discuss the relationship with path-specific effects. Avin et al.
(2005) first define, for any counterfactual model associated with G, the
path-specific effect on the density of V\X when various paths on graph G
have been blocked. Avin et al. (2005) further determine which path-specific
densities are identified under the assumption that the NPSEM associated
with G is true and provide the identifying formulae.
The results of Avin et al. (2005) imply that the path-specific effect corre-
sponding to the set of blocked paths on G being the paths from X to the
subset of its children who were the children of N on any given Gex is
identified under the NPSEM assumption for G. Their identifying formula
is precisely our f_{n=0,o=1}(v\x) corresponding to this Gex. In fact, our derivation
implies that this path-specific effect on G is identified by f^int_{n=0,o=1}(v\x) for this
Gex under the assumption that any of our four causal models associated with
this Gex holds, even without assuming that the NPSEM associated with the
original graph G is true. Again, under the NPSEM assumption for G, all 2^p
effects f^int_{n=0,o=1}(v\x) as Gex varies are identified, each by the formula
f_{n=0,o=1}(v\x), specific to the graph Gex.
A substantive scenario under which all 2^p effects f^int_{n=0,o=1}(v\x) are simultaneously identified by the Gex-specific formulae f_{n=0,o=1}(v\x) is obtained by
assuming an FFRCISTG model for an expanded graph on which N and O are
replaced by a set of parents X′_j, j = 1, . . . , p, one for each child of X, X is the
only parent of each X′_j, each X′_j has a single child, and X = X′_j w.p. 1 in the
actual data. We consider the 2^p interventions that set a subset of the X′_j to 1
and the remainder to 0. The relationship of this analysis to the analysis based
on the graphs Gex containing N and O mimics the relationship of the analysis based on Figure 6.6(d) under the alternative scenario of the last subsection to the analyses based on Figure 6.6(a) and Figure 6.6(b). In these latter
analyses, X had precisely three children: Z, L and Y.
Avin et al. (2005) also show that other path-specific effects are identified
under the NPSEM assumption for G. However, their results imply that
whenever, for a given set of blocked paths, the path-specific density of V\X
is identified from data on V under an NPSEM associated with G, the identifying formula is equal to the g-formula density f_{n=0,o=1}(v\x) corresponding to
one of the 2^p graphs Gex. Avin et al. (2005) provide an algorithm that can be
used to find the appropriate Gex corresponding to a given set of blocked
paths.
As discussed in Section 3.4 even the path-specific densities that are not
identified under an NPSEM become identified under yet further untestable
counterfactual independence assumptions and/or rank preservation
assumptions.

6 Conclusion
The results presented here, which are summarized in Table 6.1, appear to
present a clear trade-off between the agnostic causal DAG, MCM, and
FFRCISTG model frameworks and that of the NPSEM.

Table 6.1 Relations between causal models and estimands associated with the DAG
shown in Figure 6.1; column ‘D’ indicates if the contrast is defined in the model; ‘I’
whether it is identified.

                               Direct Effects           ETT
               Potential       CDE     PDE     PSDE     |X|=2   |X|>2
Causal Model   Outcome         D   I   D   I   D   I    D   I   D   I
               Indep. Ass.
Agnostic DAG   None            Y   Y   N   N   N   N    N   N   N   N
MCM            (6.5)           Y   Y   Y   N   Y   N    Y   Y   Y   N
FFRCISTG       (6.1)           Y   Y   Y   N   Y   N    Y   Y   Y   Y
NPSEM          (6.8)           Y   Y   Y   Y   Y   Y    Y   Y   Y   Y
In the NPSEM approach the PDE is identified, even though the result
cannot be verified by a randomized experiment without making further
assumptions. In contrast, the PDE is not identified under an agnostic
causal DAG model or under an MCM/FFRCISTG model. Further, in
Appendix A we show that the ETT can be identified under an MCM/
FFRCISTG model even though the ETT cannot be verified by a randomized
experiment without making further assumptions.
Our analysis of Pearl’s motivation for the PDE suggests that these dichoto-
mies may not be as stark as they may at first appear. We have shown that in
certain cases where one is interested in a prima facie non-manipulable causal
parameter, the very fact that it is of interest implies that there also exists
an extended DAG in which the same parameter is manipulable and identifi-
able in all the causal frameworks.
Inevitably, such cases will be interpreted differently by NPSEM ‘skeptics’
and ‘advocates.’ Advocates may argue that if our conjecture holds, then we
can work with NPSEMs and have some reassurance that in important cases
of scientific interest we will have the option to go back to an agnostic causal
DAG. Conversely, skeptics may conclude that if we are correct then this
shows that it is advisable to avoid the NPSEM framework: Agnostic causal
DAGs are fully ‘‘testable’’ (with the usual caveats) and many non-manipul-
able NPSEM parameters that are of interest, but not identifiable within a
non-NPSEM framework, can be identified in an augmented agnostic causal
DAG.
Undoubtedly, this debate is set to run and run . . .

Appendix A: The Effect of Treatment on the Treated: A Non-Manipulable Parameter
The primary focus of this chapter has been various contrasts assessing
the direct effect of X on Y relative to an intermediate Z. In this appendix
we discuss another non-manipulable parameter, the effect of treatment on
the treated, in order to further clarify the differences among the agnostic,
the MCM and the FFRCISTG models. For our purposes, we shall only
require the simplest possible causal model based on the DAG X → Y,
obtained by marginalizing over Z in the graph in Figure 6.1. Let Y(0)
denote the counterfactual Y(x) evaluated at x = 0. In a counterfactual
causal model, the average effect of treatment on the treated is defined to be
$$
\mathrm{ETT}(x) \;\equiv\; E[Y(x) - Y(0) \mid X = x] \;\equiv\; E^{\mathrm{int}}_{x}[Y \mid X = x] - E^{\mathrm{int}}_{0}[Y \mid X = x].
$$

Minimal Counterfactual Models (MCMs)

In an MCM associated with DAG X → Y, E[Y(x) | X = x] = E[Y | X = x], by the
consistency assumption (iii) in Section 1.1. Thus,
$$
\mathrm{ETT}(x) = E[Y \mid X = x] - E[Y(0) \mid X = x].
$$

Hence the ETT(x) is identified iff the second term on the right is identified.
First, note that

$$
\mathrm{ETT}(0) = E[Y \mid X = 0] - E[Y(0) \mid X = 0] = 0.
$$
Now, by consistency condition (iii) in Section 1.1 and the MCM assumption,
Equation (6.4), we have
$$
E[Y \mid X = 0] = E[Y(0) \mid X = 0] = E[Y(0)].
$$

By the law of total probability,


$$
E[Y(0)] = \sum_{x} E[Y(0) \mid X = x]\, P(X = x).
$$

Hence, it follows that


$$
E[Y \mid X = 0]\, P(X \neq 0) = \sum_{x:\, x \neq 0} E[Y(0) \mid X = x]\, P(X = x). \tag{6.28}
$$

In the special case where X is binary, so |X| = 2, the right-hand side of Equation
(6.28) reduces to a single term and, thus, we have E[Y(0) | X = 1] = E[Y | X = 0].
It follows that, for binary X, we have

$$
\mathrm{ETT}(1) = E[Y \mid X = 1] - E[Y \mid X = 0]
$$
under the MCM (and hence any counterfactual causal model).
See Pearl (2010, pp. 396–7) for a similar derivation, though he does not
make explicit that consistency is required.
In contrast, if X is not binary, then the right-hand side of Equation (6.28)
contains more than one unknown, so that ETT(x) for x ≠ 0 is not identified
under the MCM.
However, under an FFRCISTG model, condition (6.1) implies that

$$
E[Y(0) \mid X = x] = E[Y \mid X = 0],
$$

so ETT(x) is identified in this model, regardless of X’s sample space.


The parameter ETT(1) = E[Y(1) − Y(0) | X = 1] is not manipulable, relative to
{X, Y}, even when X is binary, since, without further assumptions, we cannot
experimentally observe Y(0) in subjects with X = 1.

Note that even under the MCM with jX j > 2, the non-manipulable
(relative to {X, Y}) contrast E[Y(0) | X 6¼ 0]  E[Y | X 6¼ 0], the effect of receiv-
ing X ¼ 0 on those who did not receive X ¼ 0, is identified since E[Y(0) |
X 6¼ 0] is identified by the left-hand side of Equation (6.28).
We now turn to the agnostic causal model for the DAG X ! Y. Although
Exint ½Y is identified by the g-functional as E[Y | X ¼ x], nonetheless, as
expected for a non-manipulable causal contrast, the effect of treatment on
the treated is not formally defined within the agnostic causal model, without
further assumptions, even for binary X. Of course, the g-functional
(see Definition 5) does define a joint distribution fx(x*, y) for (X, Y ) under
which X takes the value x with probability 1. However, in spite of apparent
notational similarities, the conditional density fx(y | x*) expresses a
different concept from that occurring in the definition of
Exint ½Y j X ¼x    E½YðxÞjX ¼ x   in the counterfactual theory. The former
relates to the distribution over Y among those individuals who (after the
intervention) have the value X ¼ x*, under an intervention which sets every
unit’s value to x and thus fx(y | x*) ¼ f (y | x) if x* ¼ x and is undefined if
x* 6¼ x ; the latter is based on the distribution of Y under an intervention
fixing X ¼ x among those people who would have had the value X ¼ x* had
we not intervened.
The minimality of the MCM among all counterfactual models that both
satisfy the consistency assumption (iii) in Section 1.1 and identify the intervention distributions {f^int_{p_R}(z)} can be seen as follows. For binary X, the above
argument for identification of the non-manipulable contrast ETT(1) under an
MCM as the difference E[Y | X = 1] − E[Y | X = 0] follows directly, via the laws
of probability, from the consistency assumption (iii) in Section 1.1 and the
minimal independence assumption (6.5) required to identify the intervention
distributions {f^int_{p_R}(z)}. In contrast, the additional independence assumptions
(6.8) used to identify the PDE under the NPSEM for the DAG in Figure 6.1,
or the additional independence assumptions used to identify ETT(1) for non-binary X under an FFRCISTG model for the DAG X → Y, are not needed to
identify intervention distributions.
Of course, as we have shown, it may be the case that the PDE is identified
as an intervention contrast in an extended causal DAG containing additional
variables; but identification in this extended causal DAG requires additional
assumptions beyond those in the original DAG and hence does not follow
merely from application of the laws of probability.
Similarly, the ETT(1) for the causal DAG X → Y can be re-interpreted as an
intervention contrast in an extended causal DAG containing additional vari-
ables, regardless of the dimension of X’s state space. Specifically, Robins,
VanderWeele, and Richardson (2007) showed that the ETT(x) parameter is
defined and identified via the extended agnostic causal DAG in Figure 6.7
Figure 6.7 An extended DAG, with a treatment X = X* and response Y, leading to an interventional interpretation of the effect of treatment on the treated (Robins, VanderWeele, & Richardson, 2007; Geneletti & Dawid, 2007). The thicker edge indicates a deterministic relationship. [Diagram omitted: X* → X → Y.]

that adds to the DAG X → Y a variable X* that is equal to X with probability 1
under f(v) = f(x*, x, y). Then E^int_x[Y | X* = x*] is identified by the g-formula as
E[Y | X = x], because X is the only parent of Y. Furthermore, E^int_x[Y | X* = x*]
has an interpretation as the effect on the mean of Y of setting X to x on those
observed to have X = x*, because X = X* with probability 1. Thus, though
ETT(x) is not a manipulable parameter relative to the graph X → Y, it is
manipulable relative to the variables {X*, X, Y} in the DAG in Figure 6.7.
In the extended graph ETT(x) is identified by the same function of the
observed data as E[Y(x) − Y(0) | X = x] in the original FFRCISTG model for
non-binary X or in the original MCM or FFRCISTG model for binary X. The
substantive fact that would license the extended DAG of Figure 6.7 is that a
measurement, denoted by X*, could be taken just before X occurs such that
(i) in the observed data X* predicts X perfectly (i.e., X = X* w.p. 1) but (ii) an
intervention exists that could, in principle, be administered in the small time
interval between the X* measurement and the occurrence of X whose effect
is to set X to a particular value x, say x = 0. As an example, let X* denote the
event that a particular pill has been swallowed, X denote the event that the
pill’s contents enter the blood stream, and the intervention be the adminis-
tration of an emetic that causes immediate regurgitation of the pill, but
otherwise has no effect on the outcome Y; see Robins, VanderWeele, and
Richardson (2007).

A Model that Is an MCM but Not an FFRCISTG

In this section we describe a parametric counterfactual model for the effect
of a ternary treatment X on a binary response Y that is an MCM associated
with the graph in Figure 6.8 but is not an FFRCISTG. Let π = (π0, π1, π2)
be a (vector-valued) latent variable with three components such that in a
given population π ~ Dirichlet(α0, α1, α2), so that π0 + π1 + π2 = 1 w.p. 1. The
joint distribution of the factual and counterfactual data is determined by the
unknown parameters (α0, α1, α2). Specifically, the treatment X is ternary
with states 0, 1, 2, and P(X = k | π) = πk; equivalently,
$$
X \mid \pi \;\sim\; \mathrm{Multinomial}(1, \pi).
$$
Figure 6.8 (a) A simple graph; (b) A graph describing a confounding structure that leads to a counterfactual model that corresponds to the MCM but not the FFRCISTG associated with the DAG (a); thicker red edges indicate deterministic relations. [Diagram omitted: panel (a) X → Y; panel (b) adds the latent π0, π1, π2 and the potential outcomes Y(x = 0), Y(x = 1), Y(x = 2).]

Now suppose that the response Y is binary and that the counterfactual outcomes Y(x) are as follows:
$$
\begin{aligned}
Y(x=0) \mid \pi &\sim \mathrm{Bernoulli}(\pi_1/(\pi_1 + \pi_2)), \\
Y(x=1) \mid \pi &\sim \mathrm{Bernoulli}(\pi_2/(\pi_2 + \pi_0)), \\
Y(x=2) \mid \pi &\sim \mathrm{Bernoulli}(\pi_0/(\pi_0 + \pi_1)).
\end{aligned}
$$

Thus, in this example, conditional on π the potential outcome Y(x = k) ‘happens’ to be a realization of a Bernoulli random variable with probability of
success equal to the probability of receiving treatment X = k + 1 mod 3 given
that treatment X is not k. In what follows we will use [·] to indicate that an
expression is evaluated mod 3. Now, since (π0, π1, π2) follows a Dirichlet
distribution, it follows that
$$
\pi_{[i+1]}/(\pi_{[i+1]} + \pi_{[i+2]}) \;\perp\!\!\!\perp\; \pi_i
$$
for i = 0, 1, 2. Hence, in this example, for i = 0, 1, 2 we have Y(x = i) ⫫ π_i.
Further, I(X = i) ⫫ Y(x = i) | π_i; hence, the model obeys the MCM independence restriction (6.5):
$$
Y(x = i) \;\perp\!\!\!\perp\; I(X = i) \quad \text{for all } i,
$$
but not the FFRCISTG independence restriction (6.1), since
$$
Y(x = i) \;\not\!\perp\!\!\!\perp\; I(X = j) \quad \text{for } i \neq j.
$$
We note that we have:

$$
\begin{aligned}
P(X = i) &= E(\pi_i) = \alpha_i/(\alpha_0 + \alpha_1 + \alpha_2); && (6.29) \\
\pi \mid X = i &\sim \mathrm{Dirichlet}(\alpha_i + 1,\; \alpha_{[i+1]},\; \alpha_{[i+2]}); && (6.30) \\
Y(x = i) \mid X = i &\sim \mathrm{Bernoulli}(\alpha_{[i+1]}/(\alpha_{[i+1]} + \alpha_{[i+2]})); && (6.31) \\
Y \mid X = i &\sim \mathrm{Bernoulli}(\alpha_{[i+1]}/(\alpha_{[i+1]} + \alpha_{[i+2]})). && (6.32)
\end{aligned}
$$
Equation (6.30) follows from standard Bayesian updating (since the Dirichlet
distribution is conjugate to the multinomial). It follows that the vector of
parameters (α0, α1, α2) is identified only up to a scale factor, since the likelihood for the observed variables satisfies f(x, y | α) = f(x, y | λα) for any scalar
λ > 0, by Equations (6.29) and (6.32). We note that since E(Y(x)) = E(Y | X = x),
ACE_{X→Y}(x) ≡ E(Y(x)) − E(Y(0)) = E(Y | X = x) − E(Y | X = 0) and thus is
identified. However, since
$$
\begin{aligned}
Y(x = 0) \mid X = 1 &\sim \mathrm{Bernoulli}((\alpha_1 + 1)/(\alpha_1 + 1 + \alpha_2)), \\
Y(x = 0) \mid X = 2 &\sim \mathrm{Bernoulli}(\alpha_1/(\alpha_1 + \alpha_2 + 1)),
\end{aligned}
$$
and the probability of success in these distributions is not invariant
under rescaling of the vector α, we conclude that these distributions
are not identified from data on f(x, y). Consequently, ETT(x*) ≡
E[Y(x = x*) − Y(x = 0) | X = x*] is not identified under our parametric model.
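A minimal Monte Carlo sketch of this construction (arbitrary α values, not taken from the chapter) makes the asymmetry visible: the simulated frequencies satisfy the MCM restriction E[Y(x = k) | X = k] = E[Y(x = k)], while E[Y(x = 0) | X = 1] differs from E[Y(x = 0)], so the FFRCISTG restriction fails.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 4.0])        # arbitrary illustrative Dirichlet parameters
n = 200_000

pi = rng.dirichlet(alpha, size=n)        # latent pi ~ Dirichlet(alpha)
u = rng.random(n)
x = np.minimum((u[:, None] > np.cumsum(pi, axis=1)).sum(axis=1), 2)  # X | pi ~ Categorical(pi)

# Y(x=k) | pi ~ Bernoulli(pi_{[k+1]} / (pi_{[k+1]} + pi_{[k+2]})), indices mod 3
y_pot = np.empty((n, 3))
for k in range(3):
    prob = pi[:, (k + 1) % 3] / (pi[:, (k + 1) % 3] + pi[:, (k + 2) % 3])
    y_pot[:, k] = rng.random(n) < prob

for k in range(3):
    print(f"E[Y(x={k})] = {y_pot[:, k].mean():.3f}, "
          f"E[Y(x={k}) | X={k}] = {y_pot[x == k, k].mean():.3f}")  # approximately equal (MCM)
print(f"E[Y(x=0) | X=1] = {y_pot[x == 1, 0].mean():.3f}")          # differs from E[Y(x=0)]: FFRCISTG fails
```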

Appendix B: A Data-Generating Process Leading to an FFRCISTG but not an NPSEM
Robins (2003) stated that it is hard to construct realistic (as opposed to mathematical) scenarios in which one would accept that the FFRCISTG
model associated with Figure 6.1 held, but the NPSEM did not, and thus
that CDEs are identified but PDEs are not. In this Appendix we describe such
a scenario. We leave it to the reader to judge its realism.
Suppose that a substance U that is endogenously produced by the body
could both (i) decrease blood pressure by reversibly binding to a membrane
receptor on the blood pressure control cells in the carotid artery of the neck
and (ii) directly increase atherosclerosis, and thus MI, by stimulating the
endothelial cells of the coronary arteries of the heart via an interaction
with a particular protein, and that this protein is expressed in endothelial
cells of the coronary arteries only when induced by the chemicals in tobacco
smoke other than nicotine e.g., tar. Further, suppose one mechanism by
which nicotine increased blood pressure Z was by irreversibly binding to
the membrane receptor for U on the blood pressure control cells in the
carotid, the dose of nicotine in a smoker being sufficient to bind every avail-
able receptor. Then, under the assumption that there do not exist further
unmeasured confounders for the effect of hypertension on MI, this scenario
implies that it is reasonable to assume that any of the four causal models
associated with the expanded DAG in Figure 6.9 is true. Here R measures the
degree of binding of protein U to the membrane receptor in blood pressure
control cells. Thus, R is zero in smokers of cigarettes containing nicotine.
E measures the degree of stimulation of the endothelial cells of the coronary
arteries by U. Thus, E is zero except in smokers (regardless of whether the
cigarette contains nicotine).

Figure 6.9 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 but not an NPSEM; thicker edges denote deterministic relations. [Diagram omitted: a DAG including X, N, Z, with R ≡ (1−N)U and E ≡ OU.]
Before considering whether the NPSEM associated with Figure 6.1 holds,
let us first study the expanded DAG of Figure 6.9. An application of the
g-formula to the DAG in Figure 6.9 shows that the effect of not smoking,
E^int_{n=0,o=0}[Y] = E^int_{x=0}[Y], and the effect of smoking, E^int_{n=1,o=1}[Y] = E^int_{x=1}[Y], are
identified by E[Y | X = 0] and E[Y | X = 1] under all four causal models
associated with Figure 6.9. However, the effect E^int_{n=0,o=1}[Y] of smoking nicotine-free cigarettes is not identified. Specifically,
$$
\begin{aligned}
E^{\mathrm{int}}_{n=0,o=1}[Y] &= \sum_{z,u} E[Y \mid O=1, U=u, Z=z]\, f(z \mid U=u, N=0)\, f(u) \\
&= \sum_{z,u} E[Y \mid N=1, O=1, U=u, Z=z]\, f(z \mid U=u, N=0, O=0)\, f(u) \\
&= \sum_{z,u} E[Y \mid X=1, U=u, Z=z]\, f(z \mid U=u, X=0)\, f(u),
\end{aligned}
$$

where the first equality used the fact that E is a deterministic function of U
and O and that R is a deterministic function of N and U. The second equality
used d-separation and the third, determinism. Thus, E^int_{n=0,o=1}[Y] is not a
function of the density of the observed data on (X, Z, Y) because u occurs
both in the term E[Y | X = 1, U = u, Z = z], where we have conditioned on
X = 1, and in the term f(z | U = u, X = 0), where we have conditioned on
X = 0. As a consequence, we do not obtain a function of the density of the
observed data when we marginalize over U.
Since under all three counterfactual models associated with the extended
DAG of Figure 6.9, E^int_{n=0,o=1}[Y] is equal to the parameter E[Y(x = 1, Z(x = 0))]
of Figure 6.1, we conclude that E[Y(x = 1, Z(x = 0))], and thus the PDE, is not
identified. Hence, the induced counterfactual model for the DAG in
Figure 6.1 cannot be an NPSEM (as that would imply that the PDE would
be identified).
Furthermore, E^int_{n=0,o=1}[Y] is a manipulable parameter with respect to the
DAG in Figure 6.3, since this DAG is obtained from marginalizing over U in
the graph in Figure 6.9. However, as we showed above, E^int_{n=0,o=1}[Y] is not
identified from the law of the factuals X, Y, Z, N, O, which are the variables in
Figure 6.3. From this we conclude that none of the four causal models
associated with the graph in Figure 6.3 can be true. Note that prima facie
one might have thought that if the agnostic causal DAG in Figure 6.1 is true,
then this would always imply that the agnostic causal DAG in Figure 6.3 is
also true. This example demonstrates that such a conclusion is fallacious.
Similar remarks apply to the MCM and FFRCISTG models.
Additionally, for z = 0, 1, by applying the g-formula to the graph in
Figure 6.9, we obtain that the joint effect of smoking and z, E^int_{n=1,o=1,z}[Y],
and the joint effect of not smoking and z, E^int_{n=0,o=0,z}[Y], are identified by
E[Y | X = 1, Z = z] and E[Y | X = 0, Z = z], respectively, under all four causal
models for Figure 6.9. Since E^int_{n=0,o=0,z}[Y] and E^int_{n=1,o=1,z}[Y] are equal to the
parameters E^int_{x=0,z}[Y] and E^int_{x=1,z}[Y] under all four causal models
associated with the graph in Figure 6.1, we conclude that CDE(z) is also
identified under all four causal models associated with Figure 6.1.
The results obtained in the last two paragraphs are consistent with the
FFRCISTG model and the MCM associated with the graph in Figure 6.1
holding but not the NPSEM. In what follows we prove such is the case.
Before doing so, we provide a simpler and more intuitive way to under-
stand the above results by displaying in Figure 6.10 the subgraphs of Figure
6.9 corresponding to U, Z, Y when the variables N and O are set to each of
their four possible joint values. We see that only when we set N = 0 and
O = 1 is it the case that U is a common cause of both Z and Y (as setting
N = 0, O = 1 makes R = E = U). Thus, we have
$$
\begin{aligned}
E^{\mathrm{int}}_{n=0,o=0,z}[Y] &= E^{\mathrm{int}}_{n=0,o=0}[Y \mid Z = z] = E[Y \mid O=0, N=0, Z=z] = E[Y \mid X=0, Z=z], \text{ and} \\
E^{\mathrm{int}}_{n=1,o=1,z}[Y] &= E^{\mathrm{int}}_{n=1,o=1}[Y \mid Z = z] = E[Y \mid O=1, N=1, Z=z] = E[Y \mid X=1, Z=z]
\end{aligned}
$$
as O and N are unconfounded and Z is unconfounded when either we
set O = 1, N = 1 or we set O = 0, N = 0. However, E^int_{n=0,o=1,z}[Y] ≠
E^int_{n=0,o=1}[Y | z] = E[Y | N = 0, O = 1, Z = z] as the effect of Z on Y is confounded
Figure 6.10 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 holding but not the NPSEM: Causal subgraphs on U, Z, Y implied by the graph in Figure 6.9 when we intervene and set (a) N = 0, O = 0; (b) N = 1, O = 0; (c) N = 0, O = 1; (d) N = 1, O = 1. [Diagram omitted: four subgraphs on U, Z, Y.]

when we set N = 0, O = 1. It is because E^int_{n=0,o=1,z}[Y] ≠ E^int_{n=0,o=1}[Y | z] that
E^int_{n=0,o=1}[Y] is not identified. If, contrary to Figure 6.9, there was no confounding between Y and Z when N is set to 0 and O is set to 1, then we would have
E^int_{n=0,o=1,z}[Y] = E^int_{n=0,o=1}[Y | z]. It would then follow that
$$
\begin{aligned}
E^{\mathrm{int}}_{n=0,o=1}[Y] &= \sum_{z} E^{\mathrm{int}}_{n=0,o=1}[Y \mid z]\, f^{\mathrm{int}}_{n=0,o=1}[z] \\
&= \sum_{z} E^{\mathrm{int}}_{n=0,o=1,z}[Y]\, f^{\mathrm{int}}_{n=0,o=1}[z] \\
&= \sum_{z} E^{\mathrm{int}}_{n=1,o=1,z}[Y]\, f^{\mathrm{int}}_{n=0,o=0}[z] \\
&= \sum_{z} E[Y \mid X=1, Z=z]\, f[z \mid X=0],
\end{aligned}
$$

where the third equality is from the fact that we suppose N has no direct
effect on Y not through Z and O has no effect on Z.
We conclude by showing that the MCM and FFRCISTG models associated
with Figure 6.1 are true, but the NPSEM is not, if any of the three counter-
factual models associated with Figure 6.9 are true. Specifically, the DAG in
Figure 6.11 represents the DAG of Figure 6.1 with the counterfactuals for
Z(x) and Y(x, z), the variable U of Figure 6.9, and common causes U1 and U2
of the Z(x) and the Y(x, z) added to the graph. Note that U being a common
cause of Z and Y in Figures 6.9 and 6.10 only when we set N = 0 and O = 1
implies that U is only a common cause of Z(0), Y(1, 0), and Y(1, 1) in Figure
6.11. One can check using d-separation that the counterfactual indepen-
dences in Figure 6.11 satisfy those required of an MCM or FFRCISTG
model, but not those of an NPSEM, as Z(0) and Y(1, z) are dependent.
However, Figure 6.11 contains more independences than are required for
the FFRCISTG condition (6.1) applied to the DAG in Figure 6.1. In particu-
lar, in Figure 6.11 Z(1) and Y(0, z) are independent, which implies that
E[Y(0, Z(1))] is identified by Σ_z E[Y | X = 0, Z = z] f(z | X = 1) and, thus,
the so-called total direct effect E[Y(1, Z(1))] − E[Y(0, Z(1))] is also identified.
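A hedged plug-in sketch of this total direct effect follows, using the additional fact that E[Y(1, Z(1))] = E[Y | X = 1] by consistency, since X is randomized in this example; the data frame and its column names are hypothetical.

```python
def total_direct_effect(df):
    """Plug-in estimate of E[Y(1, Z(1))] - E[Y(0, Z(1))]
    = E[Y | X=1] - sum_z E[Y | X=0, Z=z] f(z | X=1), for binary Z."""
    e1 = df.loc[df["X"] == 1, "Y"].mean()
    d1 = df[df["X"] == 1]
    e0 = 0.0
    for z in (0, 1):
        f_z = (d1["Z"] == z).mean()                                     # f(z | X = 1)
        e0 += df.loc[(df["X"] == 0) & (df["Z"] == z), "Y"].mean() * f_z  # E[Y | X=0, Z=z]
    return e1 - e0
```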
Figure 6.11 An example leading to an FFRCISTG corresponding to the DAG in Figure 6.1
but not an NPSEM: potential outcome perspective. Counterfactuals for Y are indexed
Y(x, z). U, U1, and U2 indicate hidden confounders. Thicker edges indicate deter-
ministic relations.

Finally, we note that we could easily modify our example to eliminate the
independence of Z(1) and Y(0, z).

Appendix C: Bounds on the PDE under an FFRCISTG Model

In this Appendix we derive bounds on the PDE
$$
\mathrm{PDE} = E[Y(x=1, Z(x=0))] - E[Y \mid X = 0]
$$
under the assumption that the MCM or FFRCISTG model corresponding to the graph in Figure 6.1 holds and all variables are binary. Note

$$
\begin{aligned}
E[Y(x=1, Z(x=0))] &= E[Y(x=1, z=0) \mid Z(x=0)=0]\, P(Z=0 \mid X=0) \\
&\quad + E[Y(x=1, z=1) \mid Z(x=0)=1]\, P(Z=1 \mid X=0).
\end{aligned}
$$

The two quantities E[Y(x = 1, z = 0) | Z(x = 0) = 0] and E[Y(x = 1, z = 1) | Z(x = 0) = 1] are constrained by the law for the observed data via

$$
\begin{aligned}
E[Y \mid X=1, Z=0] &= E[Y(x=1, z=0)] \\
&= E[Y(x=1, z=0) \mid Z(x=0)=0]\, P(Z(x=0)=0) + E[Y(x=1, z=0) \mid Z(x=0)=1]\, P(Z(x=0)=1) \\
&= E[Y(x=1, z=0) \mid Z(x=0)=0]\, P(Z=0 \mid X=0) + E[Y(x=1, z=0) \mid Z(x=0)=1]\, P(Z=1 \mid X=0),
\end{aligned}
$$
$$
\begin{aligned}
E[Y \mid X=1, Z=1] &= E[Y(x=1, z=1)] \\
&= E[Y(x=1, z=1) \mid Z(x=0)=0]\, P(Z(x=0)=0) + E[Y(x=1, z=1) \mid Z(x=0)=1]\, P(Z(x=0)=1) \\
&= E[Y(x=1, z=1) \mid Z(x=0)=0]\, P(Z=0 \mid X=0) + E[Y(x=1, z=1) \mid Z(x=0)=1]\, P(Z=1 \mid X=0).
\end{aligned}
$$

It then follows from the analysis in Richardson and Robins (2010, Section
2.2) that the set of possible values for the pair

$$
\bigl(E[Y(x=1, z=0) \mid Z(x=0)=0],\; E[Y(x=1, z=1) \mid Z(x=0)=1]\bigr)
$$
compatible with the observed joint distribution f(z, y | x) is given by the product set
$$
[l_0, u_0] \times [l_1, u_1],
$$


where

$$
\begin{aligned}
l_0 &= \max\{0,\; 1 + (E[Y \mid X=1, Z=0] - 1)/P(Z=0 \mid X=0)\}, \\
u_0 &= \min\{E[Y \mid X=1, Z=0]/P(Z=0 \mid X=0),\; 1\}, \\
l_1 &= \max\{0,\; 1 + (E[Y \mid X=1, Z=1] - 1)/P(Z=1 \mid X=0)\}, \\
u_1 &= \min\{E[Y \mid X=1, Z=1]/P(Z=1 \mid X=0),\; 1\}.
\end{aligned}
$$

Hence, we have the following upper and lower bounds on the PDE:

$$
\begin{aligned}
&\max\{0,\; P(Z=0 \mid X=0) + E[Y \mid X=1, Z=0] - 1\} \\
&\quad + \max\{0,\; P(Z=1 \mid X=0) + E[Y \mid X=1, Z=1] - 1\} - E[Y \mid X=0] \\
&\;\le\; \mathrm{PDE} \;\le\; \\
&\min\{P(Z=0 \mid X=0),\; E[Y \mid X=1, Z=0]\} \\
&\quad + \min\{P(Z=1 \mid X=0),\; E[Y \mid X=1, Z=1]\} - E[Y \mid X=0].
\end{aligned}
$$
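The bounds are simple functions of four observable quantities, so they can be evaluated directly; the function below is an illustrative sketch (not from the chapter), and the example inputs are arbitrary.

```python
def pde_bounds(p_z0_x0, ey_x1_z0, ey_x1_z1, ey_x0):
    """Lower and upper bounds on the PDE from P(Z=0 | X=0), E[Y | X=1, Z=0],
    E[Y | X=1, Z=1], and E[Y | X=0], all assumed to lie in [0, 1]."""
    p_z1_x0 = 1.0 - p_z0_x0
    lower = (max(0.0, p_z0_x0 + ey_x1_z0 - 1.0)
             + max(0.0, p_z1_x0 + ey_x1_z1 - 1.0) - ey_x0)
    upper = (min(p_z0_x0, ey_x1_z0)
             + min(p_z1_x0, ey_x1_z1) - ey_x0)
    return lower, upper

# When P(Z=0 | X=0) = 1 the two bounds coincide, as discussed below.
print(pde_bounds(1.0, 0.6, 0.8, 0.5))   # approximately (0.1, 0.1)
```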

Kaufman, Kaufman, and MacLehose (2009) obtain bounds on the PDE
under assumption (6.2) but while allowing for confounding between Z and
Y, i.e., not assuming that (6.3) holds, as we do. As we would expect, the
bounds that we obtain are strictly contained in those obtained by Kaufman
et al. (2009, see Table 2, row {50}). Note that when P(Z = z | X = 0) = 1,
PDE = CDE(z) = E[Y | X = 1, Z = z] − E[Y | X = 0, Z = z]; thus, in this case our
upper and lower bounds on the PDE coincide and the parameter is identified.
In contrast, Kaufman et al.’s upper and lower bounds on the PDE
do not coincide when P(Z = z | X = 0) = 1. This follows because, under
their assumptions, CDE(z) is not identified, but PDE = CDE(z) when
P(Z = z | X = 0) = 1.
Appendix D: Interventions Restricted to a Subset: The FRCISTG Model
To describe the FRCISTG model for V = (V1, . . . , VM), we suppose that each
Vm = (Lm, Am) is actually a composite of variables Lm and Am, one of which
can be the empty set. The causal effects of intervening on any of the Lm
variables are not defined. However, we assume that for any subset R of
A = Ā_M = (A1, . . . , AM), the counterfactuals Vm(r) are well-defined for any
r ∈ R.
Specifically, we assume that the one-step-ahead counterfactuals
V_m(ā_{m−1}) = (L_m(ā_{m−1}), A_m(ā_{m−1})) exist for any setting of ā_{m−1} ∈ Ā_{m−1}. Note
that it is implicit in this definition that Lk precedes Ak for all k. Next, we
make the consistency assumption that the factual variables Vm and the counterfactual variables Vm(r) are obtained recursively from the V_m(ā_{m−1}). We do
not provide a graphical characterization of parents. Rather, we say that the
parents Pa_m of V_m consist of the smallest subset of Ā_{m−1} such that, for all
ā_{m−1} ∈ Ā_{m−1}, V_m(ā_{m−1}) = V_m(pa_m), where pa_m is the sub-vector of ā_{m−1} corresponding to Pa_m. One can then view the parents Pa_m of V_m as the direct
causes of V_m relative to the variables prior to V_m on which we can perform
interventions. Finally, an FRCISTG model imposes the following
independences:
$$
\bigl(V_{m+1}(\bar a_m), \ldots, V_M(\bar a_{M-1})\bigr) \;\perp\!\!\!\perp\; A_m(\bar a_{m-1}) \;\big|\; \bar L_m = \bar l_m,\; \bar A_{m-1} = \bar a_{m-1}, \quad \text{for all } m,\, \bar a_{M-1},\, \bar l_m. \tag{6.33}
$$

Note that (6.33) can also be written
$$
\bigl(V_{m+1}(\bar a_m), \ldots, V_M(\bar a_{M-1})\bigr) \;\perp\!\!\!\perp\; A_m(\bar a_{m-1}) \;\big|\; \bar L_m(\bar a_{m-1}) = \bar l_m,\; \bar A_{m-1} = \bar a_{m-1}, \quad \text{for all } m,\, \bar a_{M-1},\, \bar l_m,
$$
where L̄_m(ā_{m−1}) = (L_m(ā_{m−1}), L_{m−1}(ā_{m−2}), . . . , L_1).


In the absence of inter-unit interference and non-compliance, data from a
sequentially randomized experiment in which at each time m the treatment
Am is randomly assigned, with the assignment probability at m possibly
depending on the past (L̄_m, Ā_{m−1}), will follow an FRCISTG model; see
Robins (1986) for further discussion.
The analogous minimal causal model (MCM) with interventions restricted to
a subset is defined by replacing A_m(ā_{m−1}) by I{A_m(ā_{m−1}) = a_m} in condition
(6.33).
It follows from Robins (1986) that our Extended Lemma 6 continues to
hold when we substitute either ‘FRCISTG model’ or ‘MCM with restricted
interventions’ for ‘MCM’ in the statement of the Lemma, provided we take
R ⊆ A.
Likewise we may define an agnostic causal model with restricted interventions
to be the causal model that simply assumes that the interventional density of
Z ⊆ V, denoted by f^int_{p_R}(z), under treatment regime p_R for any R ⊆ A, is given
by the g-functional density f_{p_R}(z) whenever f_{p_R}(z) is a well-defined function
of f(v).
In Theorem 1 we proved that the set of defining conditional indepen-
dences in condition (6.1) of an FFRCISTG model can be re-expressed as a
set of unconditional independences between counterfactuals. An analogous
result does not hold for an FRCISTG. However, the following theorem shows
that we can remove past treatment history from the conditioning set in the
defining conditional independences of an FRCISTG model, provided that we
continue to condition on the counterfactuals L̄_m(ā_{m−1}).

Theorem 8 An FRCISTG model for V = (V1, . . . , VM), Vm = (Lm, Am), implies
that for all m, ā_{M−1}, l̄_m,
$$
\bigl(V_{m+1}(\bar a_m), \ldots, V_M(\bar a_{M-1})\bigr) \;\perp\!\!\!\perp\; A_m(\bar a_{m-1}) \;\big|\; \bar L_m(\bar a_{m-1}) = \bar l_m.
$$
Note that the theorem would not be true had we substituted the factual L̄_m
for L̄_m(ā_{m−1}).

References

Avin, C., Shpitser, I., & Pearl, J. (2005). Identifiability of path-specific effects. In L. P.
Kaelbling & A. Saffiotti (Eds.), IJCAI-05, Proceedings of the nineteenth international
joint conference on artificial intelligence (pp. 357–363). Denver: Professional Book
Center.
Dawid, A. P. (2000). Causal inference without counterfactuals. Journal of the American
Statistical Association, 95(450), 407–448.
Didelez, V., Dawid, A., & Geneletti, S. (2006). Direct and indirect effects of sequential
treatments. In R. Dechter & T. S. Richardson (Eds.), UAI-06, Proceedings of the 22nd
annual conference on uncertainty in artificial intelligence (pp. 138–146). Arlington, VA:
AUAI Press.
Frangakis, C. E., & Rubin, D. B. (1999). Addressing complications of intention-to-treat
analysis in the combined presence of all-or-none treatment-noncompliance and sub-
sequent missing outcomes. Biometrika, 86(2), 365–379.
Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference.
Biometrics, 58(1), 21–29.
Geneletti, S., & Dawid, A. P. (2007). Defining and identifying the effect of treatment on the
treated (Tech. Rep. No. 3). Imperial College London, Department of Epidemiology
and Public Health.
Gill, R. D., & Robins, J. M. (2001). Causal inference for complex longitudinal data: The
continuous case. Annals of Statistics, 29(6), 1785–1811.
Hafeman, D., & VanderWeele, T. (2010). Alternative assumptions for the identification
of direct and indirect effects. Epidemiology. (Epub ahead of print)
Heckerman, D., & Shachter, R. D. (1995). A definition and graphical representation for
causality. In P. Besnard & S. Hanks (Eds.), UAI-95: Proceedings of the eleventh annual
conference on uncertainty in artificial intelligence (pp. 262–273). San Francisco: Morgan
Kaufmann.
Imai, K., Keele, L., & Yamamoto, T. (2009). Identification, inference, and sensitivity
analysis for causal mediation effects (Tech. Rep.). Princeton University, Department
of Politics.
Kaufman, S., Kaufman, J. S., & MacLehose, R. F. (2009). Analytic bounds on causal
risk differences in directed acyclic graphs involving three observed binary variables.
Journal of Statistical Planning and Inference, 139(10), 3473–3487.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo: Morgan
Kaufmann.
Pearl, J. (2000). Causality. Cambridge: Cambridge University Press.
Pearl, J. (2001). Direct and indirect effects. In J. S. Breese & D. Koller (Eds.), UAI-01,
Proceedings of the 17th annual conference on uncertainty in artificial intelligence
(pp. 411–42). San Francisco: Morgan Kaufmann.
Pearl, J. (2010). An introduction to causal inference. The International Journal of
Biostatistics, 6(2). (DOI: 10.2202/1557-4679.1203)
Petersen, M., Sinisi, S., & Laan, M. van der. (2006). Estimation of direct causal effects.
Epidemiology, 17(3), 276–284.
Richardson, T. S., & Robins, J. M. (2010). Analysis of the binary instrumental variable
model. In R. Dechter, H. Geffner, & J. Halpern (Eds.), Heuristics, probability and
causality: A tribute to Judea Pearl (pp. 415–444). London: College Publications.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with
sustained exposure periods – applications to control of the healthy worker survivor
effect. Mathematical Modeling, 7, 1393–1512.
Robins, J. M. (1987). Addendum to ‘‘A new approach to causal inference in
mortality studies with sustained exposure periods – applications to control of the
healthy worker survivor effect’’. Computers and Mathematics with Applications, 14,
923–945.
Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct
and indirect effects. In P. Green, N. Hjort, & S. Richardson (Eds.), Highly structured
stochastic systems (pp. 70–81). Oxford: Oxford University Press.
Robins, J. M., & Greenland, S. (1992). Identifiability and exchangeability for direct and
indirect effects. Epidemiology, 3, 143–155.
Robins, J. M., & Greenland, S. (2000). Comment on ‘‘Causal inference without coun-
terfactuals’’. Journal of the American Statistical Association, 95(450), 431–435.
Robins, J. M., Richardson, T. S., & Spirtes, P. (2009). Identification and inference for
direct effects (Tech. Rep. No. 563). University of Washington, Department of
Statistics.
Robins, J. M., Rotnitzky, A., & Vansteelandt, S. (2007). Discussion of ‘‘Principal stra-
tification designs to estimate input data missing due to death’’ by Frangakis, C.E.,
Rubin D.B., An, M., MacKenzie, E. Biometrics, 63(3), 650–653.
Robins, J. M., VanderWeele, T. J., & Richardson, T. S. (2007). Discussion of ‘‘Causal
effects in the presence of non compliance: a latent variable interpretation’’ by
Forcina, A. Metron, LXIV(3), 288–298.
Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104, 587–592.
Rubin, D. B. (1998). More powerful randomization-based p-values in double-blind
trials with non-compliance. Statistics in Medicine, 17, 371–385.
Rubin, D. B. (2004). Direct and indirect causal effects via potential outcomes.
Scandinavian Journal of Statistics, 31(2), 161–170.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, Prediction and Search
(No. 81). New York: Springer-Verlag.
VanderWeele, T., & Robins, J. (2007). Directed acyclic graphs, sufficient causes, and the
properties of conditioning on a common effect. American Journal of Epidemiology,
166(9), 1096–1104.
7
General Approaches to Analysis of Course
Applying Growth Mixture Modeling to Randomized Trials of Depression Medication
bengt muthén, hendricks c. brown, aimee m. hunter, ian a. cook, and andrew f. leuchter

Introduction

This chapter discusses the assessment of treatment effects in longitudinal
randomized trials using growth mixture modeling (GMM) (Muthén &
Shedden, 1999; Muthén & Muthén, 2000; Muthén et al., 2002; Muthén &
Asparouhov, 2009). GMM is a generalization of conventional repeated mea-
surement mixed-effects (multilevel) modeling. It captures unobserved subject
heterogeneity in trajectories not only by random effects but also by latent
classes corresponding to qualitatively different types of trajectories. It can be
seen as a combination of conventional mixed-effects modeling and cluster
analysis, also allowing prediction of class membership and estimation of each
individual’s most likely class membership. GMM has particularly strong
potential for analyses of randomized trials because it responds to the need
to investigate for whom a treatment is effective by allowing for different
treatment effects in different trajectory classes.
The chapter is motivated by a University of California, Los Angeles study
of depression medication (Leuchter, Cook, Witte, Morgan, & Abrams, 2002).
Data on 94 subjects are drawn from a combination of three studies carried
out with the same design, using three different types of medications: fluox-
etine (n = 14), venlafaxine IR (n = 17), and venlafaxine XR (n = 18). Subjects
were measured at baseline and again after a 1-week placebo lead-in phase. In
the subsequent double-blind phase of the study, the subjects were rando-
mized into medication (n = 49) and placebo (n = 45) groups. After randomi-
zation, subjects were measured at nine occasions: at 48 hours and at weeks
1–8. The current analyses consider the Hamilton Depression Rating Scale.


Several predictors of course of the Hamilton scale trajectory are available,


including gender, treatment history, and a baseline measure of central
cordance hypothesized to influence tendency to respond to treatment.
The results of studies of this kind are often characterized in terms of an
end point analysis where the outcome at the end of the study, here at 8
weeks, is considered for the placebo group and for the medication group.
A subject may be classified as a responder by showing a week 8 depression
score below 10 or by dropping below 50% of the initial score. The treat-
ment effect may be assessed by comparing the medication and placebo
groups with respect to the ratio of responders to nonresponders.
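For instance, a minimal sketch of this end point classification (hypothetical pandas column names; the thresholds follow the rule just described):

```python
def classify_responders(df):
    """Week 8 end point analysis: a subject is a responder if the week 8
    Hamilton score is below 10 or below 50% of the baseline score."""
    return (df["hamd_week8"] < 10) | (df["hamd_week8"] < 0.5 * df["hamd_baseline"])

def responder_ratio_by_arm(df):
    """Ratio of responders to nonresponders in each randomized arm."""
    d = df.assign(responder=classify_responders(df))
    grouped = d.groupby("arm")["responder"]
    responders = grouped.sum()
    nonresponders = grouped.count() - responders
    return responders / nonresponders
```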
As an alternative to end point analysis, conventional repeated measure-
ment mixed-effects (multilevel) modeling can be used. Instead of focusing on
only the last time point, this uses the outcome at all time points, the two
pretreatment occasions and the nine posttreatment occasions. The trajectory
shape over time is of key interest and is estimated by a model that draws on
the information from all time points. The idea of considering trajectory shape
in research on depression medication has been proposed by Quitkin et al.
(1984), although not using a formal statistical growth model.
Rates of response to treatment with antidepressant drugs are estimated to be
50%–60% in typical patient populations. Of particular interest in this chapter is
how to assess treatment effects in the presence of a placebo response. A placebo
response is an improvement in depression ratings seen in the placebo group that
is unrelated to medication. The improvement is often seen as an early steep
drop in depression, often followed by a later upswing. An example is seen in
Figure 7.2. A placebo response confounds the estimation of the true effect
of medication and is an important phenomenon given its high prevalence of
25%–60% (Quitkin, 1984). Because the placebo response is pervasive, the sta-
tistical modeling must take it into account when estimating medication effects.
This can be done by acknowledging the qualitative heterogeneity in trajectory
shapes for responders and nonresponders.
It is important to distinguish among responder and nonresponder trajec-
tory shapes in both the placebo and medication groups. Conventional
repeated measures modeling may lead to distorted assessment of medication
effects when individuals follow several different trajectory shapes. GMM
avoids this problem while maintaining the repeated measures modeling
advantages. The chapter begins by considering GMM with two classes, a
nonresponder class and a responder class. The responder class is defined as
those individuals who respond in the placebo group and who would have
responded to placebo among those in the medication group. Responder class
membership is observed for subjects in the placebo group but is unobserved
in the medication group. Because of randomization, it can be assumed that
this class of subjects is present in both the placebo and medication groups
and in equal numbers. GMM can identify the placebo responder class in the
medication group. Having identified the placebo responder and placebo non-
responder classes in both the placebo and medication groups, medication
effects can more clearly be identified. In one approach, the medication
effect is formulated in terms of an effect of medication on the trajectory
slopes after the treatment phase has begun. This medication effect is allowed
to be different for the nonresponder and responder trajectory classes.
Another approach formulates the medication effect as increasing the prob-
ability of membership in advantageous trajectory classes and decreasing the
probability of membership in disadvantageous trajectory classes.

Growth Mixture Modeling

This section gives a brief description of the GMM in the context of the
current study. A two-piece, random effect GMM is applied to the Hamilton
Depression Rating Scale outcomes at the 11 time points y1–y11. The first
piece refers to the two time points y1 and y2 before randomization, and the
second piece refers to the nine postrandomization time points y3–y11. Given
only two time points, the first piece is by necessity taken as a linear model
with a random intercept, defined at baseline, and a fixed effect slope. An
exploration of each individual’s trajectory suggests a quadratic trajectory
shape for the second piece. The growth model for the second piece is cen-
tered at week 8, defining the random intercept as the systematic variation at
that time point. All random effect means are specified as varying across
latent trajectory classes. The medication effect is captured by a regression
of the linear and quadratic slopes in the second piece on a medication
dummy variable. These medication effects are allowed to vary across the
latent trajectory classes. The model is shown in diagrammatic form at the
top of Figure 7.1.¹
The statistical specification is as follows. Consider the depression outcome
y_it for individual i, let c denote the latent trajectory class variable, let η denote
random effects, let a_t denote time, and let ε_it denote residuals containing
measurement error and time-specific variation. For the first, prerandomization piece, conditional on trajectory class k (k = 1, 2, . . . , K),

1. In Figure 7.1 the observed outcomes are shown in boxes and the random effects in circles.
Here, i, s, and q denote intercept, linear slope, and quadratic slope, respectively. In the follow-
ing formulas, these random effects are referred to as η0, η1, and η2. The treatment dummy
variable is denoted x.
Figure 7.1 Two alternative GMM approaches. [Diagram omitted: top panel, a two-piece model with growth factors i1, s1 for the prerandomization outcomes and i2, s2, q2 for the postrandomization outcomes; bottom panel, a model with growth factors i2, s2, q2 for the postrandomization outcomes only.]

$$
y^{\mathrm{pre}}_{it} \mid c_i = k \;=\; \eta_{10i} + \eta_{11i}\, a_t + \varepsilon_{it}, \tag{1}
$$
with a_1 = 0 to center at baseline, and random effects
$$
\eta_{10i} \mid c_i = k = \alpha_{10k} + \zeta_{10i}, \tag{2}
$$
$$
\eta_{11i} \mid c_i = k = \alpha_{11k} + \zeta_{11i}, \tag{3}
$$

with only two prerandomization time points, the model is simplified to
assume a nonrandom slope, V(ζ_{11}) = 0, for identification purposes. For the
second, postrandomization piece,
$$
y_{it} \mid c_i = k = \eta_{0i} + \eta_{1i}\, a_t + \eta_{2i}\, a_t^2 + \varepsilon_{it}, \tag{4}
$$
with a_11 = 0, defining η_{0i} as the week 8 depression status. The remaining a_t
values are set according to the distance in timing of measurements. Assume
for simplicity a single drug and denote the medication status for individual i
by the dummy variable xi (x = 0 for the placebo group and x = 1 for the
medication group).² The random effects are allowed to be influenced by

2. In the application three dummy variables are used to represent the three different medications.
group and a covariate, w, their distributions varying as a function of trajectory
class (k),
$$
\eta_{0i} \mid c_i = k = \alpha_{0k} + \gamma_{01k}\, x_i + \gamma_{02k}\, w_i + \zeta_{0i}, \tag{5}
$$
$$
\eta_{1i} \mid c_i = k = \alpha_{1k} + \gamma_{11k}\, x_i + \gamma_{12k}\, w_i + \zeta_{1i}, \tag{6}
$$
$$
\eta_{2i} \mid c_i = k = \alpha_{2k} + \gamma_{21k}\, x_i + \gamma_{22k}\, w_i + \zeta_{2i}. \tag{7}
$$

The residuals $\zeta_i$ in the first and second pieces have a $4 \times 4$ covariance matrix, here taken to be constant across the classes $k$. For both pieces the residuals $\epsilon_{it}$ have a $T \times T$ covariance matrix, also taken to be constant across classes. For simplicity, neither covariance matrix is allowed to vary across treatment groups. As seen in equations 5–7, the placebo group ($x_i = 0$) consists of subjects from the two different trajectory classes that vary in the means of the growth factors, which in the absence of the covariate $w$ are represented by $\alpha_{0k}$, $\alpha_{1k}$, and $\alpha_{2k}$. This gives the average depression development in the absence of medication. Because of randomization, the placebo and medication groups are assumed to be statistically equivalent at the first two time points. This implies that $x$ is assumed to have no effect on $g^{\text{pre}}_{0i}$ or $g^{\text{pre}}_{1i}$ in the first piece of the development. Medication effects are described in the second piece by $\gamma_{01k}$, $\gamma_{11k}$, and $\gamma_{21k}$ as a change in average growth that can be different for the classes.
This model allows the assessment of medication effects in the presence of
a placebo response. A key parameter is the medication-added mean of the
intercept random effect centered at week 8. This is the $\gamma_{01k}$ parameter of
equation 5. This indicates how much lower or higher the average score is at
week 8 for the medication group relative to the placebo group in the trajec-
tory class considered. In this way, the medication effect is specific to classes
of individuals who would or would not have responded to placebo. The
modeling will be extended to allow for the three drugs of this study to
have different $\gamma$ parameters in equations 5–7.
Class membership can be influenced by baseline covariates as expressed
by a logistic regression (e.g., with two classes),

$$\log\left[P(c_i = 1 \mid w_i) / P(c_i = 2 \mid w_i)\right] = \gamma_0 + \gamma_1 w_i, \qquad (8)$$

where c = 1 may refer to the nonresponder class and c = 2, the responder


class. It may be noted that this model assumes that medication status does
not influence class membership. Class membership is conceptualized as a
quality characterizing an individual before entering the trial.
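Spelled out (this restatement is not in the original text; it simply rewrites equation 8 in its standard probability form using the same $\gamma$ parameters), the two-class logistic model corresponds to class-membership probabilities
$$P(c_i = 1 \mid w_i) = \frac{\exp(\gamma_0 + \gamma_1 w_i)}{1 + \exp(\gamma_0 + \gamma_1 w_i)}, \qquad P(c_i = 2 \mid w_i) = 1 - P(c_i = 1 \mid w_i).$$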

A variation of the modeling will focus on postrandomization time points.


Here, an alternative conceptualization of class membership is used. Class
membership is thought of as being influenced by medication so that the
class probabilities are different for the placebo group and the three medica-
tion groups. Here, the medication effect is quantified in terms of differences
across groups in class probabilities. This model is shown in diagrammatic
form at the bottom of Figure 7.1. It is seen that the GMM involves only the
postrandomization outcomes, which is logical given that treatment influences
the latent class variable, which in turn influences the posttreatment out-
comes. In addition to the treatment variable, pretreatment outcomes may
be used as predictors of latent class, as indicated in the figure. The treatment
and pretreatment outcomes may interact in their influence on latent class
membership.

Estimation and Model Choice


The GMM can be fitted into the general latent variable framework of the
Mplus program (Muthén & Muthén, 1998–2008). Estimation is carried out
using maximum likelihood via an expectation-maximization (EM) algorithm.
Missing data under the missing at random (MAR) assumption are allowed
for the outcomes. Given an estimated model, estimated posterior probabilities
for each individual and each class are produced. Individuals can be classified
into the class with the highest probability. The classification quality is sum-
marized in an entropy value with range 0–1, where 1 corresponds to the case
where all individuals have probability 1 for one class and 0 for the others. For
model fitting strategies, see Muthén et al. (2002), Muthén (2004), and
Muthén and Asparouhov (2008). A common approach to decide on the
number of classes is to use the Bayesian information criterion (BIC),
which puts a premium on models with large log-likelihood values and a
small number of parameters. The lower the BIC, the better the model.
Analyses of depression trial data have an extra difficulty due to the typically
small sample sizes. Little is known about the performance of BIC for sam-
ples as small as in the current study. Bootstrapped likelihood ratio testing can
be performed in Mplus (Muthén & Asparouhov, 2008), but the power of such
testing may not be sufficient at these sample sizes. Plots showing the agree-
ment between the class-specific estimated means and the trajectories for
individuals most likely belonging to a class can be useful in visually inspect-
ing models but are of only limited value in choosing between models.
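As a concrete illustration (not part of the original chapter), the short Python sketch below computes BIC from a fitted model's log-likelihood, number of free parameters, and sample size, together with the relative-entropy index commonly reported for mixture models. Plugging in the values given in note 3 for the two-class placebo-group model reproduces the BIC of about 2,219; the posterior-probability matrix in the second example is hypothetical.

```python
import numpy as np

def bic(log_likelihood, n_parameters, n_subjects):
    """BIC = -2*logL + (number of parameters)*ln(n); lower values are better."""
    return -2.0 * log_likelihood + n_parameters * np.log(n_subjects)

def relative_entropy(posteriors):
    """Classification entropy in [0, 1]; 1 means every subject is assigned
    to a single class with posterior probability 1."""
    n, k = posteriors.shape
    p = np.clip(posteriors, 1e-12, 1.0)        # guard against log(0)
    return 1.0 - (-np.sum(p * np.log(p))) / (n * np.log(k))

# Values reported in note 3 for the two-class placebo-group GMM
print(bic(-1055.974, 28, 45))                  # about 2218.5, i.e., BIC = 2,219

# Hypothetical posterior class probabilities for three subjects, two classes
post = np.array([[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]])
print(round(relative_entropy(post), 2))
```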
A complication of maximum-likelihood GMM is the presence of local
maxima. These are more prevalent with smaller samples such as the current
ones for the placebo group, the medication group, as well as for the com-
bined sample. To be confident that a global maximum has been found, many

random starting values need to be used and the best log-likelihood value
needs to be replicated several times. In the present analyses, between 500
and 4,000 random starts were used depending on the complexity of the
model.
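The same precaution can be illustrated outside Mplus. The sketch below is only a loose stand-in: it fits a plain Gaussian mixture (scikit-learn) to simulated complete trajectories rather than the two-piece random-effect GMM of this chapter, but it shows the strategy of running EM from many random initializations and checking that the best log-likelihood is replicated before trusting a solution. The simulated data and the number of starts are illustrative only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated complete-case trajectories over 11 time points for two latent classes
X = np.vstack([rng.normal(20.0, 3.0, size=(30, 11)),
               rng.normal(12.0, 3.0, size=(15, 11))])

loglikes = []
for seed in range(50):                          # many random starts
    gm = GaussianMixture(n_components=2, covariance_type="diag",
                         init_params="random", random_state=seed).fit(X)
    loglikes.append(gm.score(X) * X.shape[0])   # total log-likelihood

best = max(loglikes)
replicated = sum(ll > best - 1e-3 for ll in loglikes)
print(f"best logL = {best:.3f}, replicated in {replicated} of 50 starts")
```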

Growth Mixture Analyses

In this section the depression data are analyzed in three steps using GMM.
First, the placebo group is analyzed alone. Second, the medication group is
analyzed alone. Third, the placebo and medication groups are analyzed jointly
according to the GMM just presented in order to assess the medication effects.

Analysis of the Placebo Group


A two-class GMM analysis of the 45 subjects in the placebo group resulted in
the model-estimated mean curves shown in Figure 7.2. As expected, a
responder class (class 1) shows a postrandomization drop in the depression
score with a low of 7.9 at week 5 and with an upswing to 10.8 at week 8. An
estimated 32% of the subjects belong to the responder class. In contrast, the
nonresponder class has a relatively stable level for weeks 1–8, ending with a
depression score of 15.6 at week 8. The sample standard deviation at week 8
is 7.6. It may be noted that the baseline score is only slightly higher for the
nonresponder class, 22.7 vs. 21.9. The standard deviation at baseline is 3.6.3
The observed trajectories of individuals classified into the two classes are
plotted in Figure 7.3a and b as broken lines, whereas the solid curves show
the model-estimated means. The figure indicates that the estimated mean
curves represent the individual development rather well, although there is a
good amount of individual variation around the mean curves.
It should be noted that the classification of subjects based on the trajectory
shape approach of GMM will not agree with that using end point analysis. As
an example, the nonresponder class of Figure 7.3b shows two subjects with
scores less than 5 at week 8. The individual with the lowest score at week 8,
however, has a trajectory that agrees well with the nonresponder mean curve
for most of the trial, deviating from it only during the last 2 weeks. The week
8 score has a higher standard deviation than at earlier time points, thereby
weighting this time point somewhat less. Also, the data coverage due to

3. The maximum log-likelihood value for the two-class GMM of Figure 7.2 is –1,055.974, which is
replicated across many random starts, with 28 parameters and a BIC value of 2,219. The
classification based on the posterior class probabilities is not clear-cut in that the classification
entropy value is only 0.66.

Figure 7.2 Two-class GMM for placebo group. [Model-estimated mean HamD curves from baseline through week 8: Class 1, 32.4%; Class 2, 67.6%.]

missing observations is considerably lower for weeks 5–7 than other weeks,
reducing the weight of these time points. The individual with the second
lowest score at week 8 deviates from the mean curve for week 5 but has
missing data for weeks 6 and 7. This person is also ambiguously classified in
terms of his or her posterior probability of class membership.
To further explore the data, a three-class GMM was also fitted to the 45
placebo subjects. Figure 7.4a shows the mean curves for this solution. This
solution no longer shows a clear-cut responder class. Class 2 (49%) declines
early, but the mean score does not go below 14. Class 1 (22%) ends with a
mean score of 10.7 but does not show the expected responder trajectory
shape of an early decline.4 A further analysis was made to investigate if
the lack of a clear responder class in the three-class solution is due to the
sample size of n = 45 being too small to support three classes. In this
analysis, the n = 45 placebo group subjects were augmented by the medica-
tion group subjects but using only the two prerandomization time points
from the medication group. Because of randomization, subjects are statisti-
cally equivalent before randomization, so this approach is valid. The first,
prerandomization piece of the GMM has nine parameters, leaving only
25 parameters to be estimated in the second, postrandomization piece by

4. The log-likelihood value for the model in Figure 7.4a is –1,048.403, replicated across several
random starts, with 34 parameters and a BIC value of 2,226. Although the BIC value is slightly
worse than for the two-class solution, the classification is better, as shown by the entropy value
of 0.85.

Figure 7.3 Individual trajectories for placebo subjects classified into (a) the responder class and (b) the non-responder class. [Observed HamD trajectories (broken lines) and model-estimated mean curves (solid lines) from baseline through week 8.]

the n = 45 placebo subjects alone. Figure 7.4b shows that a responder class
(class 2) is now found, with 21% of the subjects estimated to be in this class.
High (class 3) and low (class 1) nonresponder classes are found, with 18%
and 60% estimated to be in these classes, respectively. Compared to Figure 7.3,
the observed individual trajectories within class are somewhat less hetero-
geneous (trajectories not shown).5

5. The log-likelihood value for the model in Figure 7.4b is –1,270.030, replicated across several
random starts, with 34 parameters and a BIC value of 2,695. The entropy value is 0.62. Because
a different sample size is used, these values are not comparable to the earlier ones.

Figure 7.4 (a) Three-class GMM for placebo group. (b) Three-class GMM for placebo group and pre-randomization medication group individuals. [Model-estimated mean HamD curves from baseline through week 8 for each class.]

Analysis of the Medication Group


Two major types of GMMs were applied to the medication group. The first
type analyzes all time points and either makes no distinction among the
three drugs (fluoxetine, venlafaxine IR, venlafaxine XR) or allows drug differ-
ences for the class-specific random effect means of the second piece of the
GMM. It would not make sense to also let class membership vary as a
function of drug since class membership is conceptualized as a quality char-
acterizing an individual before entering the trial. Class membership influ-
ences prerandomization outcomes, which cannot be influenced by drugs.
To investigate class membership, the second type of GMM analyzes the
nine postrandomization time points both to focus on the period where the
medications have an effect and to let the class membership correspond to
only postrandomization variables. Here, not only are differences across the
three drugs allowed for the random effect means for each of the classes but
the drug type is also allowed to influence class probabilities.

Analysis of All Time Points

A two-class GMM analysis of the 49 subjects in the medication group


resulted in the model-estimated mean curves shown in Figure 7.5. As
expected, one of the classes is a large responder class (class 1, 85%). The
other class (class 2, 15%) improves initially but then worsens.6
A three-class GMM analysis of the 49 subjects in the medication group
resulted in the model-estimated mean curves shown in Figure 7.6. The three
mean curves show the expected responder class (class 3, 68%) and the class
(class 2, 15%) found in the two-class solution showing an initial improve-
ment but later worsening. In addition, a nonresponse class (class 1, 17%)
emerges, which has no medication effect throughout.7
Allowing for drug differences for the class-specific random effect means of
the second piece of the GMM did not give a trustworthy solution in that the
best log-likelihood value was not replicated. This may be due to the fact that
this model has more parameters than subjects (59 vs. 49).

Analysis of Postrandomization Time Points

As a first step, two- and three-class analyses of the nine postrandomization time
points were performed, not allowing for differences across the three drugs. This
gave solutions that were very similar to those of Figures 7.5 and 7.6. The
similarity in mean trajectory shape held up also when allowing for class prob-
abilities to vary as a function of drug. Figure 7.7 shows the estimated mean
curves for this latter model. The estimated class probabilities for the three drugs
show that in the responder class (class 2, 63%) 21% of the subjects are on
fluoxetine, 29% are on venlafaxine IR, and 50% are on venlafaxine XR. For
the nonresponder class that shows an initial improvement and a later wor-
sening (class 3, 19%), 25% are on fluoxetine, 75% are on venlafaxine IR, and
0% are on venlafaxine XR. For the nonresponder class that shows no
improvement at any point (class 1, 19%), 58% are on fluoxetine, 13% are
on venlafaxine IR, and 29% are on venlafaxine XR. Judged across all three
trajectory classes, this suggests that venlafaxine XR has the better outcome,
followed by venlafaxine IR, with fluoxetine last. Note, however, that for these
data subjects were not randomized to the different medications; therefore,
comparisons among medications are confounded by subject differences.8

6. The log-likelihood value for the model in Figure 7.5 is –1,084.635, replicated across many
random starts, with 28 parameters and a BIC value of 2,278. The entropy value is 0.90.
7. The log-likelihood value for the model in Figure 7.6 is –1,077.433, replicated across many
random starts, with 34 parameters and a BIC value of 2,287. The BIC value is worse than
for the two-class solution. The entropy value is 0.85.
8. The log-likelihood value for the model of Figure 7.7 is –873.831, replicated across many
random starts, with 27 parameters and a BIC value of 1,853. The entropy value is 0.79.

Figure 7.5 Two-class GMM for medication group. [Model-estimated mean HamD curves from baseline through week 8: Class 1, 84.7%; Class 2, 15.3%.]

Figure 7.6 Three-class GMM for medication group. [Model-estimated mean HamD curves from baseline through week 8: Class 1, 16.9%; Class 2, 14.9%; Class 3, 68.2%.]



Figure 7.7 Three-class GMM for medication group post randomization. [Model-estimated mean HamD curves from 48 hours through week 8: Class 1, 18.6%; Class 2, 62.9%; Class 3, 18.6%.]

As a second step, a three-class model was analyzed by a GMM, where not


only class membership probability was allowed to vary across the three drugs
but also the class-varying random effect means. This analysis showed no
significant drug differences in class membership probabilities. As shown
in Figure 7.8, the classes are essentially of a different nature for the three
drugs.9

Analysis of Medication Effects, Taking Placebo


Response Into Account
The separate analyses of the 45 subjects in the placebo group and the
49 subjects in the medication group provide the basis for the joint analysis
of all 94 subjects. Two types of GMMs will be applied. The first is directly
in line with the model shown earlier under Growth Mixture Modeling,
where medication effects are conceptualized as postrandomization changes
in the slope means. The second type uses only the postrandomization
time points and class membership is thought of as being influenced by

9. The log-likelihood value for the model of Figure 7.8 is –859.577, replicated in only a few
random starts, with 45 parameters and a BIC value of 1,894. The entropy value is 0.81. It
is difficult to choose between the model of Figure 7.7 and the model of Figure 7.8 based on
statistical indices. The Figure 7.7 model has the better BIC value, but the improvement in the
log-likelihood of the Figure 7.8 model is substantial.

Figure 7.8 Three-class GMM for (a) fluoxetine subjects, (b) venlafaxine IR subjects, and (c) venlafaxine XR subjects. [Model-estimated mean HamD curves from 48 hours through week 8 for each drug.]

medication, in line with the Figure 7.7 model. Here, the class probabilities
are different for the placebo group and the three medication groups so that
medication effect is quantified in terms of differences across groups in class
probabilities.

Analysis of All Time Points

For the analysis based on the earlier model (see Growth Mixture Modeling), a
three-class GMM will be used, given that three classes were found to be
interpretable for both the placebo and the medication groups. Figure 7.9
shows the estimated mean curves for the three-class solution for the placebo
group, the fluoxetine group, the venlafaxine IR group, and the venlafaxine XR
group. It is interesting to note that for the placebo group the Figure 7.9a
mean curves are similar in shape to those of Figure 7.4b, although the
responder class (class 3) is now estimated to be 34%. Note that for this
model the class percentages are specified to be the same in the medication
groups as in the placebo group. The estimated mean curves for the three
medication groups shown in Figure 7.9b–d are similar in shape to those of
the medication group analysis shown in Figure 7.8a–c. These agreements
with the separate-group analyses strengthen the plausibility of the modeling.
This model allows the assessment of medication effects in the presence of
a placebo response. A key parameter is the medication-added mean of the
intercept random effect centered at week 8. This is the $\gamma_{01k}$ parameter of
equation 5. For a given trajectory class, this indicates how much lower or
higher the average score is at week 8 for the medication group in question
relative to the placebo group. In this way, the medication effect is specific to
classes of individuals who would or would not have responded to placebo.
The $\gamma_{01k}$ estimates of the Figure 7.9 model are as follows. The fluoxetine
effect for the high nonresponder class 1 at week 8 as estimated by the GMM
is significantly positive (higher depression score than for the placebo group),
7.4, indicating a failure of this medication for this class of subjects. In the
low nonresponder class 2 the fluoxetine effect is small but positive, though
insignificant. In the responder class, the fluoxetine effect is significantly
negative (lower depression score than for the placebo group), –6.3. The ven-
lafaxine IR effect is insignificant for all three classes. The venlafaxine XR
effect is significantly negative, –11.7, for class 1, which after an initial slight
worsening turns into a responder class for venlafaxine XR. For the nonre-
sponder class 2 the venlafaxine XR effect is insignificant, while for the
responder class it is significantly negative, –7.8. In line with the medication
group analysis shown in Figure 7.7, the joint analysis of placebo and medica-
tion subjects indicates that venlafaxine XR has the most desirable outcome
relative to the placebo group. None of the drugs is significantly effective for
the low nonresponder class 2.10

10. The log-likelihood value for the model shown in Figure 7.9 is –2,142.423, replicated across a
few random starts, with 61 parameters and a BIC value of 4,562. The entropy value is 0.76.
Figure 7.9 Three-class GMM of both groups: (a) Placebo subjects, (b) fluoxetine subjects, (c) venlafaxine IR subjects, and (d) venlafaxine XR subjects. [Model-estimated mean HamD curves from baseline through week 8; in each panel, Class 1 = 20.5%, Class 2 = 45.9%, and Class 3 = 33.6% of subjects.]

Analysis of Postrandomization Time Points

As a final analysis, the placebo and medication groups were analyzed together
for the postrandomization time points. Figure 7.10 displays the estimated
three-class solution, which again shows a responder class, a nonresponder
class which initially improves but then worsens (similar to the placebo response
class found in the placebo group), and a high nonresponder class.11 As a first
step, it is of interest to compare the joint placebo–medication group analysis of
Figure 7.10 to the separate placebo group analysis of Figure 7.4b and the
separate medication group analysis of Figure 7.6.
Comparing the joint analysis in Figure 7.10 to that of the placebo group
analysis of Figure 7.4b indicates the improved outcome when medication
group individuals are added to the analysis. In the placebo group analysis
of Figure 7.4b 78% are in the two highest, clearly nonresponding trajectory
classes, whereas in the joint analysis of Figure 7.10 only 36% are in the
highest, clearly nonresponding class. In this sense, medication seems to
have a positive effect in reducing depression. Furthermore, in the placebo
analysis, 21% are in the placebo-responding class which ultimately worsens,
whereas in the joint analysis 21% are in this type of class and 43% are in a
clearly responding class.
Comparing the joint analysis in Figure 7.10 to that of the medication
group analysis of Figure 7.6 indicates the worsened outcome when placebo
group individuals are added to the analysis. In the medication group analysis
of Figure 7.6 only 17% are in the nonresponding class compared to 36% in
the joint analysis of Figure 7.10. Figure 7.6 shows 15% in the initially
improving but ultimately worsening class compared to 21% in Figure 7.10.
Figure 7.6 shows 68% in the responding class compared to 43% in Figure 7.10.
All three of these comparisons indicate that medication has a positive effect
in reducing depression.
As a second step, it is of interest to study the medication effects for each
medication separately. The joint analysis model allows this because the class
probabilities differ between the placebo group and each of the three medica-
tion groups, as expressed by equation 8. The results are shown in Figure
7.11. For the placebo group, the responder class (class 3) is estimated to be
26%, the initially improving nonresponder class (class 1) to be 22%, and the
high nonresponder class (class 2) to be 52%. In comparison, for the fluox-
etine group the responder class is estimated to be 48% (better than placebo),
the initially improving nonresponder class to be 0% (better than placebo),
and the high nonresponder class to be 52% (same as placebo). For the

11. The log-likelihood value for the model shown in Figure 7.10 is –1,744.999, replicated across
many random starts, with 29 parameters and a BIC value of 3,621. The entropy value is 0.69.

Figure 7.10 Three-class GMM analysis of both groups using post-randomization time points. [Model-estimated mean HamD curves from 48 hours through week 8: Class 1, 21.0%; Class 2, 35.8%; Class 3, 43.1%.]

Figure 7.11 Medication effects in each of 3 trajectory classes. [Bar charts of estimated class probabilities (%) for the placebo, fluoxetine, venlafaxine IR, and venlafaxine XR groups; R = Responder Class, IINR = Initially Improving Non-Responder Class, HNR = High Non-Responder Class.]

venlafaxine IR group, the responder class is estimated to be 46% (better than


placebo), the initially improving nonresponder class to be 47% (worse than
placebo), and the high nonresponder class to be 7% (better than placebo). For
the venlafaxine XR group, the responder class is estimated to be 90% (better
than placebo), the initially improving nonresponder class to be 0% (better
than placebo), and the high nonresponder class to be 10% (better than
placebo).

Conclusions

The growth mixture analysis presented here demonstrates that, in contrast to conventional repeated measures analysis, it is possible to estimate medication effects in the presence of placebo effects. The analysis is flexible in that the
medication effect is allowed to differ across trajectory classes. This approach
should therefore have wide applicability in clinical trials. It was shown that
medication effects could be expressed as causal effects. The analysis also
produces a classification of individuals into trajectory classes.
Medication effects were expressed in two alternative ways, as changes in
growth slopes and as changes in class probabilities. Related to the latter
approach, a possible generalization of the model is to include two latent
class variables, one before and one after randomization, and to let the med-
ication influence the postrandomization latent class variable as well as transi-
tions between the two latent class variables. Another generalization is
proposed in Muthén and Brown (2009) considering four classes of subjects:
(1) subjects who would respond to both placebo and medication, (2) subjects
who would respond to placebo but not medication, (3) subjects who would
respond to medication but not placebo, and (4) subjects who would respond
to neither placebo nor medication. Class 3 is of particular interest from a
pharmaceutical point of view.
Prediction of class membership can be incorporated as part of the model
but was not explored here. Such analyses suggest interesting opportunities
for designs of trials. If at baseline an individual is predicted to belong to a
nonresponder class, a different treatment can be chosen.

References

Leuchter, A. F., Cook, I. A., Witte, E. A., Morgan, M., & Abrams, M. (2002). Changes
in brain function of depressed subjects during treatment with placebo. American
Journal of Psychiatry, 159, 122–129.
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related
techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative meth-
odology for the social sciences (pp. 345–368). Newbury Park, CA: Sage Publications.

Muthén, B., & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-
Gaussian random effects. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G.
Molenberghs (Eds.), Longitudinal data analysis (pp. 143–165). Boca Raton, FL:
Chapman & Hall/CRC Press.
Muthén, B., & Brown, H. (2009). Estimating drug effects in the presence of placebo
response: Causal inference using growth mixture modeling. Statistics in Medicine,
28, 3363–3385.
Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S. T., Yang, C. C., et al. (2002).
General growth mixture modeling for randomized preventive interventions.
Biostatistics, 3, 459–475.
Muthén, B., & Muthén, L. (2000). Integrating person-centered and variable-centered
analysis: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical
and Experimental Research, 24, 882–891.
Muthén, B., & Muthén, L. (1998–2008). Mplus user’s guide (5th ed.) Los Angeles:
Muthén & Muthén.
Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes
using the EM algorithm. Biometrics, 55, 463–469.
Quitkin, F. M., Rabkin, J. G., Ross, D., & Stewart, J. W. (1984). Identification of true
drug response to antidepressants. Use of pattern analysis. Archives of General
Psychiatry, 41, 782–786.
8
Statistical Methodology for a SMART Design in the Development of Adaptive Treatment Strategies
alena i. oetting, janet a. levy, roger d. weiss, and susan a. murphy

Introduction

The past two decades have brought new pharmacotherapies as well as beha-
vioral therapies to the field of drug-addiction treatment (Carroll & Onken,
2005; Carroll, 2005; Ling & Smith, 2002; Fiellin, Kleber, Trumble-Hejduk,
McLellan, & Kosten, 2004). Despite this progress, the treatment of addiction
in clinical practice often remains a matter of trial and error. Some reasons for
this difficulty are as follows. First, to date, no one treatment has been found
that works well for most patients; that is, patients are heterogeneous in
response to any specific treatment. Second, as many authors have pointed
out (McLellan, 2002; McLellan, Lewis, O’Brien, & Kleber, 2000), addiction is
often a chronic condition, with symptoms waxing and waning over time.
Third, relapse is common. Therefore, the clinician is faced with, first, finding
a sequence of treatments that works initially to stabilize the patient and, next,
deciding which types of treatments will prevent relapse in the longer
term. To inform this sequential clinical decision making, adaptive treatment
strategies, that is, treatment strategies shaped by individual patient
characteristics or patient responses to prior treatments, have been proposed
(Greenhouse, Stangl, Kupfer, & Prien, 1991; Murphy, 2003, 2005; Murphy,
Lynch, Oslin, McKay, & Tenhave, 2006; Murphy, Oslin, Rush, & Zhu, 2007;
Lavori & Dawson, 2000; Lavori, Dawson, & Rush, 2000; Dawson & Lavori,
2003).
Here is an example of an adaptive treatment strategy for prescription
opioid dependence, modeled with modifications after a trial currently in
progress within the Clinical Trials Network of the National Institute on
Drug Abuse (Weiss, Sharpe, & Ling, 2010).


Figure 8.1 An adaptive treatment strategy for prescription opioid dependence. [Flow chart: a 4-week initial treatment; patients who are not abstinent during the initial 4 weeks step up to a 12-week second treatment, while patients who are abstinent step down to no pharmacotherapy; in both cases treatment continues until 16 weeks have elapsed from the beginning of the initial treatment.]

Example
First, provide all patients with a 4-week course of buprenorphine/nalox-
one (Bup/Nx) plus medical management (MM) plus individual
drug counseling (IDC) (Fiellin, Pantalon, Schottenfeld, Gordon, &
O’Connor, 1999), culminating in a taper of the Bup/Nx. If at any
time during these 4 weeks the patient meets the criterion for nonre-
sponse,1 a second, longer treatment with Bup/Nx (12 weeks) is pro-
vided, accompanied by MM and cognitive behavior therapy (CBT).
However, if the patient remains abstinent2 from opioid use during
those 4 weeks, that is, responds to initial treatment, provide 12 addi-
tional weeks of relapse prevention therapy (RPT).

A patient whose treatment is consistent with this strategy experiences one


of two sequences of two treatments, depicted in Figure 8.1. The two sequences
are

1. Four-week Bup/Nx treatment plus MM plus IDC, then if the criterion


for nonresponse is met, a subsequent 12-week Bup/Nx treatment plus
MM plus CBT.

2. Four-week Bup/Nx treatment plus MM plus IDC, then if abstinence is


achieved, a subsequent 12 weeks of RPT.

1. Response to initial treatment is abstinence from opioid use during these first 4 weeks.
Nonresponse is defined as any opioid use during these first 4 weeks.
2. Abstinence might be operationalized using a criterion based on self-report of opioid use and
urine screens.

This strategy might be intended to maximize the number of days the


patient remains abstinent (as confirmed by a combination of urine screens
and self-report) over the duration of treatment.
Throughout, we use this hypothetical prescription opioid dependence
example to make the ideas concrete. In the next section, several research
questions useful in guiding the development of an adaptive treatment strat-
egy are discussed. Next, we review the sequential multiple assignment trial
(SMART), which is an experimental design developed to answer these ques-
tions. We present statistical methodology for analyzing data from a particular
SMART design and a comprehensive discussion and evaluation of these
statistical considerations in the fourth and fifth sections. In the final section,
we present a summary and conclusions and a discussion of suggested areas
for future research.

Research Questions to Refine an


Adaptive Treatment Strategy

Continuing with the prescription opioid dependence example, we might ask


if we could begin with a less intensive behavioral therapy (Lavori et al., 2000).
For example, standard MM, which is less burdensome than IDC and focuses
primarily on medication adherence, might be sufficiently effective for a large
majority of patients; that is, we might ask, In the context of the specified
options for further treatment, does the addition of IDC to MM result in a
better long-term outcome than the use of MM as the sole accompanying
behavioral therapy? Alternatively, if we focus on the behavioral therapy
accompanying the second longer 12-week treatment, we might ask, Among
subjects who did not respond to one of the initial treatments, which accom-
panying behavioral therapy is better for the secondary treatment: MM+IDC or
MM+CBT?
On the other hand, instead of focusing on a particular treatment compo-
nent within strategies, we may be interested in comparing entire adaptive
treatment strategies. Consider the strategies in Table 8.1. Suppose we are
interested in comparing two of these treatment strategies. If the strategies
begin with the same initial treatment, then the comparison reduces to a
comparison of the two secondary treatments; in our example, a comparison
of strategy C with strategy D is obtained by comparing MM+IDC with
MM+CBT among nonresponders to MM alone. We also might compare
two strategies with different initial treatments. For example, in some settings,
CBT may be the preferred behavioral therapy to use with longer treatments;
thus, we might ask, if we are going to provide MM+CBT for nonresponders

Table 8.1 Potential Strategies to Consider for the Treatment of Prescription Opioid Dependence

Strategy A: Begin with Bup/Nx+MM+IDC; if nonresponse, provide Bup/Nx+MM+CBT; if response, provide RPT
  Initial treatment: 4-week Bup/Nx treatment + MM+IDC
  If not abstinent: 12-week Bup/Nx treatment + MM+CBT
  If abstinent: RPT

Strategy B: Begin with Bup/Nx+MM+IDC; if nonresponse, provide Bup/Nx+MM+IDC; if response, provide RPT
  Initial treatment: 4-week Bup/Nx treatment + MM+IDC
  If not abstinent: 12-week Bup/Nx treatment + MM+IDC
  If abstinent: RPT

Strategy C: Begin with Bup/Nx+MM; if nonresponse, provide Bup/Nx+MM+CBT; if response, provide RPT
  Initial treatment: 4-week Bup/Nx treatment + MM
  If not abstinent: 12-week Bup/Nx treatment + MM+CBT
  If abstinent: RPT

Strategy D: Begin with Bup/Nx+MM; if nonresponse, provide Bup/Nx+MM+IDC; if response, provide RPT
  Initial treatment: 4-week Bup/Nx treatment + MM
  If not abstinent: 12-week Bup/Nx treatment + MM+IDC
  If abstinent: RPT

to the initial treatment and RPT to responders to the initial treatment, Which
is the best initial behavioral treatment: MM+IDC or MM? This is a
comparison of strategies A and C. Alternately, we might wish to identify
which of the four strategies results in the best long-term outcome
(here, the highest number of days abstinent). Note that the behavioral
therapies and pharmacotherapies are illustrative and were selected to
enhance the concreteness of this example; of course, other selections are
possible.
These research questions can be classified into one of four general types,
as summarized in Table 8.2. The SMART experimental design discussed
in the next section is particularly suited to addressing these types of
questions.

A SMART Experimental Design and the Development of


Adaptive Treatment Strategies

Traditional experimental trials typically evaluate a single treatment with no


manipulation or control of preceding or subsequent treatments. In contrast,
the SMART design provides data that can be used both to assess the efficacy
of each treatment within a sequence and to compare the effectiveness of
strategies as a whole. A further rationale for the SMART design can be
found in Murphy et al. (2006, 2007). We focus on SMART designs in
which there are two initial treatment options, then two treatment options
for initial nonresponders (alternately, initial responders) and one treatment
option for initial treatment responders (alternately, initial nonresponders). In
conversations with researchers across the mental-health field, we have found
this design to be of the greatest interest; these designs are similar to those
employed by the Sequenced Treatment Alternatives to Relieve Depression
(STAR*D) (Rush et al., 2003) and the Clinical Antipsychotic Trials of
Intervention Effectiveness (CATIE) (Stroup et al., 2003); additionally, two
SMART trials of this type are currently in the field (D. Oslin, personal com-
munication, 2007; W. Pelham, personal communication, 2006).
Data from this experimental design can be used to address questions from
each type in Table 8.2. Because SMART specifies sequences of treatments, it
allows us to determine the effectiveness of one of the treatment components
in the presence of either preceding or subsequent treatments; that is, it
addresses questions of both types 1 and 2. Also, the use of randomization
supports causal inferences about the relative effectiveness of different treat-
ment strategies, as in questions of types 3 and 4.
Returning to the prescription opioid dependence example, a useful
SMART design is provided in Figure 8.2. Consider a question of the first
type from Table 8.2. An example is, In the context of the specified options for
further treatment, does the addition of IDC to MM result in a better long-
term outcome than the use of MM as the sole accompanying behavioral
therapy? This question is answered by comparing the pooled outcomes of
subgroups 1,2,3 with those of subgroups 4,5,6. This is the main effect of the
initial behavioral treatment. Note that to estimate the main effect of the
initial behavioral treatment, we require outcomes from not only initial non-
responders but also initial responders. Clinically, this makes sense as a par-
ticular initial treatment may lead to a good response but this response may
not be as durable as other initial treatments. Next, consider a question of the
second type, such as, Among those who did not respond to one of the initial
treatments, which is the better subsequent behavioral treatment:
MM+IDC or MM+CBT? This question is addressed by pooling outcome
data from subgroups 1 and 4 and comparing the resulting mean to the

Table 8.2 Four General Types of Research Questions

Two questions that concern components of adaptive treatment strategies:
  1 (hypothesis test). Initial treatment effect: What is the effect of initial treatment on long-term outcome in the context of the specified secondary treatments? In other words, what is the main effect of initial treatment?
  2 (hypothesis test). Secondary treatment effect: Considering only those who did (or did not) respond to one of the initial treatments, what is the best secondary treatment? In other words, what is the main effect of secondary treatment for responders (or nonresponders)?

Two questions that concern whole adaptive treatment strategies:
  3 (hypothesis test). Comparing strategy effects: What is the difference in the long-term outcome between two treatment strategies that begin with a different initial treatment?
  4 (estimation). Choosing the overall best strategy: Which treatment strategy produces the best long-term outcome?

Figure 8.2 SMART study design to develop adaptive treatment strategies for prescription opioid dependence. [Flow chart: initial randomization between two 4-week Bup/Nx treatments with different accompanying behavioral therapies; subjects who are not abstinent during the initial treatment are re-randomized between two 12-week Bup/Nx second treatments, while abstinent subjects (R = 1) receive relapse prevention; days abstinent are measured over weeks 1–16, defining subgroups 1–6.]

pooled outcome data of subgroups 2 and 5. This is the main effect of the
secondary behavioral treatment among those not abstinent during the initial
4-week treatment.
An example of the third type question would be to test whether strategies
A and C in Table 8.1 result in different outcomes; to form this test, we use
appropriately weighted outcomes from subgroups 1 and 3 to form an average
outcome for strategy A and appropriately weighted outcomes from subgroups
4 and 6 to form an average outcome for strategy C (an alternate example
would concern strategies B and D; see the next section for formulae).
Note that to compare strategies, we require outcomes from both initial
responders as well as initial nonresponders (e.g., subgroup 3 in addition to
subgroup 1 and subgroup 6 in addition to subgroup 4). The fourth type of
question concerns the estimation of the best of the strategies. To choose the
best strategy overall, we follow a similar ‘‘weighting’’ process to form the
average outcome for each of the four strategies (A, B, C, D) and then desig-
nate as the best strategy the one that is associated with the highest average
outcome.

Test Statistics and Sample Size Formulae

In this section, we provide the test statistics and sample size formulae for the
four types of research questions summarized in Table 8.2. We assume that
subjects are randomized equally to the two treatment options at each step.
We use the following notation: A1 is the indicator for initial treatment, R
denotes the response to the initial treatment (response = 1 and nonresponse
= 0), A2 is the treatment indicator for secondary treatment, and Y is the
outcome. In our prescription opioid dependence example, the values for
these variables are as follows: A1 is 1 if the initial treatment uses
MM+IDC and 0 otherwise, A2 is 1 if the secondary treatment for nonrespon-
ders uses MM+CBT and 0 otherwise, and Y is the number of days the subject
remained abstinent over the 16-week study period.

Statistics for Addressing the Different Research Questions


The test statistics for questions 1–3 of Table 8.2 are presented in Table 8.3;
the method for addressing question 4 is also given in Table 8.3. The test
statistics for questions 1 and 2 are the standard test statistics for a two-group
comparison with large samples (Hoel, 1984) and are not unique to the
SMART design. The estimator of a strategy mean, used for both questions
3 and 4, as well as the test statistic for question 3 are given in Murphy (2005).
In large samples, the three test statistics corresponding to questions 1–3 are

Table 8.3 Test Statistics for Each of the Possible Questions

Question 1:a
$$Z = \frac{\bar{Y}_{A1=1} - \bar{Y}_{A1=0}}{\sqrt{\dfrac{S^2_{A1=1}}{N_{A1=1}} + \dfrac{S^2_{A1=0}}{N_{A1=0}}}}$$
where $N_{A1=i}$ denotes the number of subjects who received $i$ as the initial treatment.

Question 2:a
$$Z = \frac{\bar{Y}_{R=0,\,A2=1} - \bar{Y}_{R=0,\,A2=0}}{\sqrt{\dfrac{S^2_{R=0,\,A2=1}}{N_{R=0,\,A2=1}} + \dfrac{S^2_{R=0,\,A2=0}}{N_{R=0,\,A2=0}}}}$$
where $N_{R=0,\,A2=i}$ denotes the number of nonresponders who received $i$ as the secondary treatment.

Question 3:b
$$Z = \frac{\sqrt{N}\,(\hat{\mu}_{A1=1,\,A2=a_2} - \hat{\mu}_{A1=0,\,A2=b_2})}{\sqrt{\hat{\sigma}^2_{A1=1,\,A2=a_2} + \hat{\sigma}^2_{A1=0,\,A2=b_2}}}$$
where $N$ is the total number of subjects and $a_2$ and $b_2$ are the secondary treatments in the two prespecified strategies being compared.

Question 4: Choose the largest of $\hat{\mu}_{A1=1,\,A2=1}$, $\hat{\mu}_{A1=1,\,A2=0}$, $\hat{\mu}_{A1=0,\,A2=1}$, $\hat{\mu}_{A1=0,\,A2=0}$.

a. The subscripts on $\bar{Y}$ and $S^2$ denote groups of subjects. For example, $\bar{Y}_{R=0,\,A2=1}$ is the average outcome for subjects who do not respond initially (R = 0) and are assigned A2 = 1, and $S^2_{R=0,\,A2=1}$ is the sample variance of the outcome for those subjects. Similarly, the subscript on N denotes the group of subjects.
b. $\hat{\mu}$ is an estimator of the mean outcome and $\hat{\sigma}^2$ is the associated variance estimator for a particular strategy; here, the subscript denotes the strategy. The formulae for $\hat{\mu}$ and $\hat{\sigma}^2$ are in Table 8.4.

normally distributed (with mean zero under the null hypothesis of no effect).
In Tables 8.3, 8.4, and 8.5, specific values of Ai are denoted by ai and bi,
where i indicates the initial treatment (i = 1) or secondary treatment (i = 2);
these specific values are either 1 or 0.
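As a small illustration (not from the chapter), the Python sketch below evaluates the question-1/question-2 test statistic of Table 8.3 from group summary statistics and converts it to a two-sided p value; the summary numbers plugged in are illustrative stand-ins.

```python
from math import sqrt, erf

def two_group_z(mean1, var1, n1, mean0, var0, n0):
    """Large-sample Z statistic for a two-group mean comparison (Table 8.3)."""
    z = (mean1 - mean0) / sqrt(var1 / n1 + var0 / n0)
    # two-sided p value from the standard normal distribution
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return z, p

# Illustrative summaries: nonresponders under the two secondary treatments
z, p = two_group_z(mean1=5.86, var1=109.4, n1=391, mean0=4.31, var0=98.6, n0=396)
print(f"Z = {z:.2f}, two-sided p = {p:.4f}")
```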

Sample Size Calculations


In the following, all sample size formulae assume a two-tailed z-test. Let $\alpha$ be the desired size of the hypothesis test, let $1 - \beta$ be the power of the test, and let $z_{\alpha/2}$ be the standard normal $(1 - \alpha/2)$ percentile. Approximate normality of the test statistic is assumed throughout.

Table 8.4 Estimators for Strategy Means and for Variance of Estimator of Strategy Means

For strategy sequence $(a_1, a_2)$, the estimator of the strategy mean and $N$ times the estimator of the variance of the estimator of the strategy mean are
$$\hat{\mu}_{A1=a_1,\,A2=a_2} = \frac{\sum_{i=1}^{N} W_i(a_1, a_2)\, Y_i}{\sum_{i=1}^{N} W_i(a_1, a_2)}, \qquad \hat{\sigma}^2_{A1=a_1,\,A2=a_2} = \frac{1}{N} \sum_{i=1}^{N} W_i(a_1, a_2)^2\, \big(Y_i - \hat{\mu}_{A1=a_1,\,A2=a_2}\big)^2,$$
with weights
$$W_i(1, 1) = \frac{A_{1i}}{.5}\left[(1 - R_i)\,\frac{A_{2i}}{.5} + R_i\right], \qquad W_i(1, 0) = \frac{A_{1i}}{.5}\left[(1 - R_i)\,\frac{1 - A_{2i}}{.5} + R_i\right],$$
$$W_i(0, 1) = \frac{1 - A_{1i}}{.5}\left[(1 - R_i)\,\frac{A_{2i}}{.5} + R_i\right], \qquad W_i(0, 0) = \frac{1 - A_{1i}}{.5}\left[(1 - R_i)\,\frac{1 - A_{2i}}{.5} + R_i\right].$$
Data for subject $i$ are of the form $(A_{1i}, R_i, A_{2i}, Y_i)$, where $A_{1i}$, $R_i$, $A_{2i}$, and $Y_i$ are defined as in the section Test Statistics and Sample Size Formulae and $N$ is the total sample size.
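A minimal Python sketch of these estimators (not part of the chapter; it assumes the equal .5 randomization probabilities used throughout this section): the weights up-weight nonresponders, whose outcomes must also represent the secondary randomization, and the weighted mean and variance then feed the question-3 Z statistic from Table 8.3. The eight-subject data array is hypothetical.

```python
import numpy as np

def strategy_weights(a1, a2, A1, R, A2):
    """W_i(a1, a2) from Table 8.4, assuming .5 randomization at both stages."""
    first = (A1 == a1) / 0.5
    second = (1 - R) * (A2 == a2) / 0.5 + R
    return first * second

def strategy_mean_and_var(a1, a2, A1, R, A2, Y):
    """Weighted strategy mean and N * variance of its estimator (Table 8.4)."""
    W = strategy_weights(a1, a2, A1, R, A2)
    mu = np.sum(W * Y) / np.sum(W)
    sigma2 = np.mean(W ** 2 * (Y - mu) ** 2)
    return mu, sigma2

# Hypothetical data: (A1, R, A2, Y) for eight subjects
A1 = np.array([1, 1, 1, 1, 0, 0, 0, 0])
R  = np.array([1, 0, 0, 1, 0, 0, 1, 0])
A2 = np.array([0, 1, 0, 0, 1, 0, 0, 1])   # A2 is irrelevant for responders here
Y  = np.array([9.0, 6.0, 4.0, 8.0, 5.0, 3.0, 7.0, 6.0])
N  = len(Y)

mu_11, s2_11 = strategy_mean_and_var(1, 1, A1, R, A2, Y)
mu_01, s2_01 = strategy_mean_and_var(0, 1, A1, R, A2, Y)
z = np.sqrt(N) * (mu_11 - mu_01) / np.sqrt(s2_11 + s2_01)   # question-3 Z
print(round(mu_11, 2), round(mu_01, 2), round(float(z), 2))
```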

In order to calculate the sample size, one must also input the desired detectable standardized effect size. We denote the standardized effect size by $\delta$ and use the definition found in Cohen (1988). The standardized effect sizes for the various research questions we are considering are summarized in Table 8.5.
The sample size formulae for questions 1 and 2 are standard formulae
(Jennison & Turnbull, 2000) and assume an equal number in each of the two
groups being compared. Given desired levels of size, power, and standardized
effect size, the total sample size required for question 1 is

$$N_1 = 2 \cdot 2 \cdot (z_{\alpha/2} + z_\beta)^2 \cdot (1/\delta)^2$$

The sample size formula for question 2 requires the user to postulate the
initial response rate, which is used to provide the number of subjects who
will be randomized to secondary treatments. The sample size formula uses
the working assumption that the initial response rates are equal; that is,
subjects respond to initial treatment at the same rate regardless of the parti-
cular initial treatment, p = Pr[R = 1|A1 = 1] = Pr[R = 1|A1 = 0]. This working
assumption is used only to size the SMART and is not used to analyze the

Table 8.5 Standardized Effect Sizes for Addressing the Four Questions in Table 8.2

Question 1:
$$\delta = \frac{E[Y \mid A1 = 1] - E[Y \mid A1 = 0]}{\sqrt{\big(\mathrm{Var}[Y \mid A1 = 1] + \mathrm{Var}[Y \mid A1 = 0]\big)/2}}$$

Question 2:
$$\delta = \frac{E[Y \mid R = 0, A2 = 1] - E[Y \mid R = 0, A2 = 0]}{\sqrt{\big(\mathrm{Var}[Y \mid R = 0, A2 = 1] + \mathrm{Var}[Y \mid R = 0, A2 = 0]\big)/2}}$$

Question 3:
$$\delta = \frac{E[Y \mid A1 = 1, A2 = a_2] - E[Y \mid A1 = 0, A2 = b_2]}{\sqrt{\big(\mathrm{Var}[Y \mid A1 = 1, A2 = a_2] + \mathrm{Var}[Y \mid A1 = 0, A2 = b_2]\big)/2}}$$
where $a_2$ and $b_2$ are the secondary treatment assignments of A2.

Question 4:
$$\delta = \frac{E[Y \mid A1 = a_1, A2 = a_2] - E[Y \mid A1 = b_1, A2 = b_2]}{\sqrt{\big(\mathrm{Var}[Y \mid A1 = a_1, A2 = a_2] + \mathrm{Var}[Y \mid A1 = b_1, A2 = b_2]\big)/2}}$$
where $(a_1, a_2)$ is the strategy with the highest mean outcome and $(b_1, b_2)$ is the strategy with the next highest mean outcome; $a_i$ and $b_i$ indicate specific values of $A_i$, $i$ = 1, 2.

data from it, as can be seen from Table 8.3. The formula for the total
required sample size for question 2 is

$$N_2 = 2 \cdot 2 \cdot (z_{\alpha/2} + z_\beta)^2 \cdot (1/\delta)^2 / (1 - p)$$

When calculating the sample sizes to test question 3, two different sample
size formulae can be used: one that inputs the postulated initial response rate
and one that does not. The formula that uses a guess of the initial response
rate makes two working assumptions. First, the response rates are equal for
both initial treatments (denoted by p), and second, the variability of the out-
come Y around the strategy mean (A1 = 1, A2 = a2), among either initial
responders or nonresponders, is less than the variance of the strategy mean
and similarly for strategy (A1 = 0, A2 = b2). This formula is

$$N_{3a} = 2 \cdot (z_{\alpha/2} + z_\beta)^2 \cdot \big(2 \cdot (2(1 - p) + 1 \cdot p)\big) \cdot (1/\delta)^2$$

The second formula does not require either of these two working assump-
tions; it specifies the sample size required if the response rates are both 0, a
‘‘worst-case scenario.’’ This conservative sample size formula for addressing
question 3 is

$$N_{3b} = 2 \cdot (z_{\alpha/2} + z_\beta)^2 \cdot 4 \cdot (1/\delta)^2$$



We will compare the performance of these two sample size formulae for
addressing question 3 in the next section. See the Appendix for a derivation
of these formulae.
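For readers who want to reproduce the calculations, the Python sketch below (not part of the chapter) evaluates $N_1$, $N_2$, $N_{3a}$, and $N_{3b}$; with α = 0.05, power 0.80, δ = 0.2, and p = 0.10 it returns values within a few subjects of the Table 8.6 entries (784, 871, 1,490, 1,568), the small differences reflecting how the normal percentiles are rounded.

```python
from math import ceil
from statistics import NormalDist

def smart_sample_sizes(alpha, power, delta, p):
    """Total sample sizes N1, N2, N3a, N3b from the formulae in this section."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n1 = 2 * 2 * z**2 / delta**2                           # question 1
    n2 = n1 / (1 - p)                                      # question 2: only nonresponders re-randomized
    n3a = 2 * z**2 * (2 * (2 * (1 - p) + p)) / delta**2    # question 3, using response rate p
    n3b = 2 * z**2 * 4 / delta**2                          # question 3, worst case (response rate 0)
    return tuple(ceil(n) for n in (n1, n2, n3a, n3b))

print(smart_sample_sizes(alpha=0.05, power=0.80, delta=0.2, p=0.10))
```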
The method for finding the sample size for question 4 relies on an algo-
rithm rather than a formula; we will refer to the resulting sample size as N4.
Since question 4 is not a hypothesis test, instead of specifying power to detect
a difference in two means, the sample size is based on the desired probability
to detect the strategy that results in the highest mean outcome. The standar-
dized effect size in this case involves the difference between the two highest
strategy means. This algorithm makes the working assumption that
$\sigma^2 = \mathrm{Var}[Y \mid A1 = a_1, A2 = a_2]$ is the same for all strategies. The algorithm uses
an idea similar to the one used to derive the sample size formula for question
3 that is invariant to the response rate. Given a desired level of probability
for selecting the correct treatment strategy with the highest mean and a
desired treatment strategy effect, the algorithm for question 4 finds the
sample sizes that correspond to the range of response probabilities and
then chooses the largest sample size. Since it is based on a worst-case sce-
nario, this algorithm will result in a conservative sample size formula. See
the Appendix for a derivation of this algorithm. The online sample size
calculator for question 4 can be found at http://methodologymedia.psu.edu/smart/samplesize.
Example sample sizes are given in Table 8.6. Note that as the response
rate decreases, the required sample sizes for question 3 (e.g., comparing two
strategies that have different initial treatments) increases. To see why this
must be the case, consider two extreme cases, the first in which the response
rate is 90% for both initial treatments and the second in which the nonre-
sponse rate is 90%. In the former case, if n subjects are assigned to treatment
1 initially and 90% respond (i.e., 10% do not respond), then the resulting
sample size for strategy (1, 1) is 0.9 * n + ½ * 0.1 * n = 0.95 * n. The ½ occurs
due to the second randomization of nonresponders between the two second-
ary treatments. On the other hand, if only 10% respond (i.e., 90% do not
respond), then the resulting sample size for strategy (1, 1) is 0.1 * n + ½ *
0.9 * n = 0.55 * n, which is less than 0.95 * n. Thus, the lower the expected
response rate, the larger the initial sample size required for a given power
to differentiate between two strategies. This result occurs because the
number of treatment options (two options) for nonresponders is greater
than the number of treatment options for responders (only one).
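The same bookkeeping can be written as a one-line function (Python, illustrative only): the expected number of subjects whose data are consistent with a given strategy is the responders plus half the nonresponders among those assigned its initial treatment.

```python
def subjects_per_strategy(n_initial, response_rate):
    """Expected number of subjects consistent with one strategy, given the number
    assigned to its initial treatment and the initial response rate."""
    return response_rate * n_initial + 0.5 * (1 - response_rate) * n_initial

print(subjects_per_strategy(100, 0.9), subjects_per_strategy(100, 0.1))  # 95.0 55.0
```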
Consider the prescription opioid dependence example. Suppose we are
particularly interested in investigating whether MM+CBT or MM+IDC is
best for subjects who do not respond to their initial treatment. This is a
question of type 2. Thus, in order to ascertain the sample size for the
SMART design in Figure 8.2, we use formula N2. Suppose we decide to

Table 8.6 Example Sample Sizes: All Entries Are for Total Sample Sizea

Columns give total sample size for research questions 1, 2, 3 (varies by p), 3 (invariant to p), and 4, by desired size (α), desired power (1 − β)b, standardized effect size (δ), and initial response rate (p)c.

                                              Q1       Q2       Q3 (varies   Q3 (invariant   Q4
                                                                  by p)        to p)
α = 0.10, 1 − β = 0.80
  δ = 0.20   p = 0.5                         620     1,240       930         1,240          358
             p = 0.1                         620       689     1,178         1,240          358
  δ = 0.50   p = 0.5                          99       198       149           198           59
             p = 0.1                          99       110       188           198           59
α = 0.10, 1 − β = 0.90
  δ = 0.20   p = 0.5                         864     1,728     1,297         1,729          608
             p = 0.1                         864       960     1,642         1,729          608
  δ = 0.50   p = 0.5                         138       277       207           277           97
             p = 0.1                         138       154       263           277           97
α = 0.05, 1 − β = 0.80
  δ = 0.20   p = 0.5                         784     1,568     1,176         1,568          358
             p = 0.1                         784       871     1,490         1,568          358
  δ = 0.50   p = 0.5                         125       251       188           251           59
             p = 0.1                         125       139       238           251           59
α = 0.05, 1 − β = 0.90
  δ = 0.20   p = 0.5                       1,056     2,112     1,584         2,112          608
             p = 0.1                       1,056     1,174     2,007         2,112          608
  δ = 0.50   p = 0.5                         169       338       254           338           97
             p = 0.1                         169       188       321           338           97

a. All entries assume that each statistical test is two-tailed; the sample size for question 4 does not vary by α since this is not a hypothesis test.
b. In question 4, we choose the sample size so that the probability that the treatment strategy with the highest mean has the highest estimated mean is 1 − β.
c. The sample size formulae assume that the response rates to initial treatments are equal: p = Pr[R = 1|A1 = 1] = Pr[R = 1|A1 = 0].
8 SMART Design in the Development of Adaptive Treatment Strategies 191

size the trial to detect a standardized effect size of 0.2 between the two
secondary treatments with the power and size of the (two-tailed) test at
0.80 and 0.05, respectively. After surveying the literature and discussing the
issue with colleagues, suppose we decide that the response rate for the two
initial treatments will be approximately 0.10 (p = 0.10). The number of sub-
jects required for this trial is then N2 ¼ 2  2  ðz=2 þ z Þ2  ð1= Þ2 =ð1  pÞ ¼
4  ðz0:05=2 þ z0:2 Þ2  ð1=0:2Þ2 =0:9 ¼ 871. Furthermore, as secondary objectives,
suppose we are interested in comparing strategy A:—Begin with MM+IDC; if
nonresponse, provide MM+CBT; if response, provide RPT—with D—Begin
with MM; if nonresponse, provide MM+IDC; if response, provide RPT—
(corresponding to a specific example of question 3) and in choosing the
best strategy overall (question 4). Using the same input values for the para-
meters and looking at Table 8.6, we see that the sample size required
for question 3 is about twice as much as that required for question 2.
Thus, unless we are willing and able to double our sample size, we realize
that a comparison of strategies A and D will have low power. However, the
sample size for question 4 is only 358 (using desired probability of 0.80),
so we will be able to answer the secondary objective of choosing the best
strategy with 80% probability.
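
The sample-size formulae used in this example, together with the question 3 formulae N3a and N3b
derived in the Appendix, are straightforward to script. The sketch below is not part of the
chapter's published materials; it simply evaluates those formulae (with the standard two-group
formula assumed for question 1, consistent with the question 1 column of Table 8.6), and it
assumes SciPy is available. Its output may differ from Table 8.6 by a subject or two depending on
how the normal quantiles are rounded.

# Minimal sketch (not the authors' code): evaluate the sample-size formulae quoted in the text.
from scipy.stats import norm

def n1(alpha, beta, delta):
    """Question 1: compare the two initial treatments (standard two-group formula)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return 2 * 2 * z**2 / delta**2

def n2(alpha, beta, delta, p):
    """Question 2: compare the two secondary treatments among nonresponders."""
    return n1(alpha, beta, delta) / (1 - p)

def n3a(alpha, beta, delta, p):
    """Question 3, formula that uses the assumed initial response rate p."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return 2 * z**2 * (2 * (2 * (1 - p) + p)) / delta**2

def n3b(alpha, beta, delta):
    """Question 3, conservative formula (invariant to p)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return 2 * 4 * z**2 / delta**2

# The worked example: alpha = 0.05, power = 0.80, delta = 0.2, p = 0.10.
print(round(n2(0.05, 0.20, 0.2, 0.10)))   # about 872 (Table 8.6 reports 871)
print(round(n3a(0.05, 0.20, 0.2, 0.10)))  # about 1,491 (Table 8.6 reports 1,490)
print(round(n3b(0.05, 0.20, 0.2)))        # about 1,570 (Table 8.6 reports 1,568)
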
Suppose that we conduct the trial with 871 subjects. The hypothetical data
set³ and SAS code for calculating the following values can be found at
http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/. For question 2, the
value of the z-statistic is

(Ȳ_{R=0, A2=1} − Ȳ_{R=0, A2=0}) / √(S²_{R=0, A2=1}/N_{R=0, A2=1} + S²_{R=0, A2=0}/N_{R=0, A2=0})
  = (5.8619 − 4.3135) / √(109.3975/391 + 98.5540/396) = 2.1296,

which has a two-sided p value of 0.0332. Using the formulae in Table 8.4, we
get the following estimates for the strategy means:

[μ̂(1,1), μ̂(1,0), μ̂(0,1), μ̂(0,0)] = [7.1246, 4.9994, 6.3285, 5.6364].

3. We generated this hypothetical data so that the true underlying effect size for question 2 is 0.2,
the true effect size for question 3 is 0.2, and the strategy with the highest mean in truth is
(1, 1), with an effect size of 0.1. Furthermore, the true response rates for the initial treatments
are 0.05 for A1 = 0 and 0.15 for A1 = 1. When we considered 1,000 similar data sets, we found
that the analysis for question 2 led to significant results 78% of the time and the analysis for
question 3 led to significant results 54% of the time. The latter result and the fact that we did
not detect an effect for question 3 in the analysis is unsurprising, considering that we have half
the sample size required to detect an effect size of 0.2. Furthermore, across the 1,000 similar
simulated data sets the best strategy (1, 1) was detected 86% of the time.
The corresponding estimates for the variances of the estimates of the strategy
means are

[σ̂²(1,1), σ̂²(1,0), σ̂²(0,1), σ̂²(0,0)] = [396.4555, 352.8471, 456.5727, 441.0138].

Using these estimates, we calculate the value of the corresponding z-statistic
for question 3:

√N (μ̂_{A1=1, A2=1} − μ̂_{A1=0, A2=0}) / √(σ̂²_{A1=1, A2=1} + σ̂²_{A1=0, A2=0})
  = √871 (7.1246 − 5.6364) / √(396.4555 + 441.0138) = 1.5178,

which has a two-sided p value of 0.1291, so we do not reject the null
hypothesis that the two strategies are equal. For question 4, we choose (1, 1)
as the best strategy, which corresponds to the strategy:

1. First, supplement the initial 4-week Bup/Nx treatment with MM+IDC.

2. For those who respond, provide RPT. For those who do not respond,
continue the Bup/Nx treatment for 12 weeks but switch the accompa-
nying behavioral treatment to MM+CBT.
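
As a quick check, both z-statistics in this worked example can be reproduced from the summary
quantities quoted above. The following sketch is illustrative only (it is not the hypothetical
data set or SAS code posted at the URL given earlier) and assumes SciPy is available.

# Reproduce the two test statistics reported above from the quoted summary numbers.
from math import sqrt
from scipy.stats import norm

# Question 2: nonresponders, secondary treatment 1 versus 0.
z2 = (5.8619 - 4.3135) / sqrt(109.3975 / 391 + 98.5540 / 396)
p2 = 2 * (1 - norm.cdf(abs(z2)))
print(round(z2, 4), round(p2, 4))   # 2.1296, 0.0332

# Question 3: strategy (1, 1) versus strategy (0, 0), with N = 871.
z3 = sqrt(871) * (7.1246 - 5.6364) / sqrt(396.4555 + 441.0138)
p3 = 2 * (1 - norm.cdf(abs(z3)))
print(round(z3, 4), round(p3, 4))   # 1.5178, 0.1291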

Evaluation of Sample Size Formulae Via Simulation

In this section, the sample size formulae presented in Sample Size


Calculations are evaluated. We examine the robustness of the newly devel-
oped methods for calculating sample sizes for questions 3 and 4. In addition,
a second assessment investigates the power for question 4 to detect the best
strategy when the study is sized for one of the other research questions. The
second assessment is provided because, due to the emphasis on strategies in
SMART designs, question 4 is always likely to be of interest.

Simulation Designs
The sample sizes used for the simulations were chosen to give a power level
of 0.90 and a Type I error of 0.05 when one of questions 1–3 is used to size
the trial and a 0.90 probability of choosing the best strategy for question 4
when it is used to size the trial; these sample sizes are shown in Table 8.6.
For questions 1–3, power is estimated by the proportion of times out of 1,000
simulations that the null hypothesis is correctly rejected; for question 4, the
probability of choosing the best strategy is estimated by the proportion of
times out of 1,000 simulations that the correct strategy with the highest
mean is chosen. We sized the studies to detect a prespecified standardized


effect size of 0.2 or 0.5. We follow Cohen (1988) in labeling 0.2 as a ‘‘small’’
effect size and 0.5 as a ‘‘medium’’ effect size. The simulated data reflect the
types of scenarios found in substance-abuse clinical trials (Gandhi et al.,
2003; Fiellin et al., 2006; Ling et al., 2005). For example, the simulated
data exhibit initial response rates (i.e., the proportion of simulated subjects
with R = 1) of 0.5 and 0.1, and the mean outcome for the responders is
higher than for nonresponders.
For question 3 we need to specify the strategies of interest, and for the
purposes of these simulations we will compare strategies (A1 = 1, A2 = 1) and
(A1 = 0, A2 = 0); these are strategies A and D, respectively, from Table 8.1.
For the simulations to evaluate the robustness of the sample size calculation
for question 4, we choose strategy A to always have the highest mean out-
come and generate the data according to two different ‘‘patterns’’: (1) the
strategy means are all different and (2) the mean outcomes of the other three
strategies besides strategy A are all equal. In the second pattern, it is more
difficult to detect the ‘‘best’’ strategy because the highest mean must be
distinguished from all the rest, which are all the ‘‘next highest,’’ instead of
just one next highest mean.
In order to test the robustness of the sample size formulae, we calculate a
sample size given by the relevant formula in Sample Size Calculations and
then simulate data sets of this sample size. However, the simulated data will
not satisfy the working assumptions in one of the following ways:

• the intermediate response rates to initial treatments are unequal, that
is, Pr[R = 1 | A1 = 1] ≠ Pr[R = 1 | A1 = 0]

• the variances relevant to the question are unequal (for question 4 only)

• the distribution of the final outcome, Y, is right-skewed (thus, for a


given sample size, the test statistic is more likely to have a nonnormal
distribution).

We also assess the power of question 4 when it is not used in sizing the trial.
For each of the types of research questions in Table 8.2, we generate a data
set that follows the working assumptions for the sample size formula for that
question (e.g., use N2 to size the study to test the effect of the second
treatment on the mean outcome), then carry out the question 4 analysis on the data
and estimate the probability of choosing the correct strategy with the highest
mean outcome.
The descriptions of the simulation designs for each of questions 1–4 as
well as the parameters for all of the different generative models can be found
at http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/.
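
The full generative models behind these simulations are described at the URL above. As a hedged
illustration of the basic mechanics only (power estimated as the proportion of simulated trials in
which the null hypothesis is rejected), the sketch below simulates question 2 under the working
assumptions: a common initial response rate, equal variances, and normally distributed outcomes.
The generative model and parameter values here are simplifications chosen for illustration, not
the chapter's posted models.

# Illustrative sketch only: estimate power for question 2 by simulation, i.e., the proportion of
# simulated SMART trials in which H0 (no difference between the two secondary treatments among
# nonresponders) is rejected.  This is a simplified generative model, not the posted one.
import numpy as np
from scipy.stats import norm

def simulate_question2_power(n_total, p_respond, delta, n_sims=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        respond = rng.random(n_total) < p_respond       # intermediate response indicator R
        nonresp = ~respond
        a2 = rng.integers(0, 2, size=n_total)           # second randomization (used for nonresponders)
        # Outcome with unit variance; secondary treatment 1 shifts the mean by delta among
        # nonresponders, so delta is the standardized effect size for question 2.
        y = rng.normal(0.0, 1.0, size=n_total) + delta * (nonresp & (a2 == 1))
        y1, y0 = y[nonresp & (a2 == 1)], y[nonresp & (a2 == 0)]
        se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        if abs(y1.mean() - y0.mean()) / se > z_crit:
            rejections += 1
    return rejections / n_sims

# Sized with N2 for alpha = 0.05, power = 0.90, delta = 0.2, p = 0.1 (Table 8.6: 1,174 subjects).
print(simulate_question2_power(1174, 0.1, 0.2))  # approximately 0.90, up to Monte Carlo error
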
Robustness of the New Sample Size Formulae


As previously mentioned, since the sample size formulae for questions 1 and
2 are standard, we will focus on evaluating the newly developed sample size
formulae for questions 3 and 4. Table 8.7a and b provides the results of the
simulations designed to evaluate the sample size formulae for questions 3
and 4, respectively.
Considering Table 8.7a, we see that the question 3 sample size formula
N3a performed extremely well when the expected standardized effect size was
0.20. Resulting power levels were uniformly near 0.90 regardless of either the
true initial response rates or any of the three violations of the working
assumptions. Power levels were less robust when the sample sizes were
smaller (i.e., for the 0.50 effect size). For example, when the initial response
rates are not equal, the resulting power is lower than 0.90 in the rows using
an assumed response rate of 0.5. The more conservative sample size formula,
N3b, performed well in all scenarios, regardless of response rate or the pre-
sence of any of the three violations to underlying assumptions. As the
response rate approaches 0, the sample sizes are less conservative but the
results for power remain within a 95% confidence interval of 0.90.
In Table 8.7b, the conservatism of the sample size calculation N4 (asso-
ciated with question 4) is apparent. We can see that N4 is less conservative for
the more difficult scenario where the strategy means besides the highest are
all equal, but the probability of correctly identifying the strategy with the
highest mean outcome is still about 0.90.

Table 8.7a Investigation of Sample Size Assumption Violations for Question 3,
Comparing Strategies A and D

Simulation Parameters                       Simulation Results (Power)

Effect   Initial     Sample    Total     Default        Unequal     Non-Normal
Size     Response    Size      Sample    Working        Initial     Outcome
         Rate        Formula   Size      Assumptions    Response    Y
         (Default)                       Are Correct    Rates

0.2      0.5         N3a       1,584     0.893          0.902       0.882
0.2      0.1         N3a       2,007     0.882          0.910       0.877(a)
0.5      0.5         N3a       254       0.896          0.864(a)    0.851(a)
0.5      0.1         N3a       321       0.926(a)       0.886       0.898
0.2      0.5         N3b       2,112     0.950(a)       0.958(a)    0.974(a)
0.2      0.1         N3b       2,112     0.903          0.934(a)    0.898
0.5      0.5         N3b       338       0.973(a)       0.938(a)    0.916
0.5      0.1         N3b       338       0.937(a)       0.890       0.922(a)

The power to reject the null hypothesis for question 3 is shown when sample size is calculated to
reject the null hypothesis for question 3 with power of 0.90 and type I error of 0.05 (two-tailed).
(a) The 95% confidence interval for this proportion does not contain 0.90.

Table 8.7b Investigation of Sample Size Violations for Question 4: Probability(a) to
Detect the Correct ‘‘Best’’ Strategy When the Sample Size Is Calculated to Detect the
Correct Maximum Strategy Mean 90% of the Time

Simulation Parameters                          Simulation Results (Probability)

Effect   Initial     Pattern(b)  Sample    Default        Unequal     Unequal     Non-Normal
Size     Response                Size(c)   Working        Initial     Variance    Outcome Y
         Rate                              Assumptions    Response
         (Default)                         Are Correct    Rates

0.2      0.5         1           608       0.966(d)       0.984(d)    0.965(d)    0.972(d)
0.2      0.1         1           608       0.962(d)       0.969(d)    0.964(d)    0.962(d)
0.5      0.5         1           97        0.980(d)       0.985(d)    0.966(d)    0.956(d)
0.5      0.1         1           97        0.960(d)       0.919(d)    0.976(d)    0.947(d)
0.2      0.5         2           608       0.964(d)       0.953(d)    0.952(d)    0.944(d)
0.2      0.1         2           608       0.905          0.929(d)    0.922(d)    0.923(d)
0.5      0.5         2           97        0.922(d)       0.974(d)    0.976(d)    0.948(d)
0.5      0.1         2           97        0.893          0.917       0.927(d)    0.885

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy
mean was selected as the maximum.
(b) 1 refers to the pattern of strategy means such that all are different but the mean for (A1 = 1,
A2 = 1), that is, strategy A, is always the highest. 2 refers to the pattern of strategy means such
that the mean for strategy A is higher than the other three and the other three are all equal.
(c) Calculated to detect the correct maximum strategy mean 90% of the time when the sample size
assumptions hold.
(d) The 95% confidence interval for this proportion does not contain 0.90.

Overall, under different violations of the working assumptions, the sample


size formulae for questions 3 and 4 still performed well in terms of power.
As discussed, we also assess the power for question 4 when the trial was
sized for a different research question. For each of the types of research
questions in Table 8.2, we generate a data set that follows the working
assumptions for the sample size formula for that question, then evaluate
the power of question 4 to detect the optimal strategy. From Table 8.8a–c,
we see that in almost all cases, regardless of the starting assumptions used to
size the various research questions, we achieve a 0.9 probability or higher of
correctly detecting the strategy with the highest mean outcome. The prob-
ability falls below 0.9 when the standardized effect size for question 4 falls
below 0.1. These results are not surprising as from Table 8.6 we see that
question 4 requires much smaller sample sizes than all the other research
questions.
Note that question 4 is more closely linked to question 3 than to question
1 or 2. Question 3 is potentially a subset of question 4; this relationship
occurs when one of the strategies considered in question 3 is the strategy
with the highest mean outcome. The probability of detecting the correct
strategy mean as the maximum when sizing for question 3 is generally very
good, as can be seen from Table 8.8c. This is due to the fact that the sample
sizes required to test the differences between two strategy means (each
beginning with a different initial treatment) are much larger than those
needed to detect the maximum of four strategy means with a specified
degree of confidence. For a z-test of the difference between two strategy
means with a two-tailed Type I error rate of 0.05, power of 0.90, and
standardized effect size of 0.20, the sample size requirements range from
1,584 to 2,112. The sample size required for a 0.90 probability of selecting
the correct strategy mean as a maximum, when the standardized effect size
between it and the next highest strategy mean is 0.2, is 608. It is therefore
not surprising that the selection rates for the correct strategy mean are
generally high when powered to detect differences between strategy means
each beginning with a different initial treatment.

Table 8.8a The Probability(a) of Choosing the Correct Strategy for Question 4
When Sample Size Is Calculated to Reject the Null Hypothesis for Question 1
(for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Simulation Parameters                      Simulation Results

Effect Size   Initial     Sample    Question 1   Question 4          Effect Size
for           Response    Size      (Power)      (Probability(a))    for
Question 1    Rate                                                   Question 4

0.2           0.5         1,056     0.880        1.000               0.325
0.2           0.1         1,056     0.904        1.000               0.425
0.5           0.5         169       0.934        0.987               0.350
0.5           0.1         169       0.920        0.998               0.630

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy
mean was selected as the maximum.

Table 8.8b The Probability(a) of Choosing the Correct Strategy for Question 4
When Sample Size Is Calculated to Reject the Null Hypothesis for Question 2
(for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Simulation Parameters                      Simulation Results

Effect Size   Initial     Sample    Question 2   Question 4          Effect Size
for           Response    Size      (Power)      (Probability(a))    for
Question 2    Rate                                                   Question 4

0.2           0.5         2,112     0.906        0.999               0.133
0.2           0.1         1,174     0.895        0.716               0.054
0.5           0.5         338       0.895        0.997               0.372
0.5           0.1         188       0.901        0.978               0.420

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy
mean was selected as the maximum.

Table 8.8c The Probability(a) of Choosing the Correct Strategy for Question 4
When Sample Size Is Calculated to Reject the Null Hypothesis for Question 3
(for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Simulation Parameters                                  Simulation Results

Effect Size   Initial     Sample     Sample    Question 3   Question 4          Effect Size
for           Response    Size       Size      (Power)      (Probability(a))    for
Question 3    Rate        Formula                                               Question 4

0.2           0.5         N3a        1,584     0.893        0.939               0.10
0.2           0.1         N3a        2,007     0.882        0.614               0.02
0.5           0.5         N3a        254       0.896        0.976               0.25
0.5           0.1         N3a        321       0.926        0.978               0.32
0.2           0.5         N3b        2,112     0.950        0.953               0.10
0.2           0.1         N3b        2,112     0.903        0.613               0.02
0.5           0.5         N3b        338       0.973        0.989               0.25
0.5           0.1         N3b        338       0.937        0.985               0.32

(a) Probability calculated as the percentage of 1,000 simulations on which the correct strategy
mean was selected as the maximum.

Summary
Overall, the sample size formulae perform well even when the working
assumptions are violated. Additionally, the performance of question 4 is
consistently good when sizing for all other research questions; this is most
likely due to question 4 requiring smaller sample sizes than the other
research questions to achieve good results.
When planning a SMART similar to the one considered here, if one is
primarily concerned with testing differences between prespecified strategy
means, we would recommend using the less conservative formula N3a if
one has confidence in knowledge of the initial response rates. We recom-
mend this in light of the considerable cost savings that can be accrued by
using this approach, in comparison to the more conservative formula N3b.
We comment further on this topic in the Discussion.

Discussion

In this chapter, we demonstrated how a SMART can be used to answer


research questions about both individual components of an adaptive
treatment strategy and the treatment strategies as a whole. We presented


statistical methodology to guide the design and analysis of a SMART. Two
new methods for calculating the sample sizes for a SMART were presented.
The first is for sizing a study when one is interested in testing the difference
in two strategies that have different initial treatments; this formula incorpo-
rates knowledge about initial response rates. The second new sample size
calculation is for sizing a study that has as its goal choosing the strategy that
has the highest final outcome. We evaluated both of these methods and
found that they performed well in simulations that covered a wide range
of plausible scenarios.
Several comments are in order regarding the violations of assumptions
surrounding the values of the initial response rates when investigating
sample size formula N3a for question 3. First, we examined violations of
the assumption of the homogeneity of response rates across initial treatments
such that they differed by 10% (initial response rates differing by more than
10% in addictions clinical trials are rare) and found that the sample size
formula performed well. Future research is needed to examine the question
regarding the extent to which initial response rates can be misspecified when
utilizing this modified sample size formula. Clearly, for gross misspecifica-
tions, the trialist is probably better off with the more conservative sample size
formula. However, the operationalization of ‘‘gross misspecification’’ needs
further research.
In the addictions and in many other areas of mental health, both clinical
practice as well as trials are plagued with subject nonadherence to treatment.
In these cases sophisticated causal inferential methods are often utilized
when trials are ‘‘broken’’ in this manner. An alternative to the post hoc
statistical approach to dealing with nonadherence is to consider a proactive
experimental design such as SMART. The SMART design provides the means
for considering nonadherence as one dimension of nonresponse to treat-
ment. That is, nonadherence is an indication that the treatment must be
altered in some way (e.g., by adding a component that is designed to improve
motivation to adhere, by switching the treatment). In particular, one might be
interested in varying secondary treatments based on both adherence mea-
sures and measures of continued drug use.
In this chapter we focused on the simple design in which there are two
options for nonresponders and one option for responders. Clearly, these
results hold for the mirror design (one option for nonresponders and two
options for responders). An important step would be to generalize these
results to other designs, such as designs in which there are equal numbers
of options for responders and nonresponders or designs in which there are
three randomizations. In substance abuse, the final outcome variable is often
binary; sample size formulae are needed for this setting as well. Alternately,
the outcome may be time-varying, such as time-varying symptom levels;


again, it is important to generalize the results to this setting.

Appendix

Sample Size Formulae for Question 3


Here, we present the derivation of the sample size formulae N3a and N3b for
question 3 using results from Murphy (2005).
Suppose we have data from a SMART design modeled after the one pre-
sented in Figure 8.2; that is, there are two options for the initial treatment,
followed by two treatment options for nonresponders and one treatment
option for responders. We use the same notation and assumptions listed in
Test Statistics and Sample Size Formulae. Suppose that we are interested in
comparing two strategies that have different initial treatments, strategies
(a1, a2) and (b1, b2). Without loss of generality, let a1 = 1 and b1 = 0.
To derive the formulae N3a and N3b, we will make the following working
assumption: the sample sizes will be large enough so that μ̂(a1, a2) is
approximately normally distributed.
We use three additional assumptions for formula N3a. The first is that the
response rates for the initial treatments are equal and the second two
assumptions are indicated by * and **.
The marginal variances relevant to the research question are σ₀² =
Var[Y | A1 = a1, A2 = a2] and σ₁² = Var[Y | A1 = b1, A2 = b2]. Denote the mean
outcome for strategy (A1, A2) by μ(A1, A2). The null hypothesis we are interested in
testing is

H0: μ(1, a2) − μ(0, b2) = 0

and the alternative of interest is

H1: μ(1, a2) − μ(0, b2) = η,

where η = Δ · √((σ₁² + σ₀²)/2). (Note that Δ is the standardized effect size.)
As presented in Statistics for Addressing the Different Research
Questions, the test statistic for this hypothesis is

Z = √N (μ̂(1, a2) − μ̂(0, b2)) / √(σ̂²(1, a2) + σ̂²(0, b2)),

where μ̂(a1, a2) and σ̂²(a1, a2) are as defined in Table 8.5; in large samples, this
test statistic has a standard normal distribution under the null hypothesis
(Murphy, Van Der Laan, Robins, & Conduct Problems Prevention Group,
2001). Recall that N is the total sample size for the trial. To find the
required sample size N for a two-sided test with power 1 − β and size α, we
solve

Pr[Z < −z_{α/2} or Z > z_{α/2} | μ(1, a2) − μ(0, b2) = η] = 1 − β

for N, where z_{α/2} is the standard normal (1 − α/2) percentile. Thus, we have

Pr[Z < −z_{α/2} | μ(1, a2) − μ(0, b2) = η] + Pr[Z > z_{α/2} | μ(1, a2) − μ(0, b2) = η] = 1 − β.

Without loss of generality, assume that η > 0 so that

Pr[Z < −z_{α/2} | μ(1, a2) − μ(0, b2) = η] ≈ 0

and

Pr[Z > z_{α/2} | μ(1, a2) − μ(0, b2) = η] ≈ 1 − β.

Define ω²(a1, a2) = Var[√N μ̂(a1, a2)]. Note that

√(σ̂²(1, a2) + σ̂²(0, b2)) / √(ω²(1, a2) + ω²(0, b2))

is close to 1 in large samples (Murphy, 2005). Now, E[μ̂(1, a2) − μ̂(0, b2)] =
μ(1, a2) − μ(0, b2), so we have

Pr[ √N (μ̂(1, a2) − μ̂(0, b2) − η) / √(ω²(1, a2) + ω²(0, b2))
      > z_{α/2} − √N η / √(ω²(1, a2) + ω²(0, b2)) ] ≈ 1 − β.

Note that the distribution of

√N (μ̂(1, a2) − μ̂(0, b2) − η) / √(ω²(1, a2) + ω²(0, b2))

follows a standard normal distribution in large samples (Murphy et al., 2001).
Thus, we have

z_β ≈ −z_{α/2} + √N η / √(ω²(1, a2) + ω²(0, b2)).     (1)

Now, using equation 10 in Murphy (2005) for k = 2 steps¹ (initial and secondary)
of treatment,

ω²(a1, a2) = E_{a1,a2}[ (Y − μ(a1, a2))² / (Pr(a1) Pr(a2 | R, a1)) ]
           = E_{a1,a2}[ (Y − μ(a1, a2))² / (Pr(a1) Pr(a2 | 1, a1)) | R = 1 ] Pr_{a1}[R = 1]
           + E_{a1,a2}[ (Y − μ(a1, a2))² / (Pr(a1) Pr(a2 | 0, a1)) | R = 0 ] Pr_{a1}[R = 0]

for all values of a1, a2; the subscripts on E and Pr (namely, E_{a1,a2} and Pr_{a1})
indicate expectations and probabilities calculated as if all subjects were
assigned a1 as the initial treatment and then, if nonresponse, assigned treatment a2.
If we are willing to make the assumption (*) that

E_{a1,a2}[(Y − μ(a1, a2))² | R] ≤ E_{a1,a2}[(Y − μ(a1, a2))²]

for both R = 1 and R = 0 (i.e., the variability of the outcome around the strategy
mean among either responders or nonresponders is no more than the marginal variance
of the outcome under that strategy), then

ω²(a1, a2) ≤ E_{a1,a2}[(Y − μ(a1, a2))²] ( Pr_{a1}[R = 1] / (Pr(a1) Pr(a2 | 1, a1))
             + Pr_{a1}[R = 0] / (Pr(a1) Pr(a2 | 0, a1)) ).

Thus, we have

ω²(a1, a2) ≤ σ²(a1, a2) ( Pr_{a1}[R = 1] / (Pr(a1) Pr(a2 | 1, a1))
             + Pr_{a1}[R = 0] / (Pr(a1) Pr(a2 | 0, a1)) ),     (2)

where σ²(a1, a2) is the marginal variance of the outcome under the strategy in question.
Since (**) nonresponding subjects (R = 0) are randomized equally to the two
secondary treatment options and since there is one treatment option for
responders (R = 1), for a common initial response rate p = Pr[R = 1 | A1 = 1] =
Pr[R = 1 | A1 = 0],

ω²(a1, a2) ≤ σ²(a1, a2) · 2 · (2 · (1 − p) + 1 · p).

Rearranging equation 1 gives us

N ≥ ( √(ω²(1, a2) + ω²(0, b2)) (z_β + z_{α/2}) / η )².

Using the bound above on ω²(a1, a2), the right-hand side is at most

( √((σ₁² + σ₀²)(2 · (2 · (1 − p) + p))) (z_β + z_{α/2}) / (Δ √((σ₁² + σ₀²)/2)) )²,

so it suffices to take N equal to this quantity. Simplifying, we have the formula

N3a = 2 · (z_{α/2} + z_β)² · (2 · (2 · (1 − p) + p)) · (1/Δ)²,

which is the sample size formula given in Sample Size Calculations that
depends on the response rate p.
Going through the arguments once again, we see that we do not need
either of the two working assumptions (*) or (**) to obtain the conservative
sample size formula, N3b:

N3b = 2 · 4 · (1/Δ)² · (z_β + z_{α/2})².

Sample Size Calculation for Question 4


We now present the algorithm for calculating the sample size for question 4.
As in the previous section, suppose we have data from a SMART design
modeled after the one presented in Figure 8.2; we use the same notation
and assumptions listed in Test Statistics and Sample Size Formulae. Suppose
that we are interested in identifying the strategy that has the highest mean
outcome. We will denote the mean outcome for strategy (A1, A2) by μ(A1, A2).
We make the following assumptions:

• The marginal variances of the final outcome given the strategy are
all equal, and we denote this variance by σ². This means that σ² =
Var[Y | A1 = a1, A2 = a2] for all (a1, a2) in {(1,1), (1,0), (0,1), (0,0)}.

• The sample sizes will be large enough so that μ̂(a1, a2) is approximately
normally distributed.

• The correlation between the estimated mean outcome for strategy (1, 1)
and the estimated mean outcome for strategy (1, 0) is the same as the
correlation between the estimated mean outcome for strategy (0, 1) and
the estimated mean outcome for strategy (0, 0); we denote this identical
correlation by ρ.
The correlation of the treatment strategies is directly related to the initial


response rates. The final outcome under two different treatment strategies
will be correlated to the extent that they share responders. For example, if the
response rate for treatment A1 = 1 is 0, then everyone is a nonresponder and
the means calculated for Y given strategy (1, 1) and for Y given strategy (1, 0)
will not share any responders to treatment A1 = 1; thus, the correlation
between the two strategies will be 0. On the other hand, if the response
rate for treatment A1 = 1 is 1, then everyone is a responder to A1 = 1 and,
therefore, the mean outcomes for strategy (1, 1) and strategy (1, 0) will be
directly related (i.e., completely correlated). Two treatment strategies that each
begin with a different initial treatment are not correlated since the strategies
do not overlap (i.e., they do not share any subjects).
For the algorithm, the user must specify the following quantities:

• the desired standardized effect size, Δ

• the desired probability that the strategy estimated to have the largest
mean outcome does in fact have the largest mean, 1 − β

We assume that three of the strategies have the same mean and the one
remaining strategy produces the largest mean; this is an extreme scenario in
which it is most difficult to detect the presence of an effect. Without loss of
generality, we choose strategy (1, 1) to have the largest mean.
Consider the following algorithm as a function of N:

1. For every value of ρ in {0, 0.01, 0.02, . . . , 0.99, 1} perform the following
simulation:
Generate K = 20,000 samples of [μ̂(1,1), μ̂(1,0), μ̂(0,1), μ̂(0,0)]ᵀ from
a multivariate normal with

mean M = [μ(1,1), μ(1,0), μ(0,1), μ(0,0)]ᵀ = [Δ/2, 0, 0, 0]ᵀ

and covariance matrix

Σ = (1/N) [ 1  ρ  0  0
            ρ  1  0  0
            0  0  1  ρ
            0  0  ρ  1 ].

This gives us 20,000 samples, V1, . . . , Vk, . . . , V20000, where each Vk is a
vector of four entries, one from each treatment strategy.
For example, Vkᵀ = [μ̂(1,1),k, μ̂(1,0),k, μ̂(0,1),k, μ̂(0,0),k].
Count how many times out of V1, . . . , V20000 that μ̂(1,1),k is highest;
divide this count by 20,000, and call this value C(N). C(N) is the
estimate for the probability of correctly identifying the strategy with
the highest mean.
2. At the end of step 1, we will have a value of C(N) for each ρ in
{0, 0.01, 0.02, . . . , 0.99, 1}. Let P_N = min over ρ of C(N); the value P_N is
the lowest probability of detecting the best strategy mean.
Next, we perform a search over the space of possible values of N to
find the value for which P_N equals the desired probability 1 − β. N4 is the
value of N for which P_N = 1 − β.

The online calculator for the sample size for question 4 can be found at
http://methodologymedia.psu.edu/smart/samplesize.
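
The algorithm above lends itself to a short Monte Carlo implementation. The sketch below is a
hedged illustration rather than the code behind the online calculator: it follows the two steps
just described (estimate C(N) over a grid of correlations, take the worst case, then search over
N), with function names chosen for this illustration and NumPy assumed to be available.

# Illustrative sketch of the question 4 sample-size algorithm described above.
import numpy as np

def worst_case_probability(n, delta, k=20000, rhos=np.arange(0.0, 1.01, 0.01), seed=0):
    rng = np.random.default_rng(seed)
    mean = np.array([delta / 2, 0.0, 0.0, 0.0])   # strategy (1, 1) has the largest mean
    worst = 1.0
    for rho in rhos:
        cov = np.array([[1, rho, 0, 0],
                        [rho, 1, 0, 0],
                        [0, 0, 1, rho],
                        [0, 0, rho, 1]]) / n
        draws = rng.multivariate_normal(mean, cov, size=k)
        c_n = np.mean(draws.argmax(axis=1) == 0)  # how often strategy (1, 1) is estimated best
        worst = min(worst, c_n)
    return worst

def n4(delta, prob=0.90, lo=10, hi=5000):
    # Search for the smallest N whose worst-case probability reaches the desired level.
    while lo < hi:
        mid = (lo + hi) // 2
        if worst_case_probability(mid, delta) >= prob:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Should land near the Table 8.6 values (608 for delta = 0.2, 97 for delta = 0.5),
# up to Monte Carlo error.
print(n4(0.2, 0.90))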

References

Carroll, K. M. (2005). Recent advances in psychotherapy of addictive disorders. Current


Psychiatry Reports, 7, 329–336.
Carroll, K. M., & Onken, L. S. (2005). Behavioral therapies for drug abuse. American
Journal of Psychiatry, 162(8), 1452–1460.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,
NJ: Lawrence Erlbaum Associates.
Dawson, R., & Lavori, P. W. (2003). Comparison of designs for adaptive treatment
strategies: Baseline vs. adaptive randomization. Journal of Statistical Planning and
Inference, 117, 365–385.
Fiellin, D. A., Kleber, H., Trumble-Hejduk, J. G., McLellan, A. T., & Kosten, T. R.
(2004). Consensus statement on office based treatment of opioid dependence using
buprenorphine. Journal of Substance Abuse Treatment, 27, 153–159.
Fiellin, D., Pantalon, M., Schottenfeld, R., Gordon, L., & O’Connor, P. (1999). Manual
for standard medical management of opioid dependence with buprenorphine. New Haven,
CT: Yale University School of Medicine, Primary Care Center and Substance Abuse
Center, West Haven VA/CT Healthcare System.
Fiellin, D. A., Pantalon, M. V., Chawarski, M. C., Moore, B. A., Sullivan, L. E.,
O’Connor, P. G., et al. (2006). Counseling plus buprenorphine-naloxone mainte-
nance therapy for opioid dependence. New England Journal of Medicine, 355(4),
365–374.
Gandhi, D. H., Jaffe, J. H., McNary, S., Kavanagh, G. J., Hayes, M., & Currens, M.
(2003). Short-term outcomes after brief ambulatory opioid detoxification with bupre-
norphine in young heroin users. Addiction, 98, 453–462.
Greenhouse, J., Stangl, D., Kupfer, D., & Prien, R. (1991). Methodological
issues in maintenance therapy clinical trials. Archives of General Psychiatry, 48(3),
313–318.
Hoel, P. (1984). Introduction to mathematical statistics (5th ed.). New York: John Wiley &
Sons.
Jennison, C., & Turnbull, B. (2000). Group sequential methods with applications to clinical
trials. Boca Raton, FL: Chapman & Hall.
Lavori, P. W., & Dawson, R. (2000). A design for testing clinical strategies: Biased
adaptive within-subject randomization. Journal of the Royal Statistical Society,
Series A, 163, 29–38.
Lavori, P. W., Dawson, R., & Rush, A. J. (2000). Flexible treatment strategies in chronic
disease: Clinical and research implications. Biological Psychiatry, 48, 605–614.
Ling, W., Amass, L., Shoptow, S., Annon, J. J., Hillhouse, M., Babcock, D., et al.
(2005). A multi-center randomized trial of buprenorphine-naloxone versus clonidine
for opioid detoxification: Findings from the National Institute on Drug Abuse
Clinical Trials Network. Addiction, 100, 1090–1100.
Ling, W., & Smith, D. (2002). Buprenorphine: Blending practice and research. Journal
of Substance Abuse Treatment, 23, 87–92.
McLellan, A. T. (2002). Have we evaluated addiction treatment correctly? Implications
from a chronic care perspective. Addiction, 97, 249–252.
McLellan, A. T., Lewis, D. C., O’Brien, C. P., & Kleber, H. D. (2000). Drug dependence,
a chronic medical illness. Implications for treatment, insurance, and outcomes
evaluation. Journal of the American Medical Association, 284(13), 1689–1695.
Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal
Statistical Society, 65, 331–366.
Murphy, S. A. (2005). An experimental design for the development of adaptive treat-
ment strategies. Statistics in Medicine, 24, 1455–1481.
Murphy, S. A., Lynch, K. G., Oslin, D.A., McKay, J. R., & Tenhave, T. (2006).
Developing adaptive treatment strategies in substance abuse research. Drug and
Alcohol Dependence. doi:10.1016/j.drugalcdep.2006.09.008.
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges
in constructing effective treatment sequences for chronic psychiatric disorders.
Neuropsychopharmacology, 32, 257–262.
Murphy, S. A., Van Der Laan, M. J., Robins, J. M., & Conduct Problems Prevention
Group (2001). Marginal mean models for dynamic regimes. Journal of the American
Statistical Association, 96(456), 1410–1423.
Rush, A. J., Crismon, M. L., Kashner, T. M., Toprac, M. G., Carmody, T. J., Trivedi, M.
H., et al. (2003). Texas medication algorithm project, phase 3 (TMAP-3): Rationale
and study design. J. Clin. Psychiatry, 64(4), 357–369.
Stroup, T. S., McEvoy, J. P., Swartz, M. S., Byerly, M. J., Glick, I. D, Canive, J. M., et al.
(2003). The National Institute of Mental Health Clinical Antipsychotic Trials of
Intervention Effectiveness (CATIE) project: Schizophrenia trial design and protocol
development. Schizophrenia Bulletin, 29(1), 15–31.
Weiss, R., Sharpe, J. P., & Ling, W. A. (2010). Two-phase randomized controlled
clinical trial of buprenorphine/naloxone treatment plus individual drug counseling
for opioid analgesic dependence. National Institute on Drug Abuse Clinical Trials
Network. Retrieved June 14, 2020 from http://www.clinicaltrials.gov/ct/show/
NCT00316277?order=1
9

Obtaining Robust Causal Evidence From Observational Studies

Can Genetic Epidemiology Help?

george davey smith

Introduction: The Limits of Observational Epidemiology

Observational epidemiological studies have clearly made important contribu-


tions to understanding the determinants of population health. However, there
have been high-profile problems with this approach, highlighted by appar-
ently contradictory findings emerging from observational studies and from
randomized controlled trials (RCTs) of the same issue. These situations, of
which the best known probably relates to the use of hormone-replacement
therapy (HRT) in coronary heart disease (CHD) prevention, have been dis-
cussed elsewhere (Davey Smith & Ebrahim, 2002). The HRT controversy is
covered elsewhere in this volume (see Chapter 5). Here, I will discuss two
examples. First, consider the use of vitamin E supplements and CHD risk.
Several observational studies have suggested that the use of vitamin E sup-
plements is associated with a reduced risk of CHD, two of the most influen-
tial being the Health Professionals Follow-Up Study (Rimm et al., 1993) and
the Nurses’ Health Study (Stampfer et al., 1993), both published in the New
England Journal of Medicine in 1993. Findings from one of these studies are
presented in Figure 9.1, where it can be seen that even short-term use of
vitamin E supplements was associated with reduced CHD risk, which per-
sisted after adjustment for confounding factors. Figure 9.2 demonstrates that
nearly half of U.S. adults are taking either vitamin E supplements or multi-
vitamin/multimineral supplements that generally contain vitamin E
(Radimer et al., 2004). Figure 9.3 presents data from three available time
points, where there appears to have been a particular increase in vitamin E
use following 1993 (Millen, Dodd, & Subar, 2004), possibly consequent upon
the publication of the two observational studies already mentioned, which

have received nearly 3,000 citations between them since publication.

Figure 9.1 Observed effect of duration of vitamin E use (0–1, 2–4, 5–9, and >10 years) compared
to no use on coronary heart disease events in the Health Professionals Follow-Up Study. From
‘‘Vitamin E consumption and the risk of coronary heart disease in men,’’ by E. B. Rimm, M. J.
Stampfer, A. Ascherio, E. Giovannucci, G. A. Colditz, & W. C. Willett, 1993, New England Journal
of Medicine, 328, 1450–1456.


The apparently strong observational evidence with respect to vitamin E and
reduced CHD risk, which may have influenced the very high current use of
vitamin E supplements in developed countries, was unfortunately not rea-
lized in RCTs (Figure 9.4), in which no benefit from vitamin E supplementa-
tion use is seen. In this example it is important to note that the observational
studies and the RCTs were testing precisely the same exposure—short-term

vitamin E supplement use—and yet yielded very different findings with
respect to the apparent influence on risk.

Figure 9.2 Use of vitamin supplements (multivitamin/multimineral, vitamin E, vitamin C) in the
past month among U.S. adults, 1999–2000. From ‘‘Dietary supplement use by US adults: Data from
the National Health and Nutrition Examination Survey, 1999–2000,’’ by K. Radimer, B. Bindewald,
J. Hughes, B. Ervin, C. Swanson, & M. F. Picciano, 2004, American Journal of Epidemiology, 160,
339–349.

Figure 9.3 Use of vitamin supplements (multivitamins, vitamin E, vitamin C) in U.S. adults, 1987,
1992, and 2000. From ‘‘Use of vitamin, mineral, nonvitamin, and nonmineral supplements in the
United States: The 1987, 1992, and 2000 National Health Interview Survey results,’’ by A. E.
Millen, K. W. Dodd, & A. F. Subar, 2004, Journal of the American Dietetic Association, 104,
942–950.

In 2001 the Lancet published an observational study demonstrating an
inverse association between circulating vitamin C levels and incident CHD
(Khaw et al., 2001). The left-hand side of Figure 9.5 summarizes these data,
presenting the relative risk for a 15.7 µmol/l higher plasma vitamin C level,
assuming a log-linear association. As can be seen, adjustment for confoun-
ders had little impact on this association. However, a large-scale RCT, the
Heart Protection Study, examined the effect of a supplement that increased
average plasma vitamin C levels by 15.7 µmol/l. In this study randomization
to the supplement was associated with no decrement in CHD risk (Heart
Protection Study Collaborative Group, 2002).

Figure 9.4 Vitamin E supplement use and risk of coronary heart disease in two observational
studies (Rimm et al., 1993; Stampfer et al., 1993) and in a meta-analysis of randomized controlled
trials (Eidelman, Hollar, Hebert, Lamas, & Hennekens, 2004).

Figure 9.5 Estimates of the effects of an increase of 15.7 µmol/l plasma vitamin C on coronary
heart disease 5-year mortality estimated from the observational epidemiological European
Prospective Investigation Into Cancer and Nutrition (EPIC) (Khaw et al., 2001) and the randomized
controlled Heart Protection Study (Heart Protection Study Collaborative Group, 2002). Relative
risks (95% CI): Heart Protection Study 1.06 (0.95–1.16); EPIC m 0.72 (0.61–0.86); EPIC m* 0.70
(0.51–0.95); EPIC f 0.63 (0.49–0.84); EPIC f* 0.63 (0.45–0.90). EPIC m, male, age-adjusted; EPIC
m*, male, adjusted for systolic blood pressure, cholesterol, body mass index, smoking, diabetes,
and vitamin supplement use; EPIC f, female, age-adjusted; EPIC f*, female, adjusted for systolic
blood pressure, cholesterol, body mass index, smoking, diabetes, and vitamin supplement use.

What underlies the discrepancy between these findings? One possibility is
that there is considerable confounding between vitamin C levels and other
exposures that could increase the risk of CHD. In the British Women’s Heart
and Health study (BWHHS), for example, women with higher plasma vita-
min C levels were less likely to be in a manual social class, to have no car
access, to be a smoker, or to be obese and more likely to exercise, to be on a
low-fat diet, to have a daily alcoholic drink, and to be tall (Lawlor, Davey
Smith, Kundu, Bruckdorfer, & Ebrahim, 2004). Furthermore, for women in
their 60s and 70s, those with higher plasma vitamin C levels were less likely
to have come from a home 50 years or more previously in which their father
was in a manual job, there was no bathroom or hot water, or they had to
share a bedroom. They were also less likely to have limited educational
attainment. In short, a substantial amount of confounding by factors from
across the life course that predict elevated risk of CHD was seen. Table 9.1
illustrates how four simple dichotomous variables from across the life course
can generate large differences in cardiovascular disease mortality (Davey
Smith & Hart, 2002).

Table 9.1 Cardiovascular Mortality According to Cumulative Risk Indicator (Father's
Social Class, Adulthood Social Class, Smoking, Alcohol Use)

                                n       CVD Deaths   Relative Risk
4 favorable (0 unfavorable)     517     47           1
3 favorable (1 unfavorable)     1,299   227          1.99 (1.45–2.73)
2 favorable (2 unfavorable)     1,606   354          2.60 (1.92–3.52)
1 favorable (3 unfavorable)     1,448   339          2.98 (2.20–4.05)
0 favorable (4 unfavorable)     758     220          4.55 (3.32–6.24)

From Davey Smith & Hart (2002).

In the BWHHS a 15.7 µmol/l higher plasma vitamin C level was
associated with a relative risk of incident CHD of 0.88 (95% confidence
interval [CI] 0.80–0.97), in the same direction as the estimates seen in the
observational study summarized in Figure 9.5. When adjusted for the same
confounders as were adjusted for in the observational study reported in Figure
9.5, the estimate changed very little—to 0.90 (95% CI 0.82–0.99). When
additional adjustment for confounders acting across the life course was
made, considerable attenuation was seen, with a residual relative risk of
0.95 (95% CI 0.85–1.05) (Lawlor et al., 2005). It is obvious that given inevi-
table amounts of measurement imprecision in the confounders or a limited
number of missing unmeasured confounders, the residual association
is essentially null and close to the finding of the RCT. Most studies
have more limited information on potential confounders than is available
in the BWHHS, and in other fields we may be even more ignorant of the
confounding factors we should measure. In these cases inferences drawn
from observational epidemiological studies may be seriously misleading.
As the major and compelling rationale for doing these observational
studies is to underpin public-health prevention strategies, their repeated fail-
ures are a major concern for public-health policy makers, researchers, and
funders.
Other processes in addition to confounding can produce robust, but non-
causal, associations in observational studies. Reverse causation—where the
disease influences the apparent exposure, rather than vice versa—may gen-
erate strong and replicable associations. For example, many studies have
found that people with low circulating cholesterol levels are at increased
risk of several cancers, including colon cancer. If causal, this is an important
association as it might mean that efforts to lower cholesterol levels would
increase the risk of cancer. However, it is possible that the early stages of
cancer may, many years before diagnosis or death, lead to a lowering in
cholesterol levels, rather than low cholesterol levels increasing the risk
of cancer. Similarly, studies of inflammatory markers such as C-reactive
protein and cardiovascular disease risk have shown that early stages of
atherosclerosis—which is an inflammatory process—may lead to elevation
in circulating inflammatory markers; and since people with atherosclerosis
are more likely to experience cardiovascular events, a robust, but noncausal,
association between levels of inflammatory markers and incident cardiovas-
cular disease is generated. Reverse causation can also occur through beha-
vioral processes—for example, people with early stages and symptoms of
cardiovascular disease may reduce their consumption of alcohol, which
would generate a situation in which alcohol intake appears to protect against
cardiovascular disease. A form of reverse causation can also occur through
reporting bias, with the presence of disease influencing reporting disposition.
In case–control studies people with the disease under investigation may
report on their prior exposure history in a different way from controls, per-
haps because the former will think harder about potential reasons that
account for why they have developed the disease.

Table 9.2a Means or proportions of blood pressure, pulse pressure, hypertension and
potential confounders by quarters of C-reactive protein (CRP), N = 3,529 (from Davey
Smith et al., 2005)

                                      Means or proportions by quarters of                P trend
                                      C-reactive protein (range, mg/L)                   across
                                      1            2            3            4           categories
                                      (0.16–0.85)  (0.86–1.71)  (1.72–3.88)  (3.89–112.0)

Hypertension (%)                      45.8         49.7         57.5         60.         < 0.001
BMI (kg/m²)                           25.2         27.0         28.5         29.7        < 0.001
HDLc (mmol/l)                         1.80         1.69         1.           1.53        < 0.001
Lifecourse socioeconomic
  position score                      4.08         4.37         4.46         4.75        < 0.001
Doctor diagnosis of diabetes (%)      3.5          2.8          4.1          8.4         < 0.001
Current smoker (%)                    7.9          9.6          10.9         15.4        < 0.001
Physically inactive (%)               11.3         14.9         20.1         29.6        < 0.001
Moderate alcohol consumption (%)      22.2         19.6         18.8         14.0        < 0.001

Table 9.2b Means or proportions of CRP, systolic blood pressure, hypertension and
potential confounders by 1059G/C genotype (from Davey Smith et al., 2005)

                                      Means or proportions by genotype       P
                                      GG          GC or CC

CRP (mg/L, log scale)(a)              1.81        1.39                       < 0.001
Hypertension (%)                      53.3        53.1                       0.95
BMI (kg/m²)                           27.5        27.8                       0.29
HDLc (mmol/l)                         1.67        1.65                       0.38
Lifecourse socioeconomic
  position score                      4.35        4.42                       0.53
Doctor diagnosed diabetes (%)         4.7         4.5                        0.80
Current smoker (%)                    11.2        9.3                        0.24
Physically inactive (%)               18.9        18.9                       1.0
Moderate alcohol consumption (%)      18.6        19.8                       0.56

(a) Geometric means and proportionate (%) change for a doubling of CRP.
CRP: C-reactive protein; OR: odds ratio; FEV1: forced expiratory volume in one second;
HDLc: high-density lipoprotein cholesterol; CVD: cardiovascular disease (stroke or coronary heart
disease).

In observational studies, associations between an exposure and disease


will generally be biased if there is selection according to an exposure–disease
combination in case–control studies or according to an exposure–disease risk
combination in prospective studies. Such selection may arise through differ-
ential participation in research studies, conducting studies in settings such as
hospitals where cases and controls are not representative of the general
population, or study of unusual populations (e.g., vegetarians). If, for exam-
ple, those people experiencing an exposure but at low risk of disease for
other reasons were differentially excluded from a study, the exposure would
appear to be positively related to disease outcome, even if there were no such
association in the underlying population. This is a form of ‘‘Berkson’s bias,’’
well known to epidemiologists (Berkson, 1946). A possible example of such
associative selection bias relates to the finding in the large American Cancer
Society volunteer cohort that high alcohol consumption was associated with a
reduced risk of stroke (Thun et al., 1997). This is somewhat counterintuitive
as the outcome category included hemorrhagic stroke (for which there is no
obvious mechanism through which alcohol would reduce risk) and because
alcohol is known to increase blood pressure, a major causal factor for stroke.
Population-based studies have found that heavy alcohol consumption tends to
increase stroke risk, particularly hemorrhagic stroke risk (Hart, Davey Smith,
Hole, & Hawthorne, 1999; Reynolds et al., 2003). Heavy drinkers who volun-
teer for a study known to be about the health effects of their lifestyle are
likely to be very unrepresentative of all heavy drinkers in the population, in


ways that render them to be at low risk of stroke. Moderate drinkers and
nondrinkers who volunteer may be more representative of moderate drinkers
and nondrinkers in the underlying population. Thus, the low risk of stroke in
the heavy drinkers who volunteer for the study could erroneously make it
appear that alcohol reduces the risk of stroke.
These problems of confounding and bias relate to the production of asso-
ciations in observational studies that are not reliable indicators of the true
direction of causal associations. A separate issue is that the strength of
associations between causal risk factors and disease in observational studies
will generally be underestimated due to random measurement imprecision in
indexing the exposure. A century ago, Charles Spearman demonstrated math-
ematically how such measurement imprecision would lead to what he termed
the ‘‘attenuation by errors’’ of associations (Spearman, 1904; Davey Smith &
Phillips, 1996). This has more latterly been renamed ‘‘regression dilution
bias.’’ (MacMahon S, Peto R, Cutler J, Collins R, Sorlie P, et al 1990)
Observational studies can and do produce findings that either spuriously
enhance or downgrade estimates of causal associations between modifiable
exposures and disease. This has serious consequences for the appropriateness
of interventions that aim to reduce disease risk in populations. It is for these
reasons that alternative approaches—including those within the Mendelian
randomization framework—need to be applied.

Mendelian Randomization

The basic principle utilized in the Mendelian randomization approach is that


if genetic variants either alter the level or mirror the biological effects of a
modifiable environmental exposure that itself alters disease risk, then these
genetic variants should be related to disease risk to the extent predicted by
their influence on exposure to the risk factor. Common genetic polymorph-
isms that have a well-characterized biological function (or are markers for
such variants) can therefore be utilized to study the effect of a suspected
environmental exposure on disease risk (Davey Smith & Ebrahim, 2003,
2004, 2005; Davey Smith, 2006; Lawlor, Harbord, Sterne, Timpson, &
Davey Smith, 2008; Ebrahim & Davey Smith, 2008). The exploitation
of situations in which genotypic differences produce effects similar to envir-
onmental factors (and vice versa) clearly resonates with the concepts of
‘‘phenocopy’’ and ‘‘genocopy’’ in developmental genetics (Box 9.1).
It may seem illogical to study genetic variants as proxies for environmen-
tal exposures rather than to measure the exposures themselves. However,
there are several crucial advantages of utilizing functional genetic variants
(or their markers) in this manner that relate to the problems with
box 9.1
Phenocopy, Genocopy, and Mendelian Randomization

The term phenocopy is attributed to Goldschmidt (1938) and is used to


describe the situation where an environmental effect could produce the
same effect as was produced by a genetic mutation. As Goldschmidt
(1938) explicated, ‘‘different causes produce the same end effect, presum-
ably by changing the same developmental processes in an identical way.’’
In human genetics the term has generally been applied to an environmen-
tally produced disease state that is similar to a clear genetic syndrome.
For example the niacin-deficiency disease pellagra is clinically similar to
the autosomal recessive condition Hartnup disease (Baron, Dent, Harris,
Hart, & Jepson, 1956), and pellagra has been referred to as a phenocopy
of the genetic disorder (Snyder, 1959; Guy, 1993). Hartnup disease is due
to reduced neutral amino acid absorption from the intestine and reabsorp-
tion from the kidney, leading to low levels of blood tryptophan, which in
turn leads to a biochemical anomaly that is similar to that seen when the
diet is deficient in niacin (Kraut & Sachs, 2005; Broer, Cavanaugh, &
Rasko, 2004). Genocopy is a less utilized term, attributed to
Schmalhausen (see Gause, 1942), but has generally been considered to
be the reverse of phenocopy—that is, when genetic variation generates an
outcome that could be produced by an environmental stimulus (Jablonka-
Tavory, 1982). It is clear that, even when the term genocopy is used
polemically (e.g., Rose, 1995), the two concepts are mirror images, reflect-
ing differently motivated accounts of how both genetic and environmental
factors influence physical state. For example, Hartnup disease can be
called a genocopy of pellagra, while pellagra can be considered a pheno-
copy of Hartnup disease. Mendelian randomization can, therefore, be
viewed as an appreciation of the phenocopy–genocopy nexus that allows
causation to be separated from association.
Phenocopies of major genetic disorders are generally rarely encoun-
tered in clinical medicine, but as Lenz (1973) comments, ‘‘they are, how-
ever, most important as models which might help to elucidate the
pathways of gene action.’’ Mendelian randomization is generally con-
cerned with less major (and, thus, common) disturbances and reverses
the direction of phenocopy ! genocopy, to utilize genocopies of known
genetic mechanism to inform us better about pathways through which the
environment influences health.
The scope of phenocopy–genocopy has been discussed by Zuckerkandl
and Villet (1988), who advance mechanisms through which there can be

equivalence between environmental and genotypic influences. Indeed, they


state that ‘‘no doubt all environmental effects can be mimicked by one or
several mutations.’’ The notion that genetic and environmental influences
can be both equivalent and interchangeable has received considerable
attention in developmental biology (e.g., West-Eberhard, 2003; Leimar,
Hammerstein, & Van Dooren, 2006). Furthermore, population genetic
analyses of correlations between different traits suggest there are
common pathways of genetic and environmental influences, with
Cheverud (1988) concluding that ‘‘most environmentally caused phenoty-
pic variants should have genetic counterparts and vice versa.’’

observational studies already outlined. First, unlike environmental exposures,
genetic variants are not generally associated with the wide range of beha-
vioral, social, and physiological factors that, for example, confound the asso-
ciation between vitamin C and CHD. This means that if a genetic variant is
used to proxy for an environmentally modifiable exposure, it is unlikely to be
confounded in the way that direct measures of the exposure will be. Further,
aside from the effects of population structure (see Palmer & Cardon, 2005,
for a discussion of the likely impact of this), such variants will not be asso-
ciated with other genetic variants, excepting those with which they are in
linkage disequilibrium. Empirical investigation of the associations of genetic
variants with potential confounding factors reveals that they do indeed tend
not to be associated with such factors (Davey Smith et al., 2008).
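This kind of empirical check is straightforward to carry out within any individual study. The minimal sketch below uses simulated data; the genotype coding, confounder names, and test are illustrative assumptions rather than the analyses of Davey Smith et al. (2008).

# Minimal sketch: tabulate the association of a genetic variant with a panel
# of measured potential confounders (illustrative, simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
genotype = rng.binomial(2, 0.3, n)          # additive coding: 0/1/2 risk alleles

# Hypothetical confounders, generated independently of genotype, as random
# assortment would lead us to expect at the population level.
confounders = {
    "smoking":       rng.binomial(1, 0.25, n),
    "social_class":  rng.integers(1, 6, n),
    "alcohol_g_day": rng.gamma(2.0, 5.0, n),
}

for name, x in confounders.items():
    # Simple correlation test; a real analysis might use regression instead.
    r, p = stats.pearsonr(genotype, x)
    print(f"{name:>14}: r = {r:+.3f}, p = {p:.2f}")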
Second, we have seen how inferences drawn from observational studies
may be subject to bias due to reverse causation. Disease processes may
influence exposure levels such as alcohol intake or measures of intermediate
phenotypes such as cholesterol levels and C-reactive protein. However, germ-
line genetic variants associated with average alcohol intake or circulating
levels of intermediate phenotypes will not be influenced by the onset of
disease. The same applies to reporting bias generated by knowledge of
disease status in case–control studies and to differential reporting bias
in any study design.
Third, associative selection bias, in which selection into a study is related to
both exposure level and disease risk and can generate spurious associations (as
illustrated with respect to alcohol and stroke), is unlikely to occur with respect to
genetic variants. For example, empirical evidence supports a lack of association
between a wide range of genetic variants and participation rates in a series of
cancer case–control studies (Bhatti et al., 2005).
Finally, a genetic variant will indicate long-term levels of exposure, and if the
variant is taken as a proxy for such exposure, it will not suffer from the mea-
surement error inherent in phenotypes that have high levels of variability.
For example, groups defined by cholesterol level–related genotype will, over
a long period, experience the cholesterol difference seen between the groups.
For individuals, blood cholesterol is variable over time, and the use of single
measures of cholesterol will underestimate the true strength of association
between cholesterol and, say, CHD. Indeed, use of the Mendelian randomi-
zation approach predicts a strength of association that is in line with RCT
findings of the effects of cholesterol lowering when the increasing benefits
seen over the relatively short trial period are projected to the expectation for
differences over a lifetime (Davey Smith & Ebrahim, 2004), which will be
discussed further.
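The dilution at issue can be illustrated with a small simulation. In the sketch below, all numerical values are arbitrary assumptions rather than estimates from the trials or cohorts discussed: a single noisy cholesterol measurement attenuates the regression slope on the outcome, whereas an estimate based on genotype-defined groups recovers the undiluted long-term association.

# Regression dilution sketch: a single noisy measurement of a variable
# phenotype understates its association with an outcome; a genotype that
# shifts the long-term average does not. Illustrative values throughout.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
genotype = rng.binomial(1, 0.5, n)                          # cholesterol-raising allele carrier
usual_chol = 5.5 + 0.5 * genotype + rng.normal(0, 0.8, n)   # long-term average cholesterol
single_measure = usual_chol + rng.normal(0, 0.8, n)         # one noisy clinic reading
risk_score = 0.3 * usual_chol + rng.normal(0, 1.0, n)       # outcome driven by the usual level

def slope(x, y):
    # ordinary least-squares slope of y on x
    return np.polyfit(x, y, 1)[0]

print("true effect per mmol/l of usual cholesterol: 0.30")
print(f"slope using a single measurement:            {slope(single_measure, risk_score):.2f}")
# Genotype-defined groups experience the full long-term difference, so the
# ratio of genotype-outcome to genotype-measurement slopes is undiluted.
print(f"genotype-based (ratio) estimate:             "
      f"{slope(genotype, risk_score) / slope(genotype, single_measure):.2f}")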

Categories of Mendelian Randomization

The term Mendelian randomization has now become widely used (see
Box 9.2), with a variety of meanings. This partly reflects the fact that there
are several categories of inference that can be drawn from studies utilizing
the Mendelian randomization approach. In the most direct forms, genetic
variants can be related to the probability or level of exposure (‘‘exposure
propensity’’) or to intermediate phenotypes believed to influence disease
risk. Less direct evidence can come from genetic variant–disease associations
that indicate that a particular biological pathway may be of importance,
perhaps because the variants modify the effects of environmental exposures.
Several examples of these categories have been given elsewhere (Davey Smith
& Ebrahim, 2003, 2004; Davey Smith, 2006; Ebrahim & Davey Smith, 2008);
here, a few illustrative cases are briefly outlined.

Exposure Propensity
Alcohol Intake and Health

The possible protective effect of moderate alcohol consumption on CHD risk
remains controversial (Marmot, 2001; Bovet & Paccaud, 2001; Klatsky, 2001).
Nondrinkers may be at a higher risk of CHD because health problems (per-
haps induced by previous alcohol abuse) dissuade them from drinking
(Shaper, 1993). As well as this form of reverse causation, confounding
could play a role, with nondrinkers being more likely to display an adverse
profile of socioeconomic or other behavioral risk factors for CHD (Hart et al.,
1999). Alternatively, alcohol may have a direct biological effect that lessens
Box 9.2
Why ‘‘Mendelian Randomization’’?

Gregor Mendel (1822–1884) concluded from his hybridization studies with
pea plants that ‘‘the behaviour of each pair of differentiating characteris-
tics [such as the shape and color of seeds] in hybrid union is independent
of the other differences between the two original plants’’ (Mendel, 1866).
This formulation was actually the only regularity that Mendel referred to as
a ‘‘law,’’ and in Carl Correns’ 1900 paper (one of a trio appearing that year
that are considered to represent the rediscovery of Mendel) he refers to
this as ‘‘Mendel’s law’’ (Correns, 1900; Olby, 1966). Morgan (1913) dis-
cusses independent assortment and refers to this process as being rea-
lized ‘‘whenever two pairs of characters freely Mendelize.’’ Morgan’s use
of Mendel’s surname as a verb did not catch on, but Morgan later chris-
tened this principle ‘‘Mendel’s second law’’ (Morgan, 1919); it has been
known as this or as ‘‘the law of independent assortment’’ since this time.
The law suggests that inheritance of one trait is independent of—that is,
randomized with respect to—the inheritance of other traits. The analogy
with a randomized controlled trial will clearly be most applicable to
parent–offspring designs investigating the frequency with which one of
two alleles from a heterozygous parent is transmitted to offspring with
a particular disease. However, at the population level, traits influenced by
genetic variants are generally not associated with the social, behavioral,
and environmental factors that confound relationships observed in con-
ventional epidemiological studies. Thus, while the ‘‘randomization’’ is
approximate and not absolute in genetic association studies, empirical
observations suggest that it applies in most circumstances (Davey
Smith, Harbord, Milton, Ebrahim, & Sterne, 2005a; Bhatti et al., 2005;
Davey Smith et al., 2008).
The term Mendelian randomization itself was introduced in a somewhat
different context, in which the random assortment of genetic variants at
conception is utilized to provide an unconfounded study design for esti-
mating treatment effects for childhood malignancies (Gray & Wheatley,
1991; Wheatley & Gray, 2004). The term has recently become widely
used with the meaning ascribed to it in this chapter.
The notion that genetic variants can serve as an indicator of the action
of environmentally modifiable exposures has been expressed in many con-
texts. For example, since the mid-1960s various investigators have pointed
out that the autosomal dominant condition of lactase persistence is asso-
ciated with milk drinking. Protective associations of lactase persistence
with osteoporosis, low bone mineral density, or fracture risk thus provide
evidence that milk drinking reduces the risk of these conditions (Birge,
Keutmann, Cuatrecasas, & Whedon, 1967; Newcomer, Hodgson, Douglas,
& Thomas, 1978). In a related vein, it was proposed in 1979 that as N-
acetyltransferase pathways are involved in the detoxification of arylamine,
a potential bladder carcinogen, the observation of increased bladder-
cancer risk among people with genetically determined slow-acetylator phe-
notype provided evidence that arylamines are involved in the etiology of
the disease (Lower et al., 1979).
Since these early studies various commentators have pointed out that
the association of genetic variants of known function with disease out-
comes provides evidence about etiological factors (McGrath, 1999; Ames,
1999; Rothman et al., 2001; Brennan, 2002; Kelada, Eaton, Wang, Rothman,
& Khoury, 2003). However, these commentators have not emphasized the
key strengths of Mendelian randomization: the avoidance of confounding,
the avoidance of bias due to reverse causation and reporting tendency,
and correction for the underestimation of risk associations due to variability
in behaviors and phenotypes (Davey Smith & Ebrahim, 2004).
These key concepts were present in Martijn Katan’s 1986 Lancet letter,
in which he suggested that genetic variants related to cholesterol level
could be used to investigate whether the observed association between
low cholesterol and increased cancer risk was real, and by Honkanen and
colleagues’ (1996) understanding of how lactase persistence could better
characterize the difficult-to-measure environmental influence of calcium
intake than could direct dietary reports. Since 2000 there have been several
reports using the term Mendelian randomization in the way it is used here
(Youngman et al., 2000; Fallon, Ben-Shlomo, & Davey Smith, 2001;
Clayton & McKeigue, 2001; Keavney, 2002; Davey Smith & Ebrahim,
2003), and its use is becoming widespread.

the risk of CHD—for example, by increasing the levels of protective high-
density lipoprotein (HDL) cholesterol (Rimm, 2001). It is, however, unlikely
that an RCT of alcohol intake capable of testing whether alcohol has a
protective effect on CHD events will ever be carried out.
Alcohol is oxidized to acetaldehyde, which in turn is oxidized by aldehyde
dehydrogenases (ALDHs) to acetate. Half of Japanese people are heterozy-
gous or homozygous for a null variant of ALDH2, and peak blood acetalde-
hyde concentrations post–alcohol challenge are 18 times and five times
higher, respectively, among homozygous null variant and heterozygous
individuals compared with homozygous wild-type individuals (Enomoto,
Takase, Yasuhara, & Takada, 1991). This renders the consumption of alcohol
unpleasant through inducing facial flushing, palpitations, drowsiness, and
other symptoms. As Figure 9.6a shows, there are very considerable differ-
ences in alcohol consumption according to genotype (Takagi et al., 2002).
The principles of Mendelian randomization are seen to apply—two factors
that would be expected to be associated with alcohol consumption, age
and cigarette smoking, which would confound conventional observat-
ional associations between alcohol and disease, are not related to genotype
despite the strong association of genotype with alcohol consumption
(Figure 9.6b).
It would be expected that ALDH2 genotype influences diseases known to
be related to alcohol consumption, and as proof of principle it has been
shown that ALDH2 null variant homozygosity—associated with low alcohol
consumption—is indeed related to a lower risk of liver cirrhosis (Chao et al.,
1994). Considerable evidence, including data from RCTs, suggests that alco-
hol increases HDL cholesterol levels (Haskell et al., 1984; Burr, Fehily,
Butland, Bolton, & Eastham, 1986) (which should protect against CHD). In
line with this, ALDH2 genotype is strongly associated with HDL cholesterol
in the expected direction (Figure 9.6c). With respect to blood pressure, obser-
vational evidence suggests that long-term alcohol intake produces an
increased risk of hypertension and higher prevailing blood pressure levels.
A meta-analysis of studies of ALDH2 genotype and blood pressure suggests
that there is indeed a substantial effect in this direction, as demonstrated in
Figures 9.7 and 9.8 (Chen et al., 2008).
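To make the meta-analytic summary concrete, the sketch below pools the per-study hypertension odds ratios shown in Figure 9.7 using fixed-effect inverse-variance weighting on the log odds ratio scale. The study values are read from the figure, and the pooling formula is the standard textbook one rather than necessarily the exact procedure of Chen et al. (2008).

# Fixed-effect inverse-variance pooling of the ALDH2 *1*2 vs *2*2 (male)
# hypertension odds ratios shown in Figure 9.7.
import math

studies = {            # study: (odds ratio, lower 95% CI, upper 95% CI)
    "Amamoto 2002": (1.67, 0.92, 3.03),
    "Iwai 2004":    (1.57, 0.90, 2.72),
    "Saito 2003":   (2.84, 0.79, 10.15),
}

num = den = 0.0
for or_, lo, hi in studies.values():
    log_or = math.log(or_)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # SE recovered from the CI width
    w = 1.0 / se ** 2                                 # inverse-variance weight
    num += w * log_or
    den += w

pooled = num / den
se_pooled = math.sqrt(1.0 / den)
print("pooled OR %.2f (95%% CI %.2f-%.2f)"
      % (math.exp(pooled),
         math.exp(pooled - 1.96 * se_pooled),
         math.exp(pooled + 1.96 * se_pooled)))
# Output is close to the subtotal of 1.72 (1.17, 2.52) reported in Figure 9.7.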
Alcohol intake has also been postulated to increase the risk of esophageal
cancer; however, some have questioned the importance of its role (Memik,
2003). Figure 9.9 presents findings from a meta-analysis of studies of ALDH2
genotype and esophageal-cancer risk (Lewis & Davey Smith, 2005), clearly
showing that people who are homozygous for the null variant, who therefore
consume considerably less alcohol, have a greatly reduced risk of esophageal
cancer. Indeed, this reduction in risk is close to that predicted by the joint
effect of genotype on alcohol consumption and the association of alcohol
consumption with esophageal-cancer risk in a meta-analysis of such observa-
tional studies (Gutjahr, Gmel, & Rehm, 2001). When the heterozygotes are
compared with the homozygous functional variant, an interesting picture
emerges: The risk of esophageal cancer is higher in the heterozygotes
who drink rather less alcohol than those with the homozygous functional
variant. This suggests that it is not alcohol itself that is the causal factor
but acetaldehyde and that the increased risk is apparent only in those who
drink some alcohol but metabolize it inefficiently, leading to high levels of
acetaldehyde.
[Figure 9.6 shows bar charts by ALDH2 genotype: a alcohol intake (ml/day); b age (years) and percentage of smokers; c HDL cholesterol (mg/dl).]
Figure 9.6 a Relationship between alcohol intake and ALDH2 genotype. b Relationship
between characteristics and ALDH2 genotype. c Relationship between HDL cholesterol
and ALDH2 genotype. From ‘‘Aldehyde dehydrogenase 2 gene is a risk factor for myo-
cardial infarction in Japanese men,’’ by S. Takagi, N. Iwai, R. Yamauchi, S. Kojima,
S. Yasuno, T. Baba, et al., 2002, Hypertension Research, 25, 677–681.
[Figure 9.7: odds ratios (95% CI) for hypertension by ALDH2 genotype, males. 12 vs 22: Amamoto et al. 2002, 1.67 (0.92, 3.03); Iwai et al. 2004, 1.57 (0.90, 2.72); Saito et al. 2003, 2.84 (0.79, 10.15); subtotal 1.72 (1.17, 2.52). 11 vs 22: Amamoto et al. 2002, 2.50 (1.38, 4.54); Iwai et al. 2004, 2.02 (1.17, 3.47); Saito et al. 2003, 4.62 (1.31, 16.25); subtotal 2.42 (1.66, 3.55).]
Figure 9.7 Forest plot of studies of ALDH2 genotype and hypertension. From L. Chen
et al., 2008.

Intermediate Phenotypes
Genetic variants can influence circulating biochemical factors such as cho-
lesterol, homocysteine, and fibrinogen levels. This provides a method for
assessing causality in associations between these measures (intermediate phe-
notypes) and disease and, thus, whether interventions to modify the inter-
mediate phenotype could be expected to influence disease risk.

Cholesterol and CHD

Familial hypercholesterolemia is a dominantly inherited condition in which
many rare mutations of the low-density lipoprotein receptor gene (about
10 million people affected worldwide, a prevalence of around 0.2%) lead to
high circulating cholesterol levels (Marks, Thorogood, Neil, & Humphries,
2003). The high risk of premature CHD in people with this condition was
readily appreciated, with an early U.K. report demonstrating that by age 50
half of men and 12% of women had suffered from CHD (Slack, 1969).
Compared with the population of England and Wales (mean total cholesterol
6.0 mmol/l), people with familial hypercholesterolemia (mean total choles-
terol 9 mmol/l) suffered a 3.9-fold increased risk of CHD mortality, although
very high relative risks among those aged less than 40 years have been
observed (Scientific Steering Committee, 1991). These observations regarding
[Figure 9.8: mean differences (95% CI) in blood pressure by ALDH2 genotype, males, from studies by Amamoto et al. 2002, Saito et al. 2003, Takagi et al. 2001, Tsuritani et al. 1995, and Yamada et al. 2002. Diastolic, 12 vs 22: subtotal 1.58 (0.29, 2.87) mmHg; 11 vs 22: subtotal 3.95 (2.66, 5.24) mmHg. Systolic, 12 vs 22: subtotal 4.24 (2.18, 6.31) mmHg; 11 vs 22: subtotal 7.44 (5.39, 9.49) mmHg.]
Figure 9.8 Forest plot of studies of ALDH2 genotype and blood pressure. From L. Chen, G.
Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.

genetically determined variation in risk provided strong evidence that the
associations between blood cholesterol and CHD seen in general populations
reflected a causal relationship. The causal nature of the association between
blood cholesterol levels and CHD has historically been controversial
(Steinberg, 2004). As both Daniel Steinberg (2005) and Ole Færgeman
(2003) discuss, many clinicians and public-health practitioners rejected the
notion of a causal link for a range of reasons. However, from the late
1930s onward, the finding that people with genetically high levels of
[Figure 9.9: study-specific and pooled odds ratios (95% CI) for esophageal cancer, ALDH2 *2*2 vs *1*1: Hori 0.87 (0.19, 4.06); Matsuo 0.19 (0.02, 1.47); Boonyphiphat 0.22 (0.03, 1.87); Itoga 0.48 (0.06, 3.87); Yokoyama (2002) 0.25 (0.06, 1.07); overall 0.36 (0.16, 0.80).]
Figure 9.9 Risk of esophageal cancer in individuals with the ALDH2*2*2 vs. ALDH2*1*1
genotype. From ‘‘Alcohol, ALDH2 and esophageal cancer: A meta-analysis which illus-
trates the potentials and limitations of a Mendelian randomization approach,’’ by S. Lewis
& G. Davey Smith, 2005, Cancer Epidemiology, Biomarkers and Prevention, 14, 1967–1971.

cholesterol had high risk for CHD should have been powerful and convincing
evidence of the causal nature of elevated blood cholesterol in the general
population.
With the advent of effective means of reducing blood cholesterol through
statin treatment, there remains no serious doubt that the cholesterol–CHD
relationship is causal. Among people without CHD, reducing total cholesterol
levels with statin drugs by around 1–1.5 mmol/l reduces CHD mortality
by around 25% over 5 years. Assuming a linear relationship between
blood cholesterol and CHD risk and given the difference in cholesterol of
3.0 mmol/l between people with familial hypercholesterolemia and the gen-
eral population, the RCT evidence on lowering total cholesterol and reducing
CHD mortality would predict a relative risk for CHD of around 2, as opposed
to 3.9, for people with familial hypercholesterolemia. However, the trials also
demonstrate that the relative reduction in CHD mortality increases over time
from randomization—and thus time with lowered cholesterol—as would be
expected if elevated levels of cholesterol operate over decades to influence the
development of atherosclerosis. People with familial hypercholesterolemia
will have had high total cholesterol levels throughout their lives, and this
would be expected to generate a greater risk than that predicted by the results
of lowering cholesterol levels for only 5 years. Furthermore, ecological studies
relating cholesterol levels to CHD demonstrate that the strength of
association increases as the lag period between cholesterol level assessment
and CHD mortality increases (Rose, 1982), again suggesting that long-term
differences in cholesterol level are the important etiological factor in CHD.
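The extrapolation sketched above can be written out explicitly. Assuming, as in the text, that lowering total cholesterol by roughly 1.25 mmol/l (the midpoint of 1 to 1.5 mmol/l) reduces CHD mortality by about 25% over 5 years, and that the relationship is log-linear, the predicted relative risk for a 3.0 mmol/l difference is close to 2:

# Rough extrapolation: predicted relative risk of CHD for the ~3.0 mmol/l
# cholesterol difference in familial hypercholesterolemia, assuming each
# ~1.25 mmol/l of cholesterol lowering cuts risk by ~25% (illustrative
# midpoint of the 1-1.5 mmol/l trial effect) and a log-linear dose-response.
reduction_per_step = 0.25       # 25% lower CHD mortality per step
step_mmol_l = 1.25              # cholesterol lowering per step (mmol/l)
difference = 3.0                # FH vs general population (mmol/l)

risk_ratio = (1 - reduction_per_step) ** -(difference / step_mmol_l)
print(f"predicted relative risk: {risk_ratio:.1f}")   # about 2.0
# The observed relative risk of 3.9 exceeds this, consistent with lifelong
# rather than 5-year exposure to elevated cholesterol.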
As discussed, Mendelian randomization is one method for assessing the
effects of long-term differences in exposures on disease risk, free from the
diluting problems of both measurement error and having only short-term
assessment of risk-factor levels. This reasoning provides an indication that
cholesterol-lowering efforts should be lifelong rather than limited to the
period for which RCT evidence with respect to CHD outcomes is available.
Recently, several common genetic variants have been identified that are
related to cholesterol level and CHD risk, and these have also demonstrated
effects on CHD risk consistent with lifelong differences in cholesterol level
(Davey Smith, Timpson & Ebrahim, 2008; Kathiresan et al., 2008).

C-Reactive Protein and CHD

Strong associations of C-reactive protein (CRP), an acute-phase inflammatory
marker, with hypertension, insulin resistance, and CHD have been repeatedly
observed (Danesh et al., 2004; Wu, Dorn, Donahue, Sempos, & Trevisan,
2002; Pradhan, Manson, Rifai, Buring, & Ridker, 2001; Han et al., 2002;
Sesso et al., 2003; Hirschfield & Pepys, 2003; Hu, Meigs, Li, Rifai, &
Manson, 2004), with the obvious inference that CRP is a cause of these
conditions (Ridker et al., 2005; Sjöholm & Nyström, 2005; Verma, Szmitko,
& Ridker, 2005). A Mendelian randomization study examined polymorphisms
of the CRP gene and demonstrated that
while serum CRP differences were highly predictive of blood pressure and
hypertension, the CRP variants—which are related to sizeable serum CRP
differences—were not associated with these same outcomes (Davey Smith
et al., 2005b). It is likely that these divergent findings are explained by the
extensive confounding between serum CRP and outcomes. Current evidence
on this issue, though statistically underpowered, also suggests that CRP
levels do not lead to elevated risk of insulin resistance (Timpson et al.,
2005) or CHD (Casas et al., 2006). Again, confounding and reverse causa-
tion—where existing coronary disease or insulin resistance may influence
CRP levels—could account for this discrepancy. Similar findings have been
reported for serum fibrinogen, variants in the beta-fibrinogen gene, and CHD
(Davey Smith et al., 2005a; Keavney et al., 2006). The CRP and fibrinogen
examples demonstrate that Mendelian randomization can both increase evi-
dence for a causal effect of an environmentally modifiable factor (as in the
examples of milk, alcohol, and cholesterol levels) and provide evidence
against causal effects, which can help direct efforts away from targets of
no preventative or therapeutic relevance.
Maternal Genotype as an Indicator of Intrauterine Environment

Mendelian randomization studies can provide unique insights into the causal
nature of intrauterine environment influences on later disease outcomes. In
such studies, maternal genotype is taken to be a proxy for environmentally
modifiable exposures mediated through the mother that influence the intrau-
terine environment. For example, it is now widely accepted that neural tube
defects can in part be prevented by periconceptual maternal folate supple-
mentation (Scholl & Johnson, 2000). RCTs of folate supplementation have
provided the key evidence in this regard (MRC Vitamin Study Research
Group, 1991; Czeizel & Dudás, 1992). However, could we have reached the
same conclusion before the RCTs were carried out if we had access to evi-
dence from genetic association studies? Studies have looked at the MTHFR
677C→T polymorphism (a genetic variant that is associated with methylte-
trahydrofolate reductase activity and circulating homocysteine levels, the TT
genotype being associated with higher homocysteine levels) in newborns with
neural tube defects compared to controls and have found an increased risk in
TT vs. CC newborns, with a relative risk of 1.75 (95% CI 1.41–2.18) in a
meta-analysis of all such studies (Botto & Yang, 2000). Studies have also
looked at the association between this MTHFR variant in parents and the
risk of neural tube defect in their offspring. Mothers who have the TT
genotype have a relative risk of 2.04 (95% CI 1.49–2.81) of having an
offspring with a neural tube defect compared to mothers who have the CC
genotype (Roseboom et al., 2000). For TT fathers, the equivalent relative risk
is 1.18 (95% CI 0.65–2.12) (Scholl & Johnson, 2000). This pattern of associa-
tions suggests that it is the intrauterine environment—influenced by mater-
nal TT genotype—rather than the genotype of offspring that is related to
disease risk (Figure 9.10). This is consistent with the hypothesis that mater-
nal folate intake is the exposure of importance.
In this case, the findings from observational studies, genetic association
studies, and an RCT are in close agreement. Had the technology been available,
the genetic association studies, with their distinctive pattern of maternal
versus paternal genotype effects on neural tube defect risk, would have provided
strong evidence of a beneficial effect of folate supplementation before any
RCT had been completed, although trials would still have been necessary
to confirm that supplementation itself was effective.
Certainly, the genetic association studies would have provided better evidence
than that given by conventional epidemiological studies, which would have
had to cope with the problems of accurately assessing diet and the consider-
able confounding of maternal folate intake with a wide variety of lifestyle
and socioeconomic factors that may also influence neural tube defect risk.
[Figure 9.10 schematic: Mother TT, foetus exposed in utero, RR 2.04. Father TT, no way this can affect the in utero exposure of the foetus, RR 1.18. Foetus TT, inherits 50% from mother and 50% from father, hence intermediate risk, RR 1.75.]
Figure 9.10 Inheritance of MTHFR polymorphism and neural tube defects.

The association of genotype with neural tube defect risk does not suggest that
genetic screening is indicated; rather, it demonstrates that an environmental
intervention may benefit the whole population, independent of the genotype
of individuals receiving the intervention.
Studies utilizing maternal genotype as a proxy for environmentally mod-
ifiable influences on the intrauterine environment can be analyzed in a
variety of ways. First, the mothers of offspring with a particular outcome
can be compared to a control group of mothers who have offspring without
the outcome in a conventional case–control design but with the mother as
the exposed individual (or control) rather than the offspring with the parti-
cular health outcome (or the control offspring). Fathers could serve as a
control group when autosomal genetic variants are being studied. If the
exposure is mediated by the mother, maternal genotype, rather than offspring
genotype, will be the appropriate exposure indicator. Clearly, maternal and
offspring genotypes are associated, but conditional on each other it should be
the maternal genotype that shows the association with the health outcome
among the offspring. Indeed, in theory it would be possible to simply com-
pare genotype distributions of mothers and offspring, with a higher preva-
lence among mothers providing evidence that maternal genotype, through an
intrauterine pathway, is of importance. However, the statistical power of such
an approach is low, and an external control group, whether fathers or women
who have offspring without the health outcome, is generally preferable.
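A minimal sketch of this mutual-adjustment logic, using simulated data, is given below; the allele frequency, effect size, and model are illustrative assumptions, not values from the MTHFR studies cited. When the outcome depends only on an intrauterine (maternal) pathway, the maternal genotype coefficient persists after adjustment for offspring genotype, while the offspring genotype coefficient sits near the null.

# Simulated illustration: outcome risk depends on maternal genotype only
# (an intrauterine effect); offspring genotype is correlated with it through
# transmission but has no direct effect. Illustrative values throughout.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 20_000
q = 0.3                                             # risk-allele frequency
mother = rng.binomial(2, q, n)                      # maternal allele count
from_mother = rng.binomial(1, mother / 2.0)         # allele transmitted by mother
from_father = rng.binomial(1, q, n)                 # allele drawn from the population
offspring = from_mother + from_father

log_odds = -3.0 + 0.7 * mother                      # intrauterine effect only
outcome = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

X = sm.add_constant(np.column_stack([mother, offspring]))
fit = sm.Logit(outcome, X).fit(disp=0)
print("maternal OR:  %.2f" % np.exp(fit.params[1]))   # close to exp(0.7), about 2.0
print("offspring OR: %.2f" % np.exp(fit.params[2]))   # close to 1.0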
The influence of high levels of alcohol intake by pregnant women on the
health and development of their offspring is well recognized for very high
levels of intake, in the form of fetal alcohol syndrome (Burd, 2006). However,
the influence outside of this extreme situation is less easy to assess, particu-
larly as higher levels of alcohol intake will be related to a wide array of
potential sociocultural, behavioral, and environmental confounding factors.
Furthermore, there may be systematic bias in how mothers report alcohol
intake during pregnancy, which could distort associations with health out-
comes. Therefore, outside of the case of very high alcohol intake by mothers,
it is difficult to establish a causal link between maternal alcohol intake and
offspring developmental characteristics. Some studies have approached this
by investigating alcohol-metabolizing genotypes in mothers and offspring
outcomes.
Although sample sizes have been low and the analytical strategies not
optimal, these studies provide some evidence to support the influence of maternal
genotype (Gemma, Vichi, & Testai, 2007; Jacobson et al., 2006; Warren & Li,
2005). For example, in one study mental development at age 7.5 years was delayed
among offspring of mothers possessing a genetic variant associated with less
rapid alcohol metabolism. Among these mothers there would presumably be
less rapid clearance of alcohol and, thus, an increased influence of maternal
alcohol on offspring during the intrauterine period (Jacobson et al., 2006).
Offspring genotype was not independently related to these outcomes, indicat-
ing that the crucial exposure was related to maternal alcohol levels. As in the
MTHFR examples, these studies are of relevance because they provide evi-
dence of the influence of maternal alcohol levels on offspring development,
rather than because they highlight a particular maternal genotype that is of
importance. In the absence of alcohol drinking, the maternal genotype would
presumably have no influence on offspring outcomes. The association of
maternal genotype and offspring outcome suggests that the alcohol level in
mothers, and therefore their alcohol consumption, has an influence on
offspring development.

Implications of Mendelian Randomization Study Findings

Establishing the causal influence of environmentally modifiable risk factors
from Mendelian randomization designs informs policies for improving popu-
lation health through population-level interventions. This does not imply that
the appropriate strategy is genetic screening to identify those at high risk and
application of selective exposure reduction policies. For example, the implica-
tion of studies on maternal MTHFR genotype and offspring neural tube
defect risk is that the population risk for neural tube defects can be reduced
through increased folate intake periconceptually and in early pregnancy. It
does not suggest that women should be screened for MTHFR genotype;
women without the TT genotype but with low folate intake are still exposed
to preventable risk of having babies with neural tube defects. Similarly, estab-
lishing the association between genetic variants (such as familial defective
ApoB) associated with elevated cholesterol level and CHD risk strengthens
causal evidence that elevated cholesterol is a modifiable risk factor for CHD
for the whole population. Thus, even though the population attributable risk
for CHD of this variant is small, it usefully informs public-health approaches
to improving population health. It is this aspect of Mendelian randomization
that illustrates its distinction from conventional risk identification and
genetic screening purposes of genetic epidemiology.

Mendelian Randomization and RCTs

RCTs are clearly the definitive means of obtaining evidence on the effects of
modifying disease risk processes. There are similarities in the logical struc-
ture of RCTs and Mendelian randomization, however. Figure 9.11 illustrates
this, drawing attention to the unconfounded nature of exposures proxied for
by genetic variants (analogous to the unconfounded nature of a randomized
intervention), the lack of possibility of reverse causation as an influence on
exposure–outcome associations in both Mendelian randomization and RCT
settings, and the importance of intention-to-treat analyses—that is, analysis
by group defined by genetic variant, irrespective of associations between the
genetic variant and the proxied-for exposure within any particular individual.
The analogy with RCTs is also useful with respect to one objection that
has been raised for Mendelian randomization studies. This is that the envir-
onmentally modifiable exposure proxied for by the genetic variants (such as
alcohol intake or circulating CRP levels) is influenced by many other factors
in addition to the genetic variants (Jousilahti & Salomaa, 2004). This is, of
course, true. However, consider an RCT of blood pressure–lowering medica-
tion. Blood pressure is influenced mainly by factors other than taking blood
pressure–lowering medication—obesity, alcohol intake, salt consumption and

[Figure 9.11 schematic: in Mendelian randomization, random segregation of alleles at conception yields an exposed group (one allele) and a control group (other allele); in a randomized controlled trial, the randomization method yields an intervention group and a no-intervention control group. In both designs confounders are equal between groups and outcomes are compared between groups.]
Figure 9.11 Mendelian randomization and randomized controlled trial designs compared.
other dietary factors, smoking, exercise, physical fitness, genetic factors, and
early-life developmental influences are all of importance. However, the ran-
domization that occurs in trials ensures that these factors are balanced
between the groups that receive the blood pressure–lowering medication
and those that do not. Thus, the fact that many other factors are related to
the modifiable exposure does not vitiate the power of RCTs; neither does it
vitiate the strength of Mendelian randomization designs.
A related objection is that the genetic variants often explain only a trivial
proportion of the variance in the environmentally modifiable risk factor that
is being proxied for (Glynn, 2006). Again, consider an RCT of blood pressure–
lowering medication where 50% of participants received the medication
and 50% received a placebo. If the antihypertensive therapy reduced blood
pressure (BP) by a quarter of a standard deviation (SD), which is approxi-
mately the situation for such pharmacotherapy, then within the whole study
group treatment assignment (i.e., antihypertensive use vs. placebo) will
explain less than 2% of the variance in blood pressure. In the example of
CRP haplotypes used as instruments for CRP levels, these haplotypes explain
1.66% of the variance in CRP levels in the population (Lawlor et al., 2008).
As can be seen, the quantitative association of genetic variants as instru-
ments can be similar to that of randomized treatments with respect to the
biological processes that such treatments modify. Both logic and quantifica-
tion fail to support criticisms of the Mendelian randomization approach
based on either the obvious fact that many factors influence most pheno-
types of interest or the fact that particular genetic variants account for only a
small proportion of variance in the phenotype.
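The figure of less than 2% follows from simple arithmetic: with a proportion p assigned to treatment and a mean difference of d within-group standard deviations, the proportion of variance explained is p(1 − p)d² / [p(1 − p)d² + 1], where the within-group variance is normalized to 1. A short check of this calculation:

# Variance in blood pressure explained by a 50:50 binary assignment that
# shifts the mean by 0.25 within-group standard deviations (illustrative).
p = 0.5          # proportion assigned to active treatment
d = 0.25         # mean difference in SD units
between = p * (1 - p) * d ** 2        # between-group variance
r_squared = between / (between + 1)   # within-group variance normalized to 1
print(f"variance explained: {100 * r_squared:.1f}%")   # about 1.5%, i.e., under 2%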

Mendelian Randomization and Instrumental Variable Approaches

As well as the analogy with RCTs, Mendelian randomization can also be
likened to instrumental variable approaches, which have been heavily utilized
in econometrics and social science, although rather less so in epidemiology.
In an instrumental variable approach the instrument is a variable that is
related to the outcome only through its association with the modifiable expo-
sure of interest. The instrument is not related to confounding factors, nor is
its assessment biased in a manner that would generate a spurious association
with the outcome. Furthermore, the instrument will not be influenced by the
development of the outcome (i.e., there will be no reverse causation). Figure
9.12 presents this basic schema, where the dotted line between genotype and
outcome provides an unconfounded and unbiased estimate of the causal
association between the exposure that the genotype is proxying for and
the outcome. The development of instrumental variable methods within
[Figure 9.12 schematic: genotype is associated with the exposure, which in turn affects the outcome; confounders, reverse causation, and bias operate on the exposure–outcome relationship but not on the genotype, so the dotted genotype–outcome path provides the unconfounded estimate.]
Figure 9.12 Mendelian randomization as an instrumental variables approach.

econometrics, in particular, has led to a sophisticated suite of statistical
methods for estimating causal effects, and these have now been applied
within Mendelian randomization studies (e.g., Davey Smith et al., 2005a,
2005b; Timpson et al., 2005). The parallels between Mendelian randomiza-
tion and instrumental variable approaches are discussed in more detail
elsewhere (Thomas & Conti, 2004; Lawlor et al., 2008).
The instrumental variable method allows for estimation of the size of the causal
effect of the modifiable environmental exposure of interest on the outcome,
together with estimates of the precision of the effect. Thus, in the example of
alcohol intake (indexed by ALDH2 genotype) and blood pressure discussed
earlier it is possible to utilize the joint associations of ALDH2 genotype and
alcohol and ALDH2 genotype and blood pressure to estimate the causal influ-
ence of alcohol intake on blood pressure. Figure 9.13 reports such an analysis,
showing that for a 1 g/day increase in alcohol intake there are robust increases
in diastolic and systolic blood pressure among men (Chen et al., 2008).
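A minimal sketch of the ratio (Wald) form of this instrumental variable estimate is given below, using simulated individual-level data; the effect sizes, confounder, and sample size are illustrative assumptions, and the published analysis of Chen et al. (2008) pooled study-level estimates rather than fitting a model like this.

# Wald-type instrumental variable estimate: effect of alcohol (g/day) on
# systolic blood pressure, using an ALDH2-like genotype as the instrument.
# Simulated data with an unmeasured confounder; values are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
genotype = rng.binomial(1, 0.5, n)               # 1 = genotype permitting drinking
confounder = rng.normal(0, 1, n)                 # unmeasured lifestyle factor
alcohol = np.clip(20 * genotype + 5 * confounder + rng.normal(0, 5, n), 0, None)
sbp = 120 + 0.25 * alcohol + 4 * confounder + rng.normal(0, 10, n)

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

naive = slope(alcohol, sbp)                            # confounded estimate
iv = slope(genotype, sbp) / slope(genotype, alcohol)   # ratio of the two reduced forms
print(f"confounded (observational) estimate: {naive:.2f} mmHg per g/day")
print(f"instrumental variable estimate:      {iv:.2f} mmHg per g/day (true value 0.25)")

With a single instrument and a single exposure, this ratio estimator coincides with two-stage least squares, which is the more general machinery used when several instruments or covariates are involved.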

Mendelian Randomization and Gene by Environment Interaction

Mendelian randomization is one way in which genetic epidemiology can inform
our understanding about environmental determinants of disease. A more con-
ventional approach has been to study interactions between environmental expo-
sures and genotype (Perera, 1997; Mucci, Wedren, Tamimi, Trichopoulos, &
Adami, 2001). From epidemiological and Mendelian randomization perspec-
tives, several issues arise with gene–environment interactions.
The most reliable findings in genetic association studies relate to the
main effects of polymorphisms on disease risk (Clayton & McKeigue,
2001). The power to detect meaningful gene–environment interaction is
low (Wright, Carothers, & Campbell, 2002), with the result being that there
are a large number of reports of spurious gene–environment interactions in
[Figure 9.13: instrumental variable estimates of the alcohol–blood pressure effect, mmHg per g/day (95% CI). Diastolic: Amamoto et al. 2002, 0.17 (0.06, 0.28); Takagi et al. 2001, 0.15 (0.08, 0.22); Tsuritani et al. 1995, 0.16 (0.07, 0.26); subtotal 0.16 (0.11, 0.21). Systolic: Amamoto et al. 2002, 0.29 (0.12, 0.47); Takagi et al. 2001, 0.28 (0.16, 0.40); Tsuritani et al. 1995, 0.18 (0.05, 0.31); subtotal 0.24 (0.16, 0.32).]
Figure 9.13 Instrumental variable estimates of the difference in systolic and diastolic blood
pressure produced by a 1 g per day higher alcohol intake.

the medical literature (Colhoun, McKeigue, & Davey Smith, 2003). The pre-
sence or absence of statistical interactions depends upon the scale of analysis
(e.g., linear or logarithmic modeling of the exposure–disease outcome relationship), and the mean-
ing of observed deviation from either an additive or a multiplicative model is
not clear. Furthermore, the biological implications of interactions (however
defined) are generally uncertain (Thompson, 1991). Mendelian randomization
is most powerful when studying modifiable exposures that are difficult to
measure and/or considerably confounded, such as dietary factors. Given mea-
surement error—particularly if this is differential with respect to other factors
influencing disease risk—interactions are both difficult to detect and often
misleading when, apparently, they are found (Clayton & McKeigue, 2001).
The situation is perhaps different with exposures that differ qualitatively
rather than quantitatively between individuals. Consider the issue of the
influence of smoking tobacco on bladder-cancer risk. Observational studies
suggest an association, but clearly confounding and a variety of biases could
generate such an association. The potential carcinogens in tobacco smoke of
relevance to bladder-cancer risk include aromatic and heterocyclic amines,
which are detoxified by N-acetyltransferase 2 (NAT2). Genetic variation in
NAT2 enzyme levels leads to slower or faster acetylation states. If the carci-
nogens in tobacco smoke do increase the risk of bladder cancer, then it would
be expected that slow acetylators, those who have a reduced rate of detoxifica-
tion of these carcinogens, would be at an increased risk of bladder cancer if
they were smokers, whereas if they were not exposed to these carcinogens
(and the major exposure route for those outside of particular industries is
through tobacco smoke), then an association of genotype with bladder-cancer
risk would not be anticipated. Table 9.3 tabulates findings from a large study
reported in a way that allows analysis of this simple hypothesis (Gu, Liang,
Wang, Lu, & Wu, 2005). As can be seen, the influence of the NAT2 slow-
acetylation genotype is appreciable only among those also exposed to heavy
smoking. Since the genotype will be unrelated to confounders, it is difficult
to reason why this situation should arise unless smoking is a causal factor
with respect to bladder cancer. Thus, the presence of a sizable effect of
genotype in the exposed group but not in the unexposed group provides
evidence as to the causal nature of the environmentally modifiable risk
factor—in this example, smoking. It must be recognized, however, that
gene by environment interactions interpreted within the Mendelian randomi-
zation framework as evidence regarding the causal nature of environmentally
modifiable exposures are not protected from confounding to the extent that
main genetic effects are. In the NAT2/smoking/bladder cancer example any
factor related to smoking—such as social class—will tend to show a greater
association with bladder cancer within NAT2 slow acetylators than within
NAT2 rapid acetylators. Because there is not a one-to-one association of
social class with smoking, this will not produce the qualitative interaction
of essentially no effect of the genotype in one exposure stratum and an effect
in the other, as in the NAT2/smoking interaction, but rather a quantitative
interaction of a greater effect of NAT2 in the poorer social classes (among
whom smoking is more prevalent) and a smaller (but still evident) effect
in the better-off social classes, among whom smoking is less prevalent.
Thus, situations in which both the biological basis of an expected interaction
is well understood and a qualitative (effect vs. no effect) interaction may be
anticipated are the ones that are most amenable to interpretations related to
the general causal nature of the environmentally modifiable risk factor.
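The logic of such a qualitative interaction, and the caveat about residual confounding within strata, can be illustrated with a small simulation; the prevalences and effect sizes below are arbitrary assumptions rather than the data of Gu et al. (2005).

# Simulated illustration of a gene-by-environment interaction used as causal
# evidence: bladder-cancer risk is raised by smoking-derived carcinogens only
# in NAT2 slow acetylators, so the genotype effect appears only among smokers.
# All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
slow = rng.binomial(1, 0.5, n)                   # NAT2 slow-acetylator genotype
smoker = rng.binomial(1, 0.3, n)                 # heavy smoking, independent of genotype

baseline = 0.005
risk = baseline * (1 + 0.5 * smoker + 2.0 * smoker * slow)   # genotype acts only via smoking
cancer = rng.binomial(1, risk)

def odds_ratio(exposure, outcome, mask):
    e, o = exposure[mask], outcome[mask]
    a = np.sum((e == 1) & (o == 1)); b = np.sum((e == 1) & (o == 0))
    c = np.sum((e == 0) & (o == 1)); d = np.sum((e == 0) & (o == 0))
    return (a * d) / (b * c)

for label, mask in [("non-smokers", smoker == 0), ("heavy smokers", smoker == 1)]:
    print(f"NAT2 slow vs fast OR among {label}: {odds_ratio(slow, cancer, mask):.2f}")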

Problems and Limitations of Mendelian Randomization

We consider Mendelian randomization to be one of the brightest current
prospects for improving causal understanding within population-based
studies. There are, however, several potential limitations to the application
of this methodology (Davey Smith & Ebrahim, 2003; Little & Khoury, 2003).

Table 9.3 Odds Ratios (95% CI) for Bladder Cancer, NAT2 Slow vs. Fast Acetylator
Genotype, Stratified by Smoking Status

Overall              Never/Light Smokers     Heavy Smokers
1.35 (1.04–1.75)     1.10 (0.78–1.53)        2.11 (1.30–3.43)

From data in Gu et al. (2005).


Failure to Establish Reliable Genotype–Intermediate Phenotype or Genotype–Disease Associations
If the associations between genotype and a potential intermediate phenotype
or between genotype and disease outcome are not reliably estimated, then
interpreting these associations in terms of their implications for potential
environmental causes of disease will clearly be inappropriate. This is not
an issue peculiar to Mendelian randomization; rather, the nonreplicable
nature of perhaps most apparent findings in genetic association studies is
a serious limitation to the whole enterprise. This issue has been discussed
elsewhere (Cardon & Bell, 2001; Colhoun et al., 2003) and will not be dealt
with further here. Instead, problems with the Mendelian randomizat-
ion approach even when reliable genotype–phenotype associations can be
determined will be addressed.

Confounding of Genotype–Environmentally Modifiable Risk Factor–Disease Associations
The power of Mendelian randomization lies in its ability to avoid the often
substantial confounding seen in conventional observational epidemiology.
However, confounding can be reintroduced into Mendelian randomization
studies, and whether this has arisen needs to be considered when interpreting
the results.

Linkage Disequilibrium

It is possible that the locus under study is in linkage disequilibrium (i.e., is
associated) with another polymorphic locus, with the effect of the polymorph-
ism under investigation being confounded by the influence of the other
polymorphism. It may seem unlikely—given the relatively short distances
over which linkage disequilibrium is seen in the human genome—that a
polymorphism influencing, say, CHD risk would be associated with another
polymorphism influencing CHD risk (and thus producing confounding).
There are, nevertheless, cases of different genes influencing the same meta-
bolic pathway being in physical proximity. For example, different polymorph-
isms influencing alcohol metabolism appear to be in linkage disequilibrium
(Osier et al., 2002).

Pleiotropy and the Multifunction of Genes

Mendelian randomization is most useful when it can be used to relate a single
intermediate phenotype to a disease outcome. However, polymorphisms may
(and probably often will) influence more than one intermediate phenotype,
and this may mean they proxy for more than one environmentally modifi-
able risk factor. This can be the case through multiple effects mediated by
their RNA expression or protein coding, through alternative splicing, where
one polymorphic region contributes to alternative forms of more than one
protein (Glebart, 1998), or through other mechanisms. The most robust
interpretations will be possible when the functional polymorphism appears
to directly influence the level of the intermediate phenotype of interest (as in
the CRP example), but such examples are probably going to be less common
in Mendelian randomization than cases where the polymorphism can influ-
ence several systems, with different potential interpretations of how the
effect on outcome is generated.

How to Investigate Reintroduced Confounding Within Mendelian Randomization

Linkage disequilibrium and pleiotropy can reintroduce confounding and vitiate
the power of the Mendelian randomization approach. Genomic knowledge may
help in estimating the degree to which these are likely to be problems in any
particular Mendelian randomization study, through, for instance, explication of
genetic variants that may be in linkage disequilibrium with the variant under
study or the function of a particular variant and its known pleiotropic effects.
Furthermore, genetic variation can be related to measures of potential con-
founding factors in each study, and the magnitude of such confounding can
be estimated. Empirical studies to date suggest that common genetic variants
are largely unrelated to the behavioral and socioeconomic factors considered
to be important confounders in conventional observational studies. However,
relying on the measurement of confounders does, of course, run counter to the central
purpose of Mendelian randomization, which is to balance unmeasured as
well as measured confounders (as randomization does in RCTs).
In some circumstances the genetic variant will be related to the environ-
mentally modifiable exposure of interest in some populations but not in
others. An example of this relates to the alcohol ALDH2 genotype and
blood pressure example discussed earlier. The results displayed relate to
men because in the populations under study women drink very little what-
ever their genotype (Figure 9.14). If ALDH2 genetic variation influenced
blood pressure for reasons other than its influence on alcohol intake, for
example, if it was in linkage disequilibrium with another genetic variant
that influenced blood pressure through another pathway or if there was a
pleiotropic effect of the genetic variant on blood pressure, the same geno-
type–blood pressure association should be seen among men and women.
If, however, the genetic variant influences only blood pressure through its
effect on alcohol intake, an effect should be seen only in men. Figure 9.15
demonstrates that the genotype–blood pressure association is indeed seen
only in men, further strengthening evidence that the genotype–blood pres-
sure association depends upon the genotype influencing alcohol intake and
that the associations do indeed provide causal evidence of an influence of
alcohol intake on blood pressure.
In some cases it may be possible to identify two separate genetic variants
that are not in linkage disequilibrium with each other but that serve as
proxies for the environmentally modifiable risk factor of interest. If both
variants are related to the outcome of interest and point to the same under-
lying association, then it becomes much less plausible that reintroduced
confounding explains the association since it would have to be acting in
the same way for these two unlinked variants. This can be likened to RCTs
of different blood pressure–lowering agents, which work through different
mechanisms and have different potential side effects but lower blood pres-
sure to the same degree. If the different agents produce the same reductions
in cardiovascular disease risk, then it is unlikely that this is through agent-
specific effects of the drugs; rather, it points to blood pressure lowering as
being key. The use of multiple genetic variants working through different
pathways has not been applied in Mendelian randomization to date but
represents an important potential development in the methodology.

Canalization and Developmental Stability


Perhaps a greater potential problem for Mendelian randomization than rein-
troduced confounding arises from the developmental compensation that may

[Figure 9.14 shows mean alcohol consumption (g/day) by ALDH2 genotype (*1*1, *1*2, *2*2), separately for men and women.]
Figure 9.14 ALDH2 genotype by alcohol consumption (g/day): five studies, n = 6,815.
From ‘‘Alcohol intake and blood pressure: A systematic review implementing
Mendelian randomization approach,’’ by L. Chen, G. Davey Smith, R. Harbord, &
S. Lewis, 2008, PLoS Medicine, 5, e52.
[Figure 9.15: mean difference in systolic blood pressure (mmHg, 95% CI), ALDH2 *2*2 vs *1*1 genotype. Men: subtotal −7.44 (−9.49, −5.39) across five studies. Women: subtotal 0.39 (−2.15, 2.93) across two studies.]
Figure 9.15 ALDH2 genotype and systolic blood pressure. From ‘‘Alcohol intake and blood
pressure: A systematic review implementing Mendelian randomization approach,’’ by L.
Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.

occur through a polymorphic genotype being expressed during fetal or early
postnatal development and, thus, influencing development in such a way as
to buffer against the effect of the polymorphism. Such compensatory pro-
cesses have been discussed since C. H. Waddington (1942) introduced the
notion of canalization in the 1940s. Canalization refers to the buffering of the
effects of either environmental or genetic forces attempting to perturb devel-
opment, and Waddington’s ideas have been well developed both empirically
and theoretically (Wilkins, 1997; Rutherford, 2000; Gibson & Wagner, 2000;
Hartman, Garvik, & Hartwell, 2001; Debat & David, 2001; Kitami & Nadeau,
2002; Gu et al., 2003; Hornstein & Shomron, 2006). Such buffering can be
achieved either through genetic redundancy (more than one gene having the
same or similar function) or through alternative metabolic routes, where the
complexity of metabolic pathways allows recruitment of different pathways to
reach the same phenotypic end point. In effect, a functional polymorphism
expressed during fetal development or postnatal growth may influence the
expression of a wide range of other genes, leading to changes that may
compensate for the influence of the polymorphism. Put crudely, if a
person has developed and grown from the intrauterine period onward
within an environment in which one factor is perturbed (e.g., there is ele-
vated CRP due to genotype), then that person may be rendered resistant to
the influence of lifelong elevated circulating CRP, through permanent
changes in tissue structure and function that counterbalance its effects. In
intervention trials—for example, RCTs of cholesterol-lowering drugs—the
intervention is generally randomized to participants during middle age; simi-
larly, in observational studies of this issue, cholesterol levels are ascertained
during adulthood. In Mendelian randomization, on the other hand, rando-
mization occurs before birth. This leads to important caveats when attempt-
ing to relate the findings of conventional observational epidemiological
studies to the findings of studies carried out within the Mendelian randomi-
zation paradigm.
The most dramatic demonstrations of developmental compensation come
from knockout studies, where a functioning gene is essentially removed from an
organism. The overall phenotypic effects of such knockouts have often been
much lower than knowledge of the function of the genes would predict, even in
the absence of other genes carrying out the same function as the knockout
gene (Morange, 2001; Shastry, 1998; Gerlai, 2001; Williams & Wagner, 2000).
For example, pharmacological inhibition demonstrates that myoglobin is
essential to maintain energy balance and contractile function in the myocar-
dium of mice, yet disrupting the myoglobin gene resulted in mice devoid of
myoglobin with no disruption of cardiac function (Garry et al., 1998).
In the field of animal studies—such as knockout preparations or trans-
genic animals manipulated so as to overexpress foreign DNA—the interpre-
tive problem created by developmental compensation is well recognized
(Morange, 2001; Shastry, 1998; Gerlai, 2001; Williams & Wagner, 2000).
Conditional preparations—in which the level of transgene expression can
be induced or suppressed through the application of external agents—are
now being utilized to investigate the influence of such altered gene expres-
sion after the developmental stages during which compensation can occur
(Bolon & Galbreath, 2002). Thus, further evidence on the issue of genetic
buffering should emerge to inform interpretations of both animal and
human studies.
Most examples of developmental compensation relate to dramatic genetic
or environmental insults; thus, it is unclear whether the generally small
phenotypic differences induced by common functional polymorphisms will
be sufficient to induce compensatory responses. The fact that the large gene–
environment interactions that have been observed often relate to novel expo-
sures that have not been present during the evolution of a species (e.g., drug
interactions) (Wright et al., 2002) may indicate that homogenization of
response to exposures that are widely experienced—as would be the case
with the products of functional polymorphisms or common mutations—
has occurred; canalizing mechanisms could be particularly relevant in these
cases. Further work on the basic mechanisms of developmental stability and
how this relates to relatively small exposure differences during development
will allow these considerations to be taken forward. Knowledge of the stage of
development at which a genetic variant has functional effects will also allow
the potential of developmental compensation to buffer the response to the
variant to be assessed.
In some Mendelian randomization designs developmental compensation
is not an issue. For example, when maternal genotype is utilized as an
indicator of the intrauterine environment, the response of the fetus will
not differ whether the effect is induced by maternal genotype or by environ-
mental perturbation and the effect on the fetus can be taken to indicate the
effect of environmental influences during the intrauterine period. Also, in
cases where a variant influences an adulthood environmental exposure (e.g.,
ALDH2 variation and alcohol intake), developmental compensation to geno-
type will not be an issue. In many cases of gene by environment interaction
interpreted with respect to causality of the environmental factor, the same
applies. However, in some situations Mendelian randomization is left in the
somewhat unsatisfactory position of facing a potential problem that cannot
currently be adequately assessed. The parallels between Mendelian
randomization in human studies and equivalent designs in animal studies
are discussed in Box 9.3.
Complexity of Associations and Interpretations
The interpretation of findings from studies that appear to fall within the
Mendelian randomization remit can often be complex, as has been previously
discussed with respect to MTHFR and folate intake (Davey Smith & Ebrahim,
2003). As a second example, consider the association of extracellular super-
oxide dismutase (EC-SOD) and CHD. EC-SOD is an extracellular scavenger
of superoxide anions, and thus, genetic variants associated with higher cir-
culating EC-SOD levels might be considered to mimic higher levels of anti-
oxidants. However, findings are dramatically opposite to this—bearers of such
variants have an increased risk of CHD (Juul et al., 2004). The explanation of
this apparent paradox may be that the higher circulating EC-SOD levels
associated with the variant arise from movement of EC-SOD from arterial
walls; thus, the in situ antioxidative properties of these arterial walls are lower
in individuals with the variant associated with higher circulating EC-SOD.
The complexity of these interpretations—together with their sometimes spec-
ulative nature—detracts from the transparency that otherwise makes
Mendelian randomization attractive.
Lack of Suitable Genetic Variants to Proxy for Exposure of Interest
An obvious limitation of Mendelian randomization is that it can examine
only areas for which there are functional polymorphisms (or genetic markers
linked to such functional polymorphisms) that are relevant to the modifiable
exposure of interest. In the context of genetic association studies more gen-
erally it has been pointed out that in many cases even if a locus is involved in
a disease-related metabolic process there may be no suitable marker or func-
tional polymorphism to allow study of this process (Weiss & Terwilliger,
2000). In an earlier work on Mendelian randomization (Davey Smith &
Ebrahim, 2003) we discussed the example of vitamin C since one of our
box 9.3
Meiotic Randomization in Animal Studies
The approach to causal inference underlying Mendelian randomization is
also utilized in nonhuman animal studies. For instance, in investigations of
the structural neuroanatomical factors underlying behavioral traits in
rodents, there has been use of genetic crosses that lead to different on-
average structural features (Roderic, Wimer, & Wimer, 1976; Weimer, 1973;
Lipp et al., 1989). Lipp et al. (1989) refer to this as ‘‘meiotic randomization’’
and consider that the advantages of this method are that the brain-morphol-
ogy differences that are due to genetic difference occur before any of the
behavioral traits develop and, therefore, these differences cannot be a feed-
back function of behavior (which is equivalent to the avoidance of reverse
causality in human Mendelian randomization studies) and that other
differences between the animals are randomized with respect to the brain-
morphology differences of interest (equivalent to the avoidance of confound-
ing in human Mendelian randomization studies). Li and colleagues (2006)
apply this method to the dissection of adiposity and body composition in
mice and point out that in experimental crosses
meiosis serves as a randomization mechanism that distributes natu-
rally occurring genetic variation in a combinatorial fashion among a set
of cross progeny. Genetically randomized populations share the proper-
ties of statistically designed experiments that provide a basis for causal
inference. This is consistent with the notion that causation flows from
genes to phenotypes. We propose that the inference of causal direction
can be extended to include relationships among phenotypes.
Mendelian randomization within epidemiology reflects similar thinking
among transgenic animal researchers. Williams and Wagner (2000)
consider that
A properly designed transgenic experiment can be a thing of exquisite
beauty in that the results support absolutely unambiguous conclusions
regarding the function of a given gene or protein within the authentic
biological context of an intact animal. A transgenic experiment may
provide the most rigorous test possible of a mechanistic hypothesis
that was generated by previous observational studies. A successful
transgenic experiment can cut through layers of uncertainty that
cloud the interpretation of the results produced by other experimental
designs.
The problems of interpreting some aspects of transgenic animal stu-
dies may also apply to Mendelian randomization within genetic epidemiol-
ogy, however. Linked progress across the fields of genomics, animal
experimentation, and epidemiology will better define the scope of
Mendelian randomization in the future.
examples of how observational epidemiology appeared to have got the wrong
answer related to vitamin C. We considered whether the association between
vitamin C and CHD could have been studied utilizing the principles of
Mendelian randomization. We stated that polymorphisms exist that are
related to lower circulating vitamin C levels—for example, the haptoglobin
polymorphism (Langlois, Delanghe, De Buyzere, Bernard, & Ouyang, 1997;
Delanghe, Langlois, Duprez, De Buyzere, & Clement, 1999)—but in this case
the effect on vitamin C is at some distance from the polymorphic protein
and, as in the apolipoprotein E example, the other phenotypic differences
could have an influence on CHD risk that would distort examination of the
influence of vitamin C levels through relating genotype to disease.
SLC23A1—a gene encoding the vitamin C transporter SVCT1, which mediates
vitamin C transport by intestinal cells—would be an attractive candidate for
Mendelian randomization studies. However, by 2003 (the date of our earlier
report) a search for variants had failed to find any common single-nucleotide
polymorphism that could be used in such a way (Erichsen, Eck, Levine, &
Chanock, 2001). We therefore used this as an example of a situation where
suitable polymorphisms for studying the modifiable risk factor of interest—in
this case, vitamin C—could not be located. However, since the earlier report,
a functional variation in SLC23A1 has been identified that is related to cir-
culating vitamin C levels (N. J. Timpson et al., 2010). We use this example
not to suggest that the obstacle of locating relevant genetic variation for
particular problems in observational epidemiology will always be overcome
but to point out that rapidly developing knowledge of human genomics will
identify more variants that can serve as instruments for Mendelian randomi-
zation studies.
Conclusions: Mendelian Randomization, What It Is and What It Is Not
Mendelian randomization is not predicated on the presumption that genetic
variants are major determinants of health and disease within populations.
There are many cogent critiques of genetic reductionism and the overselling
of ‘‘discoveries’’ in genetics that reiterate obvious truths so clearly (albeit
somewhat repetitively) that there is no need to repeat them here (e.g.,
Berkowitz, 1996; Baird, 2000; Holtzman, 2001; Strohman, 1993; Rose,
1995). Mendelian randomization does not depend upon there being ‘‘genes
for’’ particular traits and certainly not in the strict sense of a gene for a trait
being one that is maintained by selection because of its causal association
with that trait (Kaplan & Pigliucci, 2001). The association of genotype and
the environmentally modifiable factor that it proxies for will be, like most
genotype–phenotype associations, one that is contingent and cannot be
reduced to individual-level prediction but within environmental limits will
pertain at a group level (Wolf, 1995). This is analogous to an RCT of anti-
hypertensive agents, where at the collective level the group randomized to
active medication will have lower mean blood pressure than the group ran-
domized to placebo but at the individual level many participants randomized
to active treatment will have higher blood pressure than many individuals
randomized to placebo. Indeed, in the phenocopy/genocopy example of pel-
lagra and Hartnup disease discussed in Box 9.1, only a minority of the
Hartnup gene carriers develop symptoms, but at the group level they have
a much greater tendency for such symptoms and a shift in amino acid levels
that reflects this (Scriver, Mahon, & Levy, 1987; Scriver, 1988). These group-
level differences are what create the analogy between Mendelian randomiza-
tion and RCTs, outlined in Figure 9.11.
Finally, the associations that Mendelian randomization depends upon do
need to pertain to a definable group at a particular time but do not need to be
immutable. Thus, ALDH2 variation will not be related to alcohol consump-
tion in a society where alcohol is not consumed, and the association will vary
by gender and by cultural group and may change over time (Higuchi et al.,
1994; Hasin et al., 2002). Within the setting of a study of a well-defined
group, however, the genotype will be associated with group-level differences
in alcohol consumption and group assignment will not be associated with
confounding variables.
Mendelian Randomization and Genetic Epidemiology
Critiques of contemporary genetic epidemiology often focus on two features
of findings from genetic association studies: that the population attributable
risk of the genetic variants is low and that in any case the influence of
genetic factors is not reversible. Illustrating both of these criticisms,
Terwilliger and Weiss (2003, p. 35) suggest as reasons for considering that
many of the current claims regarding genetic epidemiology are hype (1) that
alleles identified as increasing the risk of common diseases ‘‘tend to be
involved in only a small subset of all cases of such diseases’’ and (2) that in
any case ‘‘while the concept of attributable risk is an important one for
evaluating the impact of removable environmental factors, for non-removable
genetic risk factors, it is a moot point.’’ These evaluations of the role of
genetic epidemiology are not relevant when considering the potential contri-
butions of Mendelian randomization. This approach is not concerned with
the population attributable risk of any particular genetic variant but the
degree to which associations between the genetic variant and disease out-
comes can demonstrate the importance of environmentally modifiable factors
as causes of disease, for which the population attributable risk is of relevance
to public-health prioritization. Consider, for example, the case of familial
hypercholesterolemia or familial defective apo B. The genetic mutations asso-
ciated with these conditions will account for only a trivial percentage of cases
of CHD within the population (i.e., the population attributable risk will be
low). For example, in a Danish population, the frequency of familial defective
apo B is 0.08% and, despite a sevenfold increased risk of CHD, it will generate
a population attributable risk of only 0.5% (Tybjaerg-Hansen, Steffensen,
Meinertz, Schnohr, & Nordestgaard, 1998). However, by identifying blood
cholesterol level as a causal factor for CHD, the triangular association
between genotype, blood cholesterol, and CHD risk points to an environmen-
tally modifiable factor with a very high population attributable risk: assuming
that 50% of the population have blood cholesterol raised above 6.0 mmol/l
and that this is associated with a twofold relative risk, a population attributable
risk of 33% is obtained. The same logic applies to the other examples—the
attributable risk of the genotype is low, but the population attributable risk of
the modifiable environmental factor identified as causal through the geno-
type–disease associations is large. The same reasoning applies when consid-
ering the suggestion that since genotype cannot be modified, genotype–
disease associations are not of public-health importance (Terwilliger &
Weiss, 2003). The point of Mendelian randomization approaches is not to
attempt to modify genotype but to utilize genotype–disease associations to
strengthen inferences regarding modifiable environmental risks for disease
and then reduce disease risk in the population through applying this
knowledge.
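Both attributable risk figures quoted in the preceding paragraph are consistent
with the standard formula for population attributable risk, in which p denotes
the prevalence of the exposure and RR the associated relative risk:

\[ \mathrm{PAR} = \frac{p\,(RR - 1)}{1 + p\,(RR - 1)} \]

For familial defective apo B, p = 0.0008 and RR = 7 give PAR = 0.0048/1.0048, or
approximately 0.5%; for raised blood cholesterol, p = 0.5 and RR = 2 give
PAR = 0.5/1.5, or approximately 33%.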
Mendelian randomization differs from other contemporary approaches to
genetic epidemiology in that its central concern is not with the magnitude of
genetic variant influences on disease but, rather, with what the genetic asso-
ciations tell us about environmentally modifiable causes of disease. Many
years ago, in his Nobel Prize acceptance speech, the pioneering geneticist
Thomas Hunt Morgan contrasted his views with the then popular genetic
approach to disease, eugenics. He thought that ‘‘through public hygiene and
protective measures of various kinds we can more successfully cope with
some of the evils that human flesh is heir to. Medical science will here take
the lead—but I hope that genetics can at times offer a helping hand’’
(Morgan, 1935). More than seven decades later, it may be time for genetic
research to directly strengthen the knowledge base of public health.
References
Ames, B. N. (1999). Cancer prevention and diet: Help from single nucleotide poly-
morphisms. Proceedings of the National Academy of Sciences USA, 96, 12216–12218.
Baird, P. (2000). Genetic technologies and achieving health for populations.
International Journal of Health Services, 30, 407–424.
Baron, D. N., Dent, C. E., Harris, H., Hart, E. W., & Jepson, J. B. (1956). Hereditary
pellagra-like skin rash with temporary cerebellar ataxia, constant renal amino-
aciduria, and other bizarre biochemical features. Lancet, 268, 421–429.
Berkowitz, A. (1996). Our genes, ourselves? Bioscience, 46, 42–51.
Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital
data. Biometrics Bulletin, 2, 47–53.
Bhatti, P., Sigurdson, A. J., Wang, S. S., Chen, J., Rothman, N., Hartge, P., et al.
(2005). Genetic variation and willingness to participate in epidemiological research:
Data from three studies. Cancer Epidemiology, Biomarkers and Prevention, 14,
2449–2453.
Birge, S. J., Keutmann, H. T., Cuatrecasas, P., & Whedon, G. D. (1967). Osteoporosis,
intestinal lactase deficiency and low dietary calcium intake. New England Journal of
Medicine, 276, 445–448.
Bolon, B., & Galbreath, E. (2002). Use of genetically engineered mice in drug discovery
and development: Wielding Occam’s razor to prune the product portfolio.
International Journal of Toxicology, 21, 55–64.
Botto, L. D., & Yang, Q. (2000). 5,10-Methylenetetrahydrofolate reductase gene variants
and congenital anomalies: A HuGE review. American Journal of Epidemiology, 151,
862–877.
Bovet, P., & Paccaud, F. (2001). Alcohol, coronary heart disease and public health:
Which evidence-based policy? International Journal of Epidemiology, 30, 734–737.
Brennan, P. (2002). Gene environment interaction and aetiology of cancer: What does
it mean and how can we measure it? Carcinogenesis, 23(3), 381–387.
Broer, S., Cavanaugh, J. A., & Rasko, J. E. J. (2004). Neutral amino acid transport in
epithelial cells and its malfunction in Hartnup disorder. Biochemical Society Transactions, 33, 233–236.
Burd, L. J. (2006). Interventions in FASD: We must do better. Child: Care, Health, and
Development, 33, 398–400.
Burr, M. L., Fehily, A. M., Butland, B. K., Bolton, C. H., & Eastham, R. D. (1986).
Alcohol and high-density-lipoprotein cholesterol: A randomized controlled trial.
British Journal of Nutrition, 56, 81–86.
Cardon, L. R., & Bell, J. I. (2001). Association study designs for complex diseases.
Nature Reviews: Genetics, 2, 91–99.
Casas, J. P., Shah, T., Cooper, J., Hawe, E., McMahon, A. D., Gaffney, D., et al. (2006).
Insight into the nature of the CRP–coronary event association using Mendelian
randomization. International Journal of Epidemiology, 35, 922–931.
Chao, Y.-C., Liou, S.-R., Chung, Y.-Y., Tang, H.-S., Hsu, C.-T., Li, T.-K., et al. (1994).
Polymorphism of alcohol and aldehyde dehydrogenase genes and alcoholic cirrhosis
in Chinese patients. Hepatology, 19, 360–366.
Chen, L., Davey Smith, G., Harbord, R., & Lewis, S. (2008). Alcohol intake and blood
pressure: A systematic review implementing Mendelian randomization approach.
PLoS Medicine, 5, e52.
Cheverud, J. M. (1988). A comparison of genetic and phenotypic correlations.
Evolution, 42, 958–968.
Clayton, D., & McKeigue, P. M. (2001). Epidemiological methods for studying genes
and environmental factors in complex diseases. Lancet, 358, 1356–1360.
Colhoun, H., McKeigue, P. M., & Davey Smith, G. (2003). Problems of reporting
genetic associations with complex outcomes. Lancet, 361, 865–872.
Correns, C. (1900). G. Mendel’s Regel über das Verhalten der Nachkommenschaft
der Bastarde. Berichte der Deutschen Botanischen Gesellschaft, 8, 158–168. (English
translation, Correns, C. [1966]. G. Mendel’s law concerning the behavior of
progeny of varietal hybrids. In Stern and Sherwood [pp. 119–132]. New York:
W. H. Freeman.)
Czeizel, A. E., & Dudás, I. (1992). Prevention of the first occurrence of neural-tube
defects by periconceptional vitamin supplementation. New England Journal of
Medicine, 327, 1832–1835.
Danesh, J., Wheeler, J. B., Hirschfield, G. M., Eda, S., Eriksdottir, G., Rumley, A., et al.
(2004). C-reactive protein and other circulating markers of inflammation in the predic-
tion of coronary heart disease. New England Journal of Medicine, 350, 1387–1397.
Davey Smith, G. (2006). Cochrane Lecture. Randomised by (your) god: Robust infer-
ence from an observational study design. Journal of Epidemiology and Community
Health, 60, 382–388.
Davey Smith, G., & Ebrahim, S. (2002). Data dredging, bias, or confounding
[Editorial]. British Medical Journal, 325, 1437–1438.
Davey Smith, G., & Ebrahim, S. (2003). ‘‘Mendelian randomization’’: Can genetic
epidemiology contribute to understanding environmental determinants of disease?
International Journal of Epidemiology, 32, 1–22.
Davey Smith, G., & Ebrahim, S. (2004). Mendelian randomization: Prospects, poten-
tials, and limitations. International Journal of Epidemiology, 33, 30–42.
Davey Smith, G., & Ebrahim, S. (2005). What can Mendelian randomization tell us
about modifiable behavioural and environmental exposures. British Medical Journal,
330, 1076–1079.
Davey Smith, G., Harbord, R., Milton, J., Ebrahim, S., & Sterne, J. A. C. (2005a). Does
elevated plasma fibrinogen increase the risk of coronary heart disease? Evidence
from a meta-analysis of genetic association studies. Arteriosclerosis, Thrombosis, and
Vascular Biology, 25, 2228–2233.
Davey Smith, G., & Hart, C. (2002). Lifecourse socioeconomic and behavioural influ-
ences on cardiovascular disease mortality: The Collaborative study. American Journal
of Public Health, 92, 1295–1298.
Davey Smith, G., Lawlor, D. A., Harbord, R., Timpson, N. J., Day, I., & Ebrahim, S.
(2008). Clustered environments and randomized genes: A fundamental distinction
between conventional and genetic epidemiology. PLoS Medicine, 4, 1985–1992.
Davey Smith, G., Lawlor, D., Harbord, R., Timpson, N., Rumley, A., Lowe, G., et al.
(2005b). Association of C-reactive protein with blood pressure and hypertension:
Lifecourse confounding and Mendelian randomization tests of causality.
Arteriosclerosis, Thrombosis, and Vascular Biology, 25, 1051–1056.
Davey Smith, G., & Phillips, A. N. (1996). Inflation in epidemiology: ‘‘The proof and
measurement of association between two things’’ revisited. British Medical Journal,
312, 1659–1661.
Davey Smith, G., Timpson, N. & Ebrahim, S. (2008). Strengthening causal inference
in cardiovascular epidemiology through Mendelian randomization. Annals of
Medicine, 40, 524–541.
Debat, V., & David, P. (2001). Mapping phenotypes: Canalization, plasticity and devel-
opmental stability. Trends in Ecology and Evolution, 16, 555–561.
Delanghe, J., Langlois, M., Duprez, D., De Buyzere, M., & Clement, D. (1999).
Haptoglobin polymorphism and peripheral arterial occlusive disease. Atherosclerosis,
145, 287–292.
Ebrahim, S., & Davey Smith, G. (2008). Mendelian randomization: Can genetic epi-
demiology help redress the failures of observational epidemiology? Human Genetics,
123, 15–33.
Eidelman, R. S., Hollar, D., Hebert, P. R., Lamas, G. A., & Hennekens, C. H. (2004).
Randomized trials of vitamin E in the treatment and prevention of cardiovascular
disease. Archives of Internal Medicine, 164, 1552–1556.
Enomoto, N., Takase, S., Yasuhara, M., & Takada, A. (1991). Acetaldehyde metabolism
in different aldehyde dehydrogenase-2 genotypes. Alcoholism, Clinical and
Experimental Research, 15, 141–144.
Erichsen, H. C., Eck, P., Levine, M., & Chanock, S. (2001). Characterization of the
genomic structure of the human vitamin C transporter SVCT1 (SLC23A2). Journal of
Nutrition, 131, 2623–2627.
Færgeman, O. (2003). Coronary artery disease: Genes, drugs and the agricultural connec-
tion. Amsterdam: Elsevier.
Fallon, U. B., Ben-Shlomo, Y., & Davey Smith, G. (2001, March 14). Homocysteine and
coronary heart disease. Heart. http://heart.bmjjournals.com/cgi/eletters/85/2/153
Garry, D. J., Ordway, G. A., Lorenz, J. N., Radford, E. R., Chin, R. W., Grange, R., et al.
(1998). Mice without myoglobin. Nature, 395, 905–908.
Gause, G. F. (1942). The relation of adaptability to adaption. Quarterly Review of
Biology, 17, 99–114.
Gemma, S., Vichi, S., & Testai, E. (2007). Metabolic and genetic factors contributing to
alcohol induced effects and fetal alcohol syndrome. Neuroscience and Biobehavioral
Reviews, 31, 221–229.
Gerlai, R. (2001). Gene targeting: Technical confounds and potential solutions in
behavioural and brain research. Behavioural Brain Research, 125, 13–21.
Gibson, G., & Wagner, G. (2000). Canalization in evolutionary genetics: A stabilizing
theory? BioEssays, 22, 372–380.
Gelbart, W. M. (1998). Databases in genomic research. Science, 282, 659–661.
Glynn, R. K. (2006). Genes as instruments for evaluation of markers and causes
[Commentary]. International Journal of Epidemiology, 35, 932–934.
Goldschmidt, R. B. (1938). Physiological genetics. New York: McGraw-Hill.
Gray, R., & Wheatley, K. (1991). How to avoid bias when comparing bone
marrow transplantation with chemotherapy. Bone Marrow Transplantation, 7(Suppl.
3), 9–12.
Gu, J., Liang, D., Wang, Y., Lu, C., & Wu, X. (2005). Effects of N-acetyl transferase 1
and 2 polymorphisms on bladder cancer risk in Caucasians. Mutation Research, 581,
97–104.
Gu, Z., Steinmetz, L. M., Gu, X., Scharfe, C., Davis, R. W., & Li, W.-H. (2003). Role of
duplicate genes in genetic robustness against null mutations. Nature, 421, 63–66.
Gutjahr, E., Gmel, G., & Rehm, J. (2001). Relation between average alcohol consump-
tion and disease: An overview. European Addiction Research, 7, 117–127.
Guy, J. T. (1993). Oral manifestations of systematic disease. In C. W. Cummings, J.
Frederick, L. Harker, C. Krause, & D. Schuller (Eds.), Otolaryngology—head and neck
surgery (Vol. 2). St. Louis: Mosby Year Book.
Han, T. S., Sattar, N., Williams, K., Gonzalez-Villalpando, C., Lean, M. E., & Haffner,
S. M. (2002). Prospective study of C-reactive protein in relation to the development
of diabetes and metabolic syndrome in the Mexico City Diabetes Study. Diabetes
Care, 25, 2016–2021.
Hart, C., Davey Smith, G., Hole, D., & Hawthorne, V. (1999). Alcohol consumption
and mortality from all causes, coronary heart disease, and stroke: Results from a
prospective cohort study of Scottish men with 21 years of follow up. British Medical
Journal, 318, 1725–1729.
Hartman, J. L., Garvik, B., & Hartwell, L. (2001). Principles for the buffering of genetic
variation. Science, 291, 1001–1004.
Hasin, D., Aharonovich, E., Liu, X., Mamman, Z., Matseoane, K., Carr, L., et al. (2002).
Alcohol and ADH2 in Israel: Ashkenazis, Sephardics, and recent Russian immi-
grants. American Journal of Psychiatry, 159(8), 1432–1434.
Haskell, W. L., Camargo, C., Williams, P. T., Vranizan, K. M., Krauss, R. M., Lindgren,
F. T., et al. (1984). The effect of cessation and resumption of moderate alcohol
intake on serum high-density-lipoprotein subfractions. New England Journal of
Medicine, 310, 805–810.
Heart Protection Study Collaborative Group. (2002). MRC/BHF Heart Protection Study
of antioxidant vitamin supplementation in 20536 high-risk individuals: A rando-
mised placebo-controlled trial. Lancet, 360, 23–33.
Higuchi, S., Matsushita, S., Imazeki, H., Kinoshita, T., Takagi, S., & Kono, H. (1994).
Aldehyde dehydrogenase genotypes in Japanese alcoholics. Lancet, 343, 741–742.
Hirschfield, G. M., & Pepys, M. B. (2003). C-reactive protein and cardiovascular dis-
ease: New insights from an old molecule. Quarterly Journal of Medicine, 96, 793–807.
Holtzman, N. A. (2001). Putting the search for genes in perspective. International
Journal of Health Services, 31, 445.
Honkanen, R., Pulkkinen, P., Järvinen, R., Kröger, H., Lindstedt, K., Tuppurainen, M.,
et al. (1996). Does lactose intolerance predispose to low bone density? A population-
based study of perimenopausal Finnish women. Bone, 19, 23–28.
Hornstein, E., & Shomron, N. (2006). Canalization of development by microRNAs.
Nature Genetics, 38, S20–S24.
Hu, F. B., Meigs, J. B., Li, T. Y., Rifai, N., & Manson, J. E. (2004). Inflammatory
markers and risk of developing type 2 diabetes in women. Diabetes, 53, 693–700.
Jablonka-Tavory, E. (1982). Genocopies and the evolution of interdependence.
Evolutionary Theory, 6, 167–170.
Jacobson, S. W., Carr, L. G., Croxford, J., Sokol, R. J., Li, T. K., & Jacobson, J. L. (2006).
Protective effects of the alcohol dehydrogenase-ADH1B allele in children exposed to
alcohol during pregnancy. Journal of Pediatrics, 148, 30–37.
Jousilahti, P., & Salomaa, V. (2004). Fibrinogen, social position, and Mendelian rando-
misation. Journal of Epidemiology and Community Health, 58, 883.
Juul, K., Tybjaerg-Hansen, A., Marklund, S., Heegaard, N. H. H., Steffensen, R.,
Sillesen, H., et al. (2004). Genetically reduced antioxidative protection and increased
ischaemic heart disease risk: The Copenhagen City Heart Study. Circulation,
109, 59–65.
Kaplan, J. M., & Pigliucci, M. (2001). Genes ‘‘for’’ phenotypes: A modern history view.
Biology and Philosophy, 16, 189–213.
Katan, M. B. (1986). Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet,
I, 507–508 (reprinted International Journal of Epidemiology, 2004, 34, 9).
Kathiresan, S., Melander, O., Anevski, D., Guiducci, C., Burtt, N. P., Roos, C., et al.
(2008). Polymorphisms associated with cholesterol and risk of cardiovascular events.
New England Journal of Medicine, 358, 1240–1249.
Keavney, B. (2002). Genetic epidemiological studies of coronary heart disease.
International Journal of Epidemiology, 31, 730–736.
Keavney, B., Danesh, J., Parish, S., Palmer, A., Clark, S., Youngman, L., et al.;
International Studies of Infarct Survival (ISIS) Collaborators. (2006). Fibrinogen
and coronary heart disease: Test of causality by ‘‘Mendelian randomization.’’
International Journal of Epidemiology, 35, 935–943.
Kelada, S. N., Eaton, D. L., Wang, S. S., Rothman, N. R., & Khoury, M. J. (2003). The
role of genetic polymorphisms in environmental health. Environmental Health
Perspectives, 111, 1055–1064.
Khaw, K.-T., Bingham, S., Welch, A., Luben, R., Wareham, N., Oakes, S., et al. (2001).
Relation between plasma ascorbic acid and mortality in men and women in EPIC-
Norfolk prospective study: A prospective population study. Lancet, 357, 657–663.
Kitami, T., & Nadeau, J. H. (2002). Biochemical networking contributes more to
genetic buffering in human and mouse metabolic pathways than does gene duplica-
tion. Nature Genetics, 32, 191–194.
Klatsky, A. L. (2001). Could abstinence from alcohol be hazardous to your health
[Commentary]? International Journal of Epidemiology, 30, 739–742.
Kraut, J. A., & Sachs, G. (2005). Hartnup disorder: Unravelling the mystery. Trends in
Pharmacological Sciences, 26, 53–55.
Langlois, M. R., Delanghe, J. R., De Buyzere, M. L., Bernard, D. R., & Ouyang, J.
(1997). Effect of haptoglobin on the metabolism of vitamin C. American Journal of
Clinical Nutrition, 66, 606–610.
Lawlor, D. A., Davey Smith, G., Kundu, D., Bruckdorfer, K. R., & Ebrahim, S. (2004).
Those confounded vitamins: what can we learn from the differences between obser-
vational versus randomised trial evidence? Lancet, 363, 1724–1727.
Lawlor, D. A., Ebrahim, S., Kundu, D., Bruckdorfer, K. R., Whincup, P. H., & Davey
Smith, G. (2005). Vitamin C is not associated with coronary heart disease risk once
life course socioeconomic position is taken into account: Prospective findings from
the British Women’s Heart and Health Study. Heart, 91, 1086–1087.
Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N., & Davey Smith, G.
(2008). Mendelian randomization: Using genes as instruments for making causal
inferences in epidemiology. Statistics in Medicine, 27, 1133–1163.
Leimar, O., Hammerstein, P., & Van Dooren, T. J. M. (2006). A new perspective on
developmental plasticity and the principles of adaptive morph determination.
American Naturalist, 167, 367–376.
Lenz, W. (1973). Phenocopies. Journal of Medical Genetics, 10, 34–48.
Lewis, S., & Davey Smith, G. (2005). Alcohol, ALDH2 and esophageal cancer:
A meta-analysis which illustrates the potentials and limitations of a Mendelian
randomization approach. Cancer Epidemiology, Biomarkers and Prevention, 14,
1967–1971.
Li, R., Tsaih, S. W., Shockley, K., Stylianou, I. M., Wergedal, J., Paigen, B., et al. (2006).
Structural model analysis of multiple quantitative traits. PLoS Genetics, 2, 1046–1057.
Lipp, H. P., Schwegler, H., Crusio, W. E., Wolfer, D. P., Leisinger-Trigona, M. C.,
Heimrich, B., et al. (1989). Using genetically-defined rodent strains for the identi-
fication of hippocampal traits relevant for two-way avoidance behaviour: A non-
invasive approach. Experientia, 45, 845–859.
Little, J., & Khoury, M. J. (2003). Mendelian randomization: A new spin or real pro-
gress? Lancet, 362, 930–931.
Lower, G. M., Nilsson, T., Nelson, C. E., Wolf, H., Gamsky, T. E., & Bryan, G. T. (1979).
N-Acetyltransferase phenotype and risk in urinary bladder cancer: Approaches in
molecular epidemiology. Environmental Health Perspectives, 29, 71–79.
MacMahon, S., Peto, R., Collins, R., Godwin, J., Cutler, J., et al. (1990). Blood
pressure, stroke, and coronary heart disease. Lancet, 335, 765–774.
Marks, D., Thorogood, M., Neil, H. A. W., & Humphries, S. E. (2003). A review on
diagnosis, natural history and treatment of familial hypercholesterolaemia.
Atherosclerosis, 168, 1–14.
Marmot, M. (2001). Reflections on alcohol and coronary heart disease. International
Journal of Epidemiology, 30, 729–734.
McGrath, J. (1999). Hypothesis: Is low prenatal vitamin D a risk-modifying factor for
schizophrenia? Schizophrenia Research, 40, 173–177.
Memik, F. (2003). Alcohol and esophageal cancer, is there an exaggerated accusation?
Hepatogastroenterology, 54, 1953–1955.
Mendel, G. (1866). Experiments in plant hybridization. Retrieved from http://
www.mendelweb.org/archive/Mendel.Experiments.txt
Millen, A. E., Dodd, K. W., & Subar, A. F. (2004). Use of vitamin, mineral, nonvitamin,
and nonmineral supplements in the United States: The 1987, 1992, and 2000
National Health Interview Survey results. Journal of the American Dietetic
Association, 104, 942–950.
Morange, M. (2001). The misunderstood gene. Cambridge, MA: Harvard University
Press.
Morgan, T. H. (1913). Heredity and sex. New York: Columbia University Press.
Morgan, T. H. (1919). Physical basis of heredity. Philadelphia: J. B. Lippincott.
Morgan, T. H. (1935). The relation of genetics to physiology and medicine. Scientific
Monthly, 41, 5–18.
MRC Vitamin Study Research Group. (1991). Prevention of neural tube defects:
Results of the Medical Research Council vitamin study. Lancet, 338, 131–137.
Mucci, L. A., Wedren, S., Tamimi, R. M., Trichopoulos, D., & Adami, H. O. (2001).
The role of gene–environment interaction in the aetiology of human cancer:
Examples from cancers of the large bowel, lung and breast. Journal of Internal
Medicine, 249, 477–493.
Newcomer, A. D., Hodgson, S. F., Douglas, M. D., & Thomas, P. J. (1978). Lactase
deficiency: Prevalence in osteoporosis. Annals of Internal Medicine, 89, 218–220.
Olby, R. C. (1966). Origins of Mendelism. London: Constable.
Osier, M. V., Pakstis, A. J., Soodyall, H., Comas, D., Goldman, D., Odunsi, A., et al.
(2002). A global perspective on genetic variation at the ADH genes reveals unusual
patterns of linkage disequilibrium and diversity. American Journal of Human
Genetics, 71, 84–99.
Palmer, L., & Cardon, L. (2005). Shaking the tree: Mapping complex disease genes
with linkage disequilibrium. Lancet, 366, 1223–1234.
Perera, F. P. (1997). Environment and cancer: Who are susceptible? Science, 278,
1068–1073.
Pradhan, A. D., Manson, J. E., Rifai, N., Buring, J. E., & Ridker, P. M. (2001).
C-reactive protein, interleukin 6, and risk of developing type 2 diabetes mellitus.
Journal of the American Medical Association, 286, 327–334.
Radimer, K., Bindewald, B., Hughes, J., Ervin, B., Swanson, C., & Picciano, M. F. (2004).
Dietary supplement use by US adults: Data from the National Health and
Nutrition Examination Survey, 1999–2000. American Journal of Epidemiology, 160,
339–349.
Reynolds, K., Lewis, L. B., Nolen, J. D. L., Kinney, G. L., Sathya, B., & He, J. (2003).
Alcohol consumption and risk of stroke: A meta-analysis. Journal of the American
Medical Association, 289, 579–588.
Ridker, P. M., Cannon, C. P., Morrow, D., Rifai, N., Rose, L. M., McCabe, C. H., et al.
(2005). C-reactive protein levels and outcomes after statin therapy. New England
Journal of Medicine, 352, 20–28.
Rimm, E. (2001). Alcohol and coronary heart disease—laying the foundation for future
work [Commentry]. International Journal of Epidemiology, 30, 738–739.
Rimm, E. B., Stampfer, M. J., Ascherio, A., Giovannucci, E., Colditz, G. A., & Willett,
W. C. (1993). Vitamin E consumption and the risk of coronary heart disease in men.
New England Journal of Medicine, 328, 1450–1456.
Roderic, T. H., Wimer, R. E., & Wimer, C. C. (1976). Genetic manipulation of neuroa-
natomical traits. In L. Petrinovich & J. L. McGaugh (Eds.), Knowing, thinking, and
believing. New York: Plenum Press.
Rose, G. (1982). Incubation period of coronary heart disease. British Medical Journal,
284, 1600–1601.
Rose, S. (1995). The rise of neurogenetic determinism. Nature, 373, 380–382.
Roseboom, T. J., van der Meulen, J. H., Osmond, C., Barker, D. J. P., Ravelli, A. C. J.,
Schroeder-Tanka, J. M., et al. (2000). Coronary heart disease after prenatal exposure
to the Dutch famine, 1944–45. Heart, 84, 595–598.
Rothman, N., Wacholder, S., Caporaso, N. E., Garcia-Closas, M., Buetow, K., & Fraumeni,
J. F. (2001). The use of common genetic polymorphisms to enhance the epidemiologic
study of environmental carcinogens. Biochimica et Biophysica Acta, 1471, C1–C10.
Rutherford, S. L. (2000). From genotype to phenotype: Buffering mechanisms and the
storage of genetic information. BioEssays, 22, 1095–1105.
Scholl, T. O., & Johnson, W. G. (2000). Folic acid: Influence on the outcome of
pregnancy. American Journal of Clinical Nutrition, 71(Suppl.), 1295S–1303S.
Scientific Steering Committee on Behalf of the Simon Broome Register Group. (1991).
Risk of fatal coronary heart disease in familial hyper-cholesterolaemia. British
Medical Journal, 303, 893–896.
Scriver, C. R. (1988). Nutrient–gene interactions: The gene is not the disease and vice
versa. American Journal of Clinical Nutrition, 48, 1505–1509.
Scriver, C. R., Mahon, B., & Levy, H. L. (1987). The Hartnup phenotype: Mendelian
transport disorder, multifactorial disease. American Journal of Human Genetics, 40,
401–412.
Sesso, D., Buring, J. E., Rifai, N., Blake, G. J., Gaziano, J. M., & Ridker, P. M. (2003).
C-reactive protein and the risk of developing hypertension. Journal of the American
Medical Association, 290, 2945–2951.
Shaper, A. G. (1993). Alcohol, the heart, and health [Editorial]. American Journal of
Public Health, 83, 799–801.
Shastry, B. S. (1998). Gene disruption in mice: Models of development and disease.
Molecular and Cellular Biochemistry, 181, 163–179.
Sjöholm, A., & Nyström, T. (2005). Endothelial inflammation in insulin resistance.
Lancet, 365, 610–612.
Slack, J. (1969). Risks of ischaemic heart disease in familial hyperlipoproteinaemic
states. Lancet, 2, 1380–1382.
Snyder, L. H. (1959). Fifty years of medical genetics. Science, 129, 7–13.
Spearman, C. (1904). The proof and measurement of association between two things.
American Journal of Psychology, 15, 72–101.
Stampfer, M. J., Hennekens, C. H., Manson, J. E., Colditz, G. A., Rosner, B., & Willett,
W. C. (1993). Vitamin E consumption and the risk of coronary disease in women.
New England Journal of Medicine, 328, 1444–1449.
Steinberg, D. (2004). Thematic review series. The pathogenesis of atherosclerosis. An
interpretive history of the cholesterol controversy: part I. Journal of Lipid Research,
45, 1583–1593.
Steinberg, D. (2005). Thematic review series. The pathogenesis of atherosclerosis. An
interpretive history of the cholesterol controversy: part II. The early evidence linking
hypercholesterolemia to coronary disease in humans. Journal of Lipid Research, 46,
179–190.
Strohman, R. C. (1993). Ancient genomes, wise bodies, unhealthy people: The limits
of a genetic paradigm in biology and medicine. Perspectives in Biology and Medicine,
37, 112–145.
Takagi, S., Iwai, N., Yamauchi, R., Kojima, S., Yasuno, S., Baba, T., et al. (2002).
Aldehyde dehydrogenase 2 gene is a risk factor for myocardial infarction in
Japanese men. Hypertension Research, 25, 677–681.
Terwilliger, J. D., & Weiss, K. M. (2003). Confounding, ascertainment bias, and the
blind quest for a genetic ‘‘fountain of youth.’’ Annals of Medicine, 35, 532–544.
Thomas, D. C., & Conti, D. V. (2004). Commentary on the concept of ‘‘Mendelian
randomization.’’ International Journal of Epidemiology, 33, 17–21.
Thompson, W. D. (1991). Effect modification and the limits of biological inference
from epidemiological data. Journal of Clinical Epidemiology, 44, 221–232.
Thun, M. J., Peto, R., Lopez, A. D., Monaco, J. H., Henley, S. J., Heath, C. W., et al.
(1997). Alcohol consumption and mortality among middle-aged and elderly U.S.
adults. New England Journal of Medicine, 337, 1705–1714.
Timpson, N. J., Lawlor, D. A., Harbord, R. M., Gaunt, T. R., Day, I. N. M., Palmer, L.
J., et al. (2005). C-reactive protein and its role in metabolic syndrome: Mendelian
randomization study. Lancet, 366, 1954–1959.
Timpson, N. J., Forouhi, N. H., Brion, M.-J., et al. (2010). Genetic variation at the
SLC23A1 locus is associated with circulating levels of L-ascorbic acid (vitamin C):
Evidence from 5 independent studies with over 15,000 participants. American Journal
of Clinical Nutrition, advance online publication. doi:10.3945/ajcn.2010.29438
Tybjaerg-Hansen, A., Steffensen, R., Meinertz, H., Schnohr, P., & Nordestgaard, B. G.
(1998). Association of mutations in the apolipoprotein B gene with hypercholester-
olemia and the risk of ischemic heart disease. New England Journal of Medicine, 338,
1577–1584.
Verma, S., Szmitko, P. E., & Ridker, P. M. (2005). C-reactive protein comes of age.
Nature Clinical Practice, 2, 29–36.
Waddington, C. H. (1942). Canalization of development and the inheritance of
acquired characteristics. Nature, 150, 563–565.
Warren, K. R., & Li, T. K. (2005). Genetic polymorphisms: Impact on the risk of fetal
alcohol spectrum disorders. Birth Defects Research A: Clinical and Molecular
Teratology, 73, 195–203.
Weimer, R. E. (1973). Dissociation of phenotypic correlation: Response to posttrial ether-
ization and to temporal distribution of practice trials. Behavior Genetics, 3, 379–386.
Weiss, K., & Terwilliger, J. (2000). How many diseases does it take to map a gene with
SNPs? Nature Genetics, 26, 151–157.
West-Eberhard, M. J. (2003). Developmental plasticity and evolution. New York: Oxford
University Press.
Wheatley, K., & Gray, R. (2004). Mendelian randomization—an update on its use to
evaluate allogeneic stem cell transplantation in leukaemia [Commentary].
International Journal of Epidemiology, 33, 15–17.
Wilkins, A. S. (1997). Canalization: A molecular genetic perspective. BioEssays, 19,
257–262.
Williams, R. S., & Wagner, P. D. (2000). Transgenic animals in integrative biology:
Approaches and interpretations of outcome. Journal of Applied Physiology, 88, 1119–1126.
Wolf, U. (1995). The genetic contribution to the phenotype. Human Genetics, 95, 127–148.
Wright, A. F., Carothers, A. D., & Campbell, H. (2002). Gene–environment interac-
tions—the BioBank UK study. Pharmacogenomics Journal, 2, 75–82.
Wu, T., Dorn, J. P., Donahue, R. P., Sempos, C. T., & Trevisan, M. (2002). Associations
of serum C-reactive protein with fasting insulin, glucose, and glycosylated hemoglo-
bin: The Third National Health and Nutrition Examination Survey, 1988–1994.
American Journal of Epidemiology, 155, 65–71.
Youngman, L. D., Keavney, B. D., Palmer, A., Parish, S., Clark, S., Danesh, J., et al.
(2000). Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial
infarction and in 6002 controls: test of causality by ‘‘Mendelian randomization.’’
Circulation, 102(Suppl. II), 31–32.
Zuckerkandl, E., & Villet, R. (1988). Concentration-affinity equivalence in gene reg-
ulation: Convergence and environmental effects. Proceedings of the National
Academy of Sciences USA, 85, 4784–4788.
10
Rare Variant Approaches to Understanding the Causes of Complex
Neuropsychiatric Disorders
matthew w. state
The distinction between genetic variation that is present in more than 5% of
the population (defined as common) and genetic variation that does not meet
this threshold (defined as rare) is often lost in the discussion of psychiatric
genetics. As a general proposition, the field has come to equate the hunt for
common variants (or alleles) with the search for genes causing or contribut-
ing to psychiatric illness. Indeed, the majority of studies on mood disorders,
autism, schizophrenia, obsessive–compulsive disorder, attention-deficit/hyper-
activity disorder, and Tourette syndrome have restricted their analyses to the
potential contribution of common alleles. Studies focusing on rare genetic
mutations have, until quite recently, been viewed as outside the mainstream
of efforts aimed at elucidating the biological substrates of serious
psychopathology.
Both the implicit assumption that common alleles underlie the lion’s
share of risk for most common neuropsychiatric conditions and the notion
that the most expeditious way to elucidate their biological bases will be to
concentrate efforts on common alleles deserve careful scrutiny. Indeed, key
findings across all of human genetics, including those within psychiatry,
support the following alternative conclusions: (1) for disorders such as
autism and schizophrenia, the study of rare variants already holds the
most immediate promise for defining the molecular and cellular mechanisms
of disease (McClellan, Susser, & King, 2007; O’Roak & State, 2008); (2)
common variation will be found to carry much more modest risks than
previously anticipated (Altshuler & Daly, 2007; Saxena et al., 2007); and (3)
rare variation will account for substantial risk for common complex disorders,
particularly for neuropsychiatric conditions with relatively early onset and
chronic course.
This chapter addresses the rare variant genetic approach specifically with
respect to mental illness. It first introduces the distinction between the key
characteristics of common and rare genetic variation. It then briefly addresses
the methodologies employed to demonstrate a causal or contributory role for
genes in complex disease, focusing on how these approaches differ in terms
of the ability to detect and confirm the role of rare variation. The chapter will
then turn to a consideration of the genetics of autism-spectrum disorders as a
case study of the manner in which rare variants may contribute to the under-
standing of psychiatric genetics, and finally, the discussion will conclude with
a consideration of the implications of emerging genomic technologies for this
process.
Genetic Variation
The search for ‘‘disease genes’’ is more precisely the search for disease-
related genetic variation. Basic instructions are coded in DNA to create and
sustain life; these instructions vary somewhat between individuals, creating a
primary source of human diversity. Variation in these instructions is also
thought to be largely responsible for differences in susceptibility to diseases
influenced by genes.
Concretely, when individuals differ at the level of DNA, it is often with
regard to the sequence of its four constituent parts, called ‘‘nucleotides’’ or
‘‘bases,’’ which make up the DNA code: adenine (A), guanine (G), cytosine
(C), and thymine (T). Indeed, within the human genome, variations at indi-
vidual nucleotides appear quite frequently (approximately 1 in every 1,000
bases) (International Human Genome Sequencing Consortium, 2004; Lander
et al., 2001; McPherson et al., 2001). The vast majority of this variation is
related to an individual’s ethnic origin and has no overt consequence for
human disease. However, it is not known at present what proportion of
the observed differences between individuals either within or outside of
regions of the genome that specify the production of proteins (through the
process of transcription and translation) might confer subtle alterations in
function. At present, while elegant and inventive approaches are being
employed to address the question, particularly with regard to ‘‘noncoding’’
DNA (Noonan, 2009; Prabhakar et al., 2008), the consequences of sequence
variations identified in these regions remain difficult to interpret.
Consequently, while only 2% of the genome is ultimately translated into
protein, it is this subset that is most readily understood with regard to its
impact on a phenotype of interest (International Human Genome
Sequencing Consortium, 2004; Lander et al., 2001; McPherson et al., 2001).
The terminology applied to genetic variation may be somewhat confusing
due to a number of redundant or loosely defined terms. While a threshold of
5% is often used as the cutoff between common and rare variation, many authors also
distinguish between these and very rare (<1%) alleles. Common variations,
regardless of their impact on gene function, are often referred to as poly-
morphisms or alleles, but both terms are also at times applied to any change in
the genome, regardless of its frequency, that does not appear to be deleter-
ious to the function of the RNA or protein that it encodes. In the ensuing
discussion we use the term polymorphism to refer to common variants. Some
authors refer to rare variants and mutations synonymously. In the current
discussion, mutation will refer to the subcategory of rare variation that is
thought to cause or carry risk for disease.
Several other terms warrant definition here: Common variations at a single
nucleotide are typically referred to as single-nucleotide polymorphisms (SNPs).
For example, if the sequence in a specific region of DNA is ACTCTCCT in
most individuals, but in more than 5% of individuals the same region reads
as ACTCTACT on at least one of a pair of chromosomes carrying this
sequence, this would represent an SNP with the major allele being C and
the minor allele being A. Moreover, the frequency of the ‘‘A’’ would be
referred to as the minor allele frequency. SNPs are thought to be most often
the consequence of a single error in replication of the DNA at some point in
human history that has subsequently spread through the population.
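To make the allele-frequency terminology concrete, the short sketch below (written
in Python, with entirely hypothetical genotype counts rather than data from any
study cited here) computes a minor allele frequency and applies the conventional 5%
and 1% cutoffs described above.

# Hypothetical genotype counts for a single SNP; each individual carries two
# alleles, so allele counts are derived from the genotype counts.
genotype_counts = {"CC": 840, "CA": 150, "AA": 10}

allele_counts = {"C": 0, "A": 0}
for genotype, n in genotype_counts.items():
    for allele in genotype:
        allele_counts[allele] += n

total_alleles = sum(allele_counts.values())
frequencies = {allele: count / total_alleles for allele, count in allele_counts.items()}

minor_allele = min(frequencies, key=frequencies.get)
maf = frequencies[minor_allele]

# Apply the frequency conventions discussed in the text.
if maf >= 0.05:
    category = "common polymorphism"
elif maf >= 0.01:
    category = "rare variant"
else:
    category = "very rare variant"

print(f"Minor allele: {minor_allele}, frequency: {maf:.3f} ({category})")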
A second form of variation often used in genetic studies involves variable
numbers of DNA repeats: These are short, repetitive sequences of nucleotides
that are prone to instability during DNA replication. This type of variation is
known as a short tandem repeat (STR). In this case, the sequence abbreviated
GTACACAGT found on one chromosome in an individual might be found to
be GTACACACACACAGT on a second chromosome or in another individual.
Interestingly, these types of repeats are so frequently prone to change that
there are often multiple forms within a population, but they are not so
unstable as to be likely to change from one generation to the next within a
family. These properties, along with the ease of assaying STRs, make them
highly suitable for tracing DNA inheritance from generation to generation, as
will be discussed later.
A third, much more recently appreciated type of variation is known as a
copy number variant (CNV) (Iafrate et al., 2004; Redon et al., 2006; Sebat et al.,
2004). In this case, the structure of chromosomes varies among individuals.
For example, a deletion or duplication of DNA might be present on one
chromosome in an individual but not in another. CNVs specifically refer to
these types of changes that fall below the resolution of the light microscope.
Chromosomal variations that exceed this threshold are now referred to as
‘‘gross’’ cytogenetic deletions, duplications, or rearrangements. The lower size
bound of a CNV and how it is distinguished from a small insertion or
deletion of DNA sequence (called an in/del) vary from author to author,
but a common cutoff is 1,000 base pairs of DNA.
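The size conventions just described can be summarized in a simple sketch (again in
Python). The 1,000-base-pair lower bound comes from the text above, whereas the
5-megabase figure standing in for the resolution of the light microscope is an
assumed round number rather than a value given in this chapter.

# Rough classifier for structural variant terminology by size.
INDEL_MAX_BP = 1_000                   # below this: small insertion/deletion (in/del)
MICROSCOPE_RESOLUTION_BP = 5_000_000   # assumed figure; above this: "gross" cytogenetic change

def classify_structural_variant(length_bp: int) -> str:
    """Return the conventional label for a deletion or duplication of a given size."""
    if length_bp < INDEL_MAX_BP:
        return "in/del"
    if length_bp < MICROSCOPE_RESOLUTION_BP:
        return "copy number variant (CNV)"
    return "gross cytogenetic deletion, duplication, or rearrangement"

# Example: a 150-kb deletion would conventionally be described as a CNV.
print(classify_structural_variant(150_000))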
Over the last few years it has become clear that CNVs are distributed
throughout the genome, populating even those regions that contain genes
coding for RNAs and proteins. Previously, the finding of a loss of a coding
region in an affected individual was taken as prima facie evidence for a
causal relationship between the variation and the observed phenotype.
However, as microarrays ushered in an era of much higher resolution ana-
lysis of the genome, it has become clear that this conventional wisdom
reflected an implicit, incorrect assumption regarding the ‘‘intactness’’ of
the genome. In fact, it is now clear that widespread copy number variation
is seen among control populations (Redon et al., 2006; Sharp, Cheng, &
Eichler, 2006), requiring more rigorous approaches to demonstrating a rela-
tionship between a structural change in the genome and a clinical outcome.
Finally, an important distinction regarding the distribution of disorder-
related variation is often made: If multiple variations in a single gene lead
to or contribute to an outcome of interest, this is referred to as allelic hetero-
geneity. If variations in many different genes may lead to a single disease or
syndrome, this is referred to as locus heterogeneity (a locus is simply a given
region of the genome). Both are widely observed in human disease and have
been invoked across nearly all complex disorders to help explain the difficul-
ties that have been encountered in clarifying the genetic bases of disease
(Botstein & Risch, 2003).
Irrespective of whether a variant is introduced into the sequence or structure
of the DNA, once it is present in the human genome, natural selection plays a
defining role. Our current understanding of this process is certainly more
nuanced than when first proposed, but the basic notions continue to serve
well for understanding the dynamics of variation: Changes that do not impact repro-
ductive fitness in a negative fashion may be readily passed from generation to
generation and, over time, have the potential to become common. Alternatively,
changes that result in impaired fitness are subject to negative or purifying
selection, decreasing the frequency of that allele in the population.
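The qualitative effect of purifying selection on allele frequency can be illustrated
with a deliberately simplified deterministic model (Python; the starting frequency
and selection coefficient are arbitrary illustrative values, and the haploid, or
genic, selection scheme is a simplification rather than a model of any variant
discussed here).

# Deterministic decline of a deleterious allele with relative fitness (1 - s).
def next_frequency(p: float, s: float) -> float:
    """One generation of genic selection against an allele at frequency p."""
    mean_fitness = p * (1 - s) + (1 - p)
    return p * (1 - s) / mean_fitness

p, s = 0.10, 0.05  # assumed: 10% starting frequency, 5% fitness cost per allele copy
for generation in range(201):
    if generation % 50 == 0:
        print(f"generation {generation:3d}: frequency = {p:.4f}")
    p = next_frequency(p, s)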
Of course, the impact on fitness is only one of several forces that dictate
the population frequency of a specific genetic variant. A variant newly intro-
duced into the population would most likely be rare, regardless of its func-
tional consequences, given the large number of possible positions for this
change and the lack of time for it to be distributed through the population.
Moreover, the history of specific ethnic groups, including migration patterns
and social norms, can significantly influence the dynamics of particular
genetic variants over time.
With regard to disease risk, it has nonetheless proven quite instructive to
think about the distinction between advantageous or neutral and deleterious
variants. Based on these classical notions, alleles contributing to early-onset
disorders that reduce fertility or indirectly lead to decreased reproductive
fitness would be expected over time to be driven down in frequency in the
population and become rare. Of course, given the caveats presented in the
previous paragraph, it is important to recall that while deleterious variation is
likely to be rare, rare variation is not uniformly deleterious.
Alleles related to diseases of later life, those that do not have a negative
impact on fitness or which carry small deleterious effects, those in which a
positive effect of a given variation counterbalances a negative consequence
(balancing selection), and any variation that is physically quite close to an
allele that is advantageous (a ‘‘hitchhiking’’ allele) are some of the mechan-
isms that would be expected to lead to common genetic variants.
The relative contribution of rare and common variation to psychiatric ill-
ness is a matter of considerable theoretical and practical significance.
Demonstrating the Role of Common and Rare Variations in Human Disease Risk
The field of human genetics has a record of tremendous accomplishment in
those disorders for which a single gene directly causes or dramatically
increases the risk for a disease or syndrome (Botstein & Risch, 2003). As
noted, the task of gene discovery in disorders in which many genes and
many variations may potentially play a causal or contributory role has
remained challenging. In the face of continued uncertainty, two alternative
(but not mutually exclusive) paradigms have emerged: a common variant:common
disease model (Chakravarti, 1999) and a rare variant:common
disease hypothesis (Ji et al., 2008; Pritchard, 2001).
Until quite recently, the former has been favored in the study of neurop-
sychiatric disorders. For instance, it has been widely held that syndromes
such as schizophrenia, depression, bipolar disorder, and autism result from
the combined effect of multiple common genetic variants, each carrying
modest effects and interacting with environmental factors. This hypothesis
has garnered favor for a variety of reasons. First, for these and other complex
disorders, both modeling and early experimental evidence have essentially
ruled out the idea of a single gene of major effect explaining a substantial
portion of the disease/syndrome in the population (Risch, 1990). Second, in
many of these conditions, extended family members can be found to show
signs of subtle, subclinical or ‘‘component’’ phenotypes, suggesting they are
nearing but have not reached a liability threshold (Constantino et al., 2006;
Constantino & Todd, 2005). Third, most of the genetic variation carried
within a population is in the form of common variants (International
Human Genome Sequencing Consortium, 2004; Lander et al., 2001;
McPherson et al., 2001), and it has been hypothesized that common
disorders will likely reflect this architecture. Finally, as many disorders of
interest are both relatively common and found throughout the world, the
notion that common polymorphisms shared across ethnic groups will be
identified as risk factors is intuitively attractive.
An alternative model, the rare variant:common disease hypothesis, has
long been proposed and has gained particular traction over the past several
years due to several convergent factors: (1) recent very strong rare variant
findings from studies of CNVs in autism and schizophrenia, (2) an appraisal
of a wave of successful common variant investigations across all of medicine,
and (3) the emergence of new genomic technologies that are making large-
scale genomewide rare variant studies feasible.
The rare variant:common disease hypothesis posits that common disor-
ders may be the result of multiple individually rare variants that contribute
either alone or in combination to common phenotypes. This notion is parti-
cularly attractive for disorders with early onset and those that theoretically
alter reproductive fitness. As noted for such conditions, in the absence of
balancing selection, one would expect that, on average, alleles with appreci-
able effects would be driven down in frequency by natural selection. In
addition, one would expect a significant contribution from individually rare
mutations for disorders in which so-called sporadic or de novo variation plays
an important role.
Of course, the contributions of common and rare variants are not
mutually exclusive; it is quite possible, and indeed likely, that both will be
found to contribute to many common neuropsychiatric conditions (McClellan
et al., 2007; Veenstra-Vanderweele, Christian, & Cook, 2004; Veenstra-
VanderWeele & Cook, 2004). However, the distinction is of tremendous
importance with respect to current genetic studies, in part because the
approaches employed to identify a causal or contributory role for genes in
neuropsychiatric disorders can differ, sometimes dramatically, in terms of
their ability to detect rare versus common variation. This issue will be
addressed specifically with regard to three major strategies for gene identifi-
cation: linkage, association, and cytogenetic/CNV analysis.

Linkage Studies
In linkage analysis, one seeks to determine if the transmission of any chro-
mosomal segment from one generation to another within a family or families
coincides with the presence of the phenotype of interest. If every chromo-
some (or, in practice, every autosome) is evaluated simultaneously, the study
is referred to as a genomewide linkage scan.
This process of tracing inheritance relies fundamentally on genetic varia-
tion. If every chromosome were identical, it would be impossible to observe
the process of transfer from grandparent to parent to child. Because it is
not yet feasible to read each individual's entire DNA sequence, genetic
variants such as SNPs or STRs are used to mark the differences between
chromosomes. These genetic ‘‘markers’’ then are evaluated for intergenera-
tional transmission.
Broadly, linkage studies evaluate the probability that a given phenotype
and a particular genetic marker, or series of markers, are transmitted together
from one generation to another. To appreciate how this process might lead to
the identification of a disease gene, one needs to refer back to basic genetic
principles. When egg and sperm are formed, homologous chromosomes
from each of the 22 pairs of autosomes exchange large blocks of information
through a process known as homologous recombination and crossing over. As a
consequence, each of the chromosomes in the haploid gamete is, on average,
a mixture of the two parental chromosomes. During this process of gamete
formation, the likelihood that any two points on a parental chromosome will
have a crossover event between them is a function of how far apart they are
physically on that chromosome: Loci at opposite ends will be likely to be
separated by a crossover. Regions that are close to each other will be less
likely to have a crossover between them and will tend to pass together
through multiple generations. Thus, the closer a particular marker is to a
mutation causing the disorder, the more likely the marker and the mutation
will be to travel together in every generation. For a disorder in which a single
gene or genetic variation is being transmitted within a family resulting in an
identifiable clinical outcome, narrowing the region containing an offending
genetic variant is relatively straightforward, given a sufficient number of
families, markers, and transmissions. It is worthwhile to stress that this
type of analysis is initially aimed at identifying a location (the piece of the
ancestral chromosome) that carries both the marker and the mutation. The
subsequent process of identifying the mutation in the involved gene within a
linkage interval is typically referred to as fine mapping.
Two basic approaches to linkage analysis predominate in the hunt for
human disease genes. The first is known as parametric linkage: A hypothesis
about the nature of the proposed genetic transmission is developed, related
parameters are specified based on this hypothesis (e.g., whether the disorder
is dominantly inherited, recessively inherited, or sex-linked), and one inves-
tigates the actual pattern of transmission among subjects. The odds of seeing
the given pattern of transmission are determined based on the hypothesis
that the disease and marker(s) are very close to each other, and this is
compared to the odds of observing the same pattern of transmission in the
absence of this linkage (the null hypothesis). The ratio of these two odds,
linkage versus no linkage, is most often then transformed onto a log10 scale
and expressed as the logarithm of the odds, or lod, score.
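As a concrete illustration (not drawn from the studies discussed here), the
two-point lod score for a phase-known set of informative meioses can be
computed in a few lines; the function name and the toy numbers below are
invented, chosen only to show why a lod of 3.0, roughly 1,000:1 odds in favor
of linkage, became the classical benchmark.

import math

def lod_score(n_meioses, n_recombinants, theta):
    # Log10 likelihood ratio: the observed recombinant count under a
    # hypothesized recombination fraction theta versus free recombination
    # (theta = 0.5, i.e., no linkage).
    k, n = n_recombinants, n_meioses
    if theta == 0.0 and k > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    log_alt = (k * math.log10(theta) if k else 0.0) + (n - k) * math.log10(1.0 - theta)
    log_null = n * math.log10(0.5)
    return log_alt - log_null

# Ten informative meioses with no recombinants, evaluated at theta = 0, give
# 10 * log10(2), about 3.01, just past the classical threshold of 3.0.
print(round(lod_score(10, 0, 0.0), 2))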
Parametric linkage analysis is a powerful approach to investigating rare
Mendelian disorders, as demonstrated by the current accumulation of more
than 1,200 genes identified for such conditions (http://www.ncbi.nlm.nih.
gov/omim/). In those instances in which this approach to linkage has been
used with respect to complex neuropsychiatric disorders, it is typically the
result of the identification of a rare family or families demonstrating inheri-
tance that is simpler than what is presumed to be the norm for the overall
condition (Brzustowicz, Hodgkinson, Chow, Honer, & Bassett, 2000;
Laumonnier et al., 2004; Strauss et al., 2006). The reasons for this include
the fact that such analyses can be quite sensitive to misspecification of para-
meters and are limited in their ability to tolerate locus heterogeneity within
and across families or bilineal inheritance (risks coming from both maternal
and paternal lineages).
Given a clear consensus that no common neuropsychiatric condition will
be accounted for solely by a single rare genetic variation, many researchers,
particularly those interested in common alleles, began to favor an alternative
approach known as nonparametric linkage. This approach does not require the
specification of a hypothesis regarding the mode or character of inheritance.
Instead, one seeks to identify any region of the genome that is shared among
affected related individuals (or not shared between affected and unaffected relative
pairs) more often than would be expected by chance. This method does not
require all of the identified families to carry the same causal or contributory
genetic variation among them. Consequently, like parametric linkage, such
studies are tolerant to allelic heterogeneity and theoretically able to identify
both common and rare variants. However, they are not as well suited as
association studies (discussed later, see Genetic Association) to accommodate
a substantial amount of locus heterogeneity, because the sample sizes necessary to
identify many loci simultaneously, particularly those with modest effects, can
be impractically large (Risch & Merikangas, 1996).
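One widely used nonparametric statistic, sketched here only as an
illustration (the function and counts are invented), is the means test for
affected sibling pairs: under the null hypothesis of no linkage, siblings
share half of their marker alleles identical by descent, so linkage can be
sought as an excess of observed sharing.

import math

def sib_pair_means_test(ibd_counts):
    # ibd_counts gives, for each affected sib pair, the number of alleles
    # (0, 1, or 2) shared identical by descent at the marker. Under no
    # linkage the expected sharing proportion is 0.5 with variance 1/8 per
    # pair, yielding a simple z statistic for excess sharing.
    n = len(ibd_counts)
    mean_sharing = sum(ibd_counts) / (2.0 * n)
    return (mean_sharing - 0.5) / math.sqrt(1.0 / (8.0 * n))

# 100 affected sib pairs sharing 60% of alleles on average (30 pairs share
# both alleles, 60 share one, 10 share neither) give z of about 2.83.
pairs = [2] * 30 + [1] * 60 + [0] * 10
print(round(sib_pair_means_test(pairs), 2))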
Like parametric linkage, nonparametric analyses often use the lod score to
measure the statistical significance of the result. While there is ongoing
debate over the precise criteria to declare a result significant, the most com-
monly used thresholds involve a genomewide lod score of 3.0 for parametric
studies and 3.6 for nonparametric studies (Lander & Kruglyak, 1995).
Of course, statistical significance may be strong evidence for a genotype–
phenotype relationship, but it is not generally considered definitive. For both
parametric and nonparametric approaches, either replication in an indepen-
dent sample or, more importantly, the identification of the offending deleter-
ious mutation(s) is taken as substantiating evidence. This has proven easier
in practice in the case of parametric versus nonparametric analyses, likely
owing both to its use in the study of simpler Mendelian disorders (in which
the relationship between genotype and phenotype is highly reliable and often
nearly 1:1) and to the fact that fine mapping is more readily accom-
plished in parametric versus nonparametric studies.
Biological study of the implicated gene in vitro and in vivo, including
modeling of the identified human mutations, is another highly desirable avenue
for developing convergent evidence to support a linkage finding. The practical
reality is that in neuropsychiatric disorders the relevant tissue is most often
not accessible for direct study in humans, rendering model systems particu-
larly attractive. Nonetheless, it is important to recall that there are critical
differences between the human and the rodent brain (or fly or worm) and
that these differences may be particularly relevant to the domains of function
that are of most interest in neuropsychiatric disorders. On the one hand, the
demonstration of a ‘‘neural’’ phenotype in an animal carrying a human
mutation or allele may be instructive and is often the first step toward the
identification of the relevant highly conserved molecular pathways across
species. On the other hand, there have been many instances in which knockouts of
genes recapitulating clearly causal Mendelian mutations in neurodevelop-
mental syndromes have not resulted in phenotypes resembling those found
in humans, suggesting the need for some caution in the interpretation of
model systems data.

Genetic Association
In contrast to linkage studies, association methodologies are ‘‘cross-sectional’’
as they investigate variation across populations as opposed to studying
genetic transmissions within families. In essence, the methodology relies
on a classic case–control design: Genetic variants are identified as the
‘‘exposure,’’ and the allele frequency is compared in affected and unaffected
individuals. It is important to mention here that while case–control analysis
has become the most widely used association strategy of late, there are
variations on this theme, called transmission tests, that rely on a combination
of linkage and association and evaluate parent–child trios. These approaches
have also been quite popular, particularly with regard to pediatric disorders.
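Under simplifying assumptions invented for illustration (a biallelic marker,
genotypes coded as counts of the risk allele, and no adjustment for ancestry
or other covariates), the core comparison reduces to an odds ratio computed
from a 2 x 2 table of allele counts:

def allelic_odds_ratio(case_genotypes, control_genotypes):
    # Each genotype is coded as 0, 1, or 2 copies of the putative risk
    # allele; every individual contributes two alleles to the table.
    a = sum(case_genotypes)                  # risk alleles among cases
    b = 2 * len(case_genotypes) - a          # other alleles among cases
    c = sum(control_genotypes)               # risk alleles among controls
    d = 2 * len(control_genotypes) - c       # other alleles among controls
    return (a * d) / (b * c)

# A risk-allele frequency of 0.30 in 1,000 cases versus 0.25 in 1,000
# controls corresponds to an allelic odds ratio of roughly 1.29.
cases = [1] * 600 + [0] * 400
controls = [1] * 500 + [0] * 500
print(round(allelic_odds_ratio(cases, controls), 2))

In practice such an estimate would be accompanied by a significance test on
the same table and by some correction for population stratification.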
Until recently, genetic association studies could feasibly investigate only
one or a small number of known, common genetic polymorphisms in or
near an identified gene(s) of interest. Relative to nonparametric linkage ana-
lyses, the approach is theoretically better able to detect small increments of
risk; but, importantly, it is not able in practice to detect rare variants con-
tributing at a particular locus. Given the popularity of common variant
candidate gene association studies across all of medicine and particularly
in psychiatric genetics, this distinction is quite important: It is not uncom-
mon for either positive or negative results to be reported with respect to a
gene, suggesting that it is or is not associated with a disorder. In fact, the
methodology in practice tests only for the contribution of common variation
at that locus and not the gene itself.
In addition to their tolerance of locus heterogeneity, common variant asso-
ciation studies have been extremely popular for a variety of reasons, including
their logistical ease. The evaluation of known polymorphisms is relatively
inexpensive, and the ascertainment of unrelated cases and controls, or even
family-based trios in childhood disorders, is far easier than identifying and
recruiting either affected siblings or finding large multiply affected families.
These facts, coupled with the general conviction that common polymorph-
isms account for the majority of risk in the disorders of interest, understand-
ably have made so-called candidate gene association studies historically the
most frequently attempted type of human genetic investigation.
Widespread experience with this approach has led to an appreciation of
some of its drawbacks as well, beyond that related to its inability to detect the
contribution of rare variants. Perhaps the most pressing is the observation
that the vast majority of positive findings from these studies in the medical
literature have not been replicated (Hirschhorn & Altshuler, 2002;
Hirschhorn, Lohmueller, Byrne, & Hirschhorn, 2002). Among the various
reasons for this are (1) the potential to misinterpret genetic variation related
to ethnic differences in cases and controls as disorder-related polymorphisms,
(2) sample sizes that have until quite recently been inadequate to reliably
identify the small differences in risk attributable to alleles contributing to
complex disorders, and (3) preferential publication of positive results among
an ample group of underpowered studies.
It is also likely that the requirement that investigators choose a small
number of candidate genes and markers to evaluate in any given study has
contributed to difficulties in identifying true-positive associations. A complete
evaluation of these limitations is beyond the scope of this discussion; it is
sufficient to note that common variant association studies are able to detect
only the contribution of variants that are being tested directly, typically
restricted to alleles of 5% population frequency or greater, and those that
are nearby, again sharing this minimum frequency (Zondervan & Cardon,
2004). Until the early 2000s, a sufficient number of known SNPs was not
available, and it was impractical to test simultaneously the number of mar-
kers that would be required to provide information regarding all genes or
most genes in the genome (International HapMap Consortium, 2005; Daly,
Rioux, Schaffner, Hudson, & Lander, 2001; Gabriel et al., 2002).
However, recently, this calculation has been transformed: first, through the
identification of millions of SNPs via the sequencing of the human genome
and, second, through the development of microarray technologies that allow
for hundreds of thousands to millions of SNPs to be evaluated in a single
patient in a single low-cost experiment. As a result, genomewide association
studies (GWASs) have become the gold standard for common variant dis-
covery in complex disorders (Hirschhorn & Daly, 2005). These investigations
take advantage of SNPs spaced across the entire genome to conduct associa-
tion studies without the requirement of an a priori hypothesis regarding a
specific gene or genes. This technological advance, in conjunction with a now
sufficiently large collection of patient samples, has led to a spate of studies
that have begun to confirm the clear contribution of common alleles to
common disease (Bilguvar et al., 2008; Hakonarson et al., 2007; Saxena
et al., 2007; Scott et al., 2007; Zeggini & McCarthy, 2007).
Several aspects of these recent findings deserve comment here. First, it is
remarkable to begin to see concrete evidence for the common allele:common
disease hypothesis after years of uncertainty. It is notable, however, that the
scale of the effect of individual alleles identified in recent studies has been
extraordinarily modest, explaining why very large sample sizes have been
required to clarify contradictory results (Altshuler & Daly, 2007). In neurop-
sychiatric genetics specifically, a great deal of effort has been expended trying
to understand inconsistent common variant association findings. These
recent investigations suggest that the simplest answer may suffice: When
the sample size is sufficiently large and genomewide association is employed,
reproducible results will emerge if common variants play a role (Psychiatric
GWAS Consortium Steering Committee, 2009; Ma et al., 2009; McMahon
et al., 2010; Weiss, Arking, Daly, & Chakravarti, 2009). Similarly, the total
amount of individual variation in disease risk accounted for by the identified
common alleles has been surprisingly modest. This suggests that
the contribution of rare variation might help to explain a larger amount of
risk in complex disorders than previously anticipated.
Whether in a candidate gene study or a GWAS, the first evidence for a
probabilistic relationship between a variation and a clinical phenomenon of
interest typically involves surpassing a preordained statistical threshold. In
candidate gene common variant association, there is not yet complete
agreement on this issue, including how best to correct for multiple
comparisons. The difficulties that have attended replication of studies using
this approach have now led to the general expectation that some type of
internal replication of association will be attempted prior to publication. Of
course, replication in an independent laboratory in a separate sample
remains the gold standard. In addition, as either common variant methodol-
ogy may detect association of an allele that is near, but not directly contained
within, the tested set of alleles, the identification of the ‘‘functional’’ variant
within the association interval is generally considered strong supporting evi-
dence (State, 2006). As noted, associated variants identified in regulatory and
other noncoding sequences mapping very far from known coding regions can
pose significant challenges in this regard. Finally, while statistical thresholds
for candidate gene studies remain a matter of debate, there has emerged
something of a consensus regarding appropriate thresholds for GWAS ana-
lyses that are quite stringent and seem to contribute to the markedly
improved reliability of this approach compared to prior generations of
common variant association analysis.
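Although the chapter does not cite a figure, the stringent genomewide
threshold that has become conventional corresponds roughly to a Bonferroni
correction of the usual .05 significance level for the approximately one
million independent common-variant tests implied by a genomewide scan:

\[
\alpha_{\text{genomewide}} \approx \frac{0.05}{10^{6}} = 5 \times 10^{-8}.
\]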
Both GWAS and most candidate gene studies assay known alleles with a
preordained minor allele frequency, typically restricting the analysis to
common variants. In contrast, a mutation burden approach applies association
strategies to rare variants. This is critically important if one desires to test the
hypothesis that rare variants may contribute broadly in the population to the
occurrence of a common complex disorder or phenotype as opposed to explain-
ing Mendelian inheritance within one or a small number of families.
Establishing a population association of rare alleles may in practice be
quite challenging. Taken individually, rare and especially very rare alleles at
a given locus would require sample sizes that could not practically be reached
to achieve a statistically significant result. An alternative method of addres-
sing risk assesses the total amount of rare variation present within a gene or
genes of interest in cases versus controls. The identification of such rare
variations, apart from CNVs (which will be described later, see Cytogenetics
and CNV Detection), requires that individual genes be comprehensively eval-
uated using either direct sequencing or a multistep mutation-screening pro-
cess that identifies sequences containing possible variations followed by
confirmation via sequencing (Abelson et al., 2005).
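In its simplest ''collapsing'' form, such a burden comparison asks whether
carriers of any rare, putatively functional variant in a gene are
over-represented among cases. The sketch below, with invented counts and a
Fisher exact test from scipy, is offered only as an illustration of that
logic, not as a description of any particular published analysis.

from scipy.stats import fisher_exact

def rare_variant_burden_test(case_carriers, n_cases, control_carriers, n_controls):
    # Collapse all qualifying rare variants in the gene to a single
    # carrier / non-carrier status and compare cases with controls.
    table = [[case_carriers, n_cases - case_carriers],
             [control_carriers, n_controls - control_carriers]]
    return fisher_exact(table)  # odds ratio and two-sided p value

# Hypothetical counts: 15 of 1,000 cases versus 4 of 1,000 controls carry a
# rare nonsynonymous variant in the gene of interest.
print(rare_variant_burden_test(15, 1000, 4, 1000))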
While intuitively attractive, until quite recently, technological realities have
placed significant limits on mutation burden approaches; detection of pre-
viously unknown rare variants, even via mutation screening, has been many
times more expensive than genotyping of known variants. Consequently, in
practice, the method has been applied only to candidate genes. Nonetheless,
several notable studies have highlighted the value of this approach. For
example, Helen Hobbs and colleagues have convincingly demonstrated that
mutations in genes known to be involved in rare forms of low HDL
cholesterol are present in the general population and contribute
to the overall variation in high-density lipoprotein levels (Cohen et al.,
2004). Similarly, recent work from Richard Lifton’s lab has shown a signifi-
cant contribution of rare alleles in genes responsible for rare syndromic
forms of hypotension to blood pressure variation in the general population
(Ji et al., 2008). Moreover, technological advances are promising to vastly
expand the application of these types of approaches. Within the past year,
so-called next-generation sequencing has made the evaluation of all coding
segments of the genome a practical reality (Choi et al., 2009; Ng et al., 2009);
and within a relatively short time frame, whole-genome sequencing promises
to become commonplace.
Irrespective of the methods to detect rare variation, it is worth noting that
mutation burden analysis poses its own challenges with regard to demon-
strating risk. For example, it may be difficult to identify the type of variation
that is truly of interest; namely, variations that result in a functional altera-
tion of a gene. This is more easily identified among nonsynonymous or
nonsense mutations (i.e., nucleotide changes that would be predicted to
result in the substitution of one amino acid for another or the introduction
of a stop codon into a protein). However, even among this limited set of
variation, distinguishing functional from neutral mutations may be quite
challenging. While there are several widely used software programs available
to predict deleterious changes in coding sequence, in practice these have not
proven to be highly reliable. Given the relatively small numerator expected in
rare variant discovery studies, the introduction of even a small number of
alleles that cannot be appropriately classified is a potentially critical confound
(Ji et al., 2008).
Of note, the approach so far successfully adopted by the Hobbs and Lifton
groups has been to study genes that are already known to contribute to rare
recessive syndromes and for which specific functional alleles have been defi-
nitively identified. These functional alleles may serve as a touchstone for the
analysis of heterozygous mutations in the same genes. This approach seems
to have mitigated some of the liability that has attended the selection of
candidate genes in common variant studies. Certainly, at present, mutation
burden studies will be most effectively applied when the function of the gene
being examined is well known and biological assays exist to evaluate the
consequences of the identified rare variants.

Cytogenetics and CNV Detection
For 40 years, geneticists have been leveraging the discovery of gross micro-
scopic chromosomal abnormalities to identify disease genes. This approach
led to some of the most important initial findings in the field. For example,
in the late 1960s, an abnormal constriction on the X chromosome was observed
in boys with mental retardation (Lubs, 1969; Sutherland, 1977), leading to the
discovery of fragile X syndrome and ultimately to the cloning of the gene
encoding the fragile X mental retardation protein (Fu et al., 1991; Verkerk
et al., 1991).
As the technology to examine chromosomes has advanced, so too has the
power of these approaches. Molecular methods including fluorescence in situ
hybridization now readily allow for the identification of the precise location of
chromosomal disruption caused by a balanced translocation or inversion.
Consequently, a strategy of cloning genes at breakpoints has been applied to
the study of a variety of disorders including mental retardation, autism, Tourette
syndrome, and schizophrenia (Abelson et al., 2005; Dave & Sanger, 2007;
Millar et al., 2000; Vorstman et al., 2006). Typically in this approach, the
mapping of a translocation, chromosomal inversion, or deletion is used as a
means of identifying a candidate gene(s), which is then further studied for
rare variants in patients without known chromosomal abnormalities
(Abelson et al., 2005; Jamain et al., 2003).
The most recent advances in cytogenetics have been particularly fascinat-
ing. As previously noted, new technologies have recently led to the discovery
that submicroscopic variations in chromosomal structure are widespread
throughout the genomes of normal individuals. Of note, such CNVs can be
detected using microarrays, including those designed for SNP genotyping,
which currently offer resolution down to several hundred bases.
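The idea can be caricatured in a few lines (production CNV-calling
algorithms use hidden Markov models or formal segmentation, and the threshold
and probe counts below are invented): a deletion or duplication announces
itself on such an array as a sustained run of probes whose intensity departs
from the diploid expectation.

def flag_cnv_runs(log2_ratios, threshold=0.3, min_probes=10):
    # Toy caller: flag runs of consecutive probes whose log2 intensity ratio
    # (sample versus diploid reference) is consistently shifted. A sustained
    # negative shift suggests a deletion ("loss"), a positive shift a
    # duplication ("gain"). Threshold and run-length values are arbitrary.
    calls, start, state = [], 0, 0
    for i, r in enumerate(list(log2_ratios) + [0.0]):  # sentinel closes any open run
        s = -1 if r < -threshold else (1 if r > threshold else 0)
        if s != state:
            if state != 0 and i - start >= min_probes:
                calls.append((start, i - 1, "loss" if state < 0 else "gain"))
            start, state = i, s
    return calls

# Sixty diploid probes, twenty probes at roughly half copy number (log2 ratio
# near -1), then diploid again: one "loss" call spanning probes 60 through 79.
ratios = [0.0] * 60 + [-1.0] * 20 + [0.0] * 60
print(flag_cnv_runs(ratios))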
One unexpected consequence of CNV detection has been to cast doubt on
causal inferences associated with previous cytogenetic investigations. As
noted, prior to the discovery of CNVs (particularly their presence in coding
regions of the genome), it was largely presumed that a rearrangement or loss
of genetic material disrupting gene structure was the likely cause of an
observed phenotype. It is now clear that rearrangements may often physically
disrupt genes without overtly negative consequences. Conversely, structural
derangements that do not map to coding regions of the genome have been
known for some time to have deleterious potential (Kleinjan & van
Heyningen, 1998; State et al., 2003).
A final important observation about copy number detection is that it was
the first practical technique that was able to identify both common and rare
changes in chromosomal structure at high resolution on a genomewide scale.
The implications of this technological advance are discussed in more detail in
the following section.

Case Study: Rare Variants and Autism-Spectrum Disorders
Autism is a pervasive developmental disorder characterized by findings in
three broad domains: delayed and deviant language development, fundamen-
tally impaired social communication, and highly restricted interests or stereo-
typies. A range of related phenotypes including Asperger syndrome and
pervasive developmental disorder (not otherwise specified) comprise a con-
sensus category of autism-spectrum disorders (ASDs). In the Diagnostic and
Statistical Manual, fourth edition (DSM-IV), autism is contained within the
section on pervasive developmental disorders, which includes (in addition to
the syndromes mentioned) Rett syndrome, a rare developmental disorder
largely confined to girls, and childhood disintegrative disorder (also some-
times called ‘‘Heller syndrome’’), in which an extended period of normal
development is followed by significant regression in developmental mile-
stones, leading to an autism phenotype.
This syndrome serves well as a case study of the potential value of rare
variant approaches to psychiatric genetics. Extensive genetic investigations
have already been undertaken in this area, an effort that is perhaps second
only to that given to schizophrenia in terms of the scale of the investment.
Moreover, as will be argued, ASD is in some sense a paradigmatic develop-
mental neuropsychiatric condition, particularly with respect to conceptualiz-
ing the role of rare variant studies.
Several decades of study have led to some clear consensus regarding the
genetics of autism: The concordance rate is far higher in monozygotic than
dizygotic twins (Bailey et al., 1995), suggesting a high degree of heritability.
Moreover, as with all common neuropsychiatric syndromes, twin and family
studies demonstrate that neither a substantial portion of classically defined
autism nor other syndromes in the autism spectrum can be explained by
variation within a single gene transmitted in Mendelian fashion (Gupta &
State, 2007; O’Roak & State, 2008).
Despite this general consensus, the underlying allelic architecture of
autism remains uncertain. The vast majority of studies over the past
decade have focused on the potential contribution of common variants
based on a general consensus that autism is likely to be accounted for by
the common variant:common disease model. Despite this conviction, the
significant contribution of specific common alleles has only recently been
suggested with any degree of confidence (Alarcon et al., 2008; Arking
et al., 2008; Campbell et al., 2006; Ma et al., 2009; Weiss et al., 2009); and
similar to other medical conditions, the overall individual risk attributable to
these alleles is exceedingly modest. Indeed, while recent studies have begun
to confirm the contribution of common alleles to ASDs, there is no other
neuropsychiatric disorder for which there is stronger evidence supporting the
importance of rare variants.
In practice, there have been two ways in which rare variant approaches
have already made important contributions to the field. The first involves the
use of a so-called outlier strategy in which unusual families or rare individuals
lead to the identification of rare disease-related alleles that illuminate the
pathophysiology of an ASD. The second involves recent evidence that a rare
variant:common disease model may apply, suggesting that a substantial risk is
carried in the population of affected individuals in the form of rare alleles.

Outlier Approaches
The identification of several cases of affected females with deletions
on the X chromosome led Jamain and colleagues (2003) to evaluate genes in
the region of these deletions for rare variants among nearly 200 individuals
with ASDs. The authors found a single clearly deleterious mutation in the
gene Neuroligin 4 in one family with two affected males. In a second family,
a rare variant was also found in Neuroligin 3, a closely related molecule on
the X chromosome. This variant was not as unequivocally damaging to pro-
tein function but has subsequently been shown to influence synaptic activity
in mice (Tabuchi et al., 2007). Shortly after the initial report regarding
NLGNs and ASDs, a separate research group used parametric linkage in
an extended family with mental retardation and ASD to identify the same
X-chromosome interval containing Neuroligin 4 (Laumonnier et al., 2004). Fine map-
ping of this region showed a unique, highly deleterious mutation in NLGN4
present in every affected family member, consistent with Mendelian expecta-
tions. These two findings represented the first identification of a functional
mutation in cases of idiopathic autism (i.e., autism not accompanied by some
other evidence of genetic syndrome) and the first convincing independent
replication of a genetic finding in ASDs.
Neuroligins are postsynaptic transmembrane neuronal adhesion mole-
cules that interact with neurexins, which are present on the presynaptic
terminal (Lise & El-Husseini, 2006). Subsequent studies have confirmed
that the mutations identified in NLGN4 in humans with ASDs lead to
abnormalities in the specification of excitatory glutamatergic synapses
in vitro as well as to synaptic maturation defects in mice (Chih, Afridi,
Clark, & Scheiffele, 2004; Chih, Engelman, & Scheiffele, 2005; Chih,
Gollan, & Scheiffele, 2006; Varoqueaux et al., 2006), providing important
convergent support for the initial finding. While additional mutation screen-
ings of individuals with ASDs have not led to the characterization of further
clearly functional variants in NLGN4 (Blasi et al., 2006), several recent studies
have provided strong additional evidence for the importance of this finding
through the identification of rare mutations among affected individuals in
molecules that interact directly or indirectly with the NLGN4 protein. These
include SHANK3 (Durand et al., 2007; Moessner et al., 2007) and Neurexin-1
(Kim et al., 2008; Marshall et al., 2008; Szatmari et al., 2007).
Another notable rare variant finding reported in the New England Journal
of Medicine (Strauss et al., 2006) used parametric linkage analysis to identify a
rare homozygous mutation in the gene contactin-associated protein-like 2
(CNTNAP2) among the Old Order Amish population that led to intractable
seizures, autism, and mental retardation. The study was notable both for the
statistical power due to the inbred nature of this population and for the
availability of pathological brain specimens due to epilepsy surgery performed
on several of the probands. As with NLGN4, CNTNAP2 is a neuronal adhe-
sion molecule (Poliak et al., 1999), and recent work has demonstrated that it
too is present in the synaptic plasma membrane (Bakkaloglu et al., 2008).
Moreover, two common variant association studies and a rare variant muta-
tion burden analysis have pointed to this molecule as carrying risk for ASDs
(Alarcon et al., 2008; Arking et al., 2008; Bakkaloglu et al., 2008).
These findings raise several important issues with regard to rare variants
and autism. First, they demonstrate the utility of the outlier approach to
provide clues to the pathophysiology of complex disorders. Prior to the iden-
tification of NLGN4, no specific data had implicated a molecular or cellular
mechanism underlying any aspect of idiopathic autism. Subsequently, con-
siderable effort has been aimed at delineating the relationship between
synapse function and ASDs (Zoghbi, 2003). Similarly, the identification and
characterization of CNTNAP2, coupled with the long-standing appreciation of
increased rates of seizures in individuals with autism, has raised considerable
interest in neuronal migration and its potential contribution to ASDs.
These findings also point to some of the challenges of demonstrating
causality when rare events are being investigated. In the initial identification
of NLGN4, the link between the observed mutation and the observed pheno-
type was inferred not from statistical evidence but, rather, from the specifics of
the gene itself and the nature of the observed mutation. In this case, the fact
that the gene is located on the X chromosome (thus, only one copy is present
in males) and the mutation is clearly deleterious to the formation of protein
product led the authors to conclude that the rare variant and the autism
phenotype must be related. While they were able to show in their small
family that transmission of the mutation was consistent with Mendelian
expectations, there were not sufficient observations to support this finding
with statistical analysis, nor was there a sufficient number of rare variants
identified to conduct a mutation burden study (Laumonnier et al., 2004).
Nonetheless, the nature of the NLGN4X mutation, its recapitulation
in vitro and in model systems, the independent replication using an alter-
native method (parametric linkage) in a separate family, and the identification
of additional rare variants in a molecular pathway specified by NLGN4
together strongly support the relevance of this molecule for ASDs. The
rarity of the clearly deleterious variants among affected individuals and the
finding that mutations in NLGN4 do not always result in observable pathol-
ogy (Macarov et al., 2007) are reminders that even highly penetrant mutations
do not always lead to the phenotype of interest and that rare variant discovery
may ultimately be extraordinarily valuable even if the initial observations
remain restricted to one or an extremely small number of events.

Autism and the Rare Variant:Common Disease Hypothesis
From a theoretical standpoint, there is ample reason to believe that rare
variants may carry considerable risk for ASD among the general population
of affected individuals. As noted, the disorder has an early onset and affects
the fundamental ability of individuals to make and keep social relationships;
additionally, the monozygotic–dizygotic concordance difference is consistent
with a considerable burden of new (and therefore rare) variation. Moreover,
there is consistent evidence that autism incidence increases with paternal age
(Cantor, Yoon, Furr, & Lajonchere, 2007; Reichenberg et al., 2006), as does the
burden of de novo mutation (Crow, 2000).
Perhaps most importantly, there is long-standing evidence that individuals
with autism are many times more likely than normally developing controls to
carry gross microscopic chromosomal abnormalities (what would now be
considered rare structural variants), including de novo rearrangements (Bugge et al., 2000; Wassink, Piven, &
Patil, 2001). In 2007, Jonathan Sebat and colleagues at Cold Spring Harbor
provided dramatic confirmation of the importance of individually rare cyto-
genetic events in ASD when they evaluated patients with autism in search of
de novo copy number changes and showed that in apparently sporadic cases
there is a substantially increased burden of rare copy number variation com-
pared both to familial cases of autism and to controls (Sebat et al., 2007). The
detection of rare variation at a high resolution across the entire genome,
the demonstration of its cumulative burden in the ASD phenotype, and
the specific CNVs identified, which may provide clues to the identity
of genes harboring other rare variants contributing to autism, all represent a very
significant step forward in the search for multiple independent mutations
contributing to ASD. These findings have subsequently been supported by
additional studies demonstrating an increased burden of de novo CNVs in
autism versus controls (Marshall et al., 2008), as well as studies demonstrat-
ing association of rare, recurrent CNVs with ASDs (Bucan et al., 2009;
Glessner et al., 2009; Kumar et al., 2008; Szatmari et al., 2007; Weiss
et al., 2008).
These latter findings underscore how much the dogma regarding rare
variation has begun to change, spurred by the advent of CNV detection.
For example, the most notable finding with regard to specific copy number
alterations and their involvement in ASD has been with respect to a region
on the short arm of chromosome 16 (16p11.2) (Kumar et al., 2008; Szatmari
et al., 2007; Weiss et al., 2008). The first studies to systematically address the
role of this variation in ASD relied on standard case–control association
methodology (Kumar et al., 2008; Weiss et al., 2008). They did not seek to
demonstrate a one-to-one relationship between carrying the variation and
having the phenotype within families, which would have been expected in
the era of standard cytogenetics. Indeed, in these initial analyses, the 16p11.2
variation was observed in unaffected individuals and, more importantly, within
families de
novo variations were found in one affected individual but not in a second
affected sibling. This latter observation would previously have been taken as
strong evidence against the contribution of this variation within these pedi-
grees, based on Mendelian expectations for rare disorders.
These findings highlight the shift from conceptualizing rare variation as
being synonymous with Mendelian inheritance. Indeed, as the risks asso-
ciated with common variation have been found to be much smaller than
previously anticipated, prior notions about effect sizes that would come
under negative selection and result in rare transmitted alleles must also be
reconsidered. Moreover, as the consequences of rare variants may be more
subtle than previously anticipated and their contribution more complex, a
shift to association methodologies became a necessity. Fortunately, the field
of psychiatric genetics has several decades of experience with these strategies,
which point to the key requirements for the next generation of rare variant
studies, including controlling for population stratification, accounting for
multiple comparisons, and leveraging sufficiently large sample sizes to
allow for the detection of alleles of comparatively modest individual effect.

New Technology and Rare Variants
Several recent technological advances promise to soon provide the opportu-
nity to more easily identify rare variation contributing to autism and all
complex disorders. First, the resolution of CNV detection is increasing at a
tremendous pace owing to the ability to place an increasing number of
probes on a microarray. In the past several years, there has been a rapid
increase from approximately 10,000 to more than 2 million probes on a
single assay. Coupled with this, the cost of conducting such experiments
has been vastly decreasing, allowing for more and more comprehensive
assessment of the contribution of CNVs to a variety of neuropsychiatric
phenotypes.
A more fundamental change is also on the horizon. As mentioned briefly,
the means by which genomes are sequenced is undergoing a dramatic trans-
formation. The throughput of platforms able to read sequence directly is
increasing exponentially, while the cost per nucleotide is plummeting. This
will have a profound impact on the identification of genetic variation. Indeed,
the field is rapidly approaching an era in which the entire sequence of patient
and control DNAs will be directly analyzed and the ability to detect both
common and rare variations contributing to disease will be exhaustive.
Moreover, these new sequencing platforms will also be able to detect gains
and losses in DNA at a very high resolution, leading to unprecedented
simultaneous detection of a significant proportion of all the variation that
is thought to contribute to neuropsychiatric disorders. This development
promises to set the stage for an era of discovery that will dwarf even the
astonishing recent pace resulting from the development of CNV technology
and the implementation of whole-genome association technologies.
It is also very likely that these studies will do more than dramatically
expand our understanding of both the causal and contributory roles of
DNA variation in neuropsychiatric disorders. They will reveal the limits of
this avenue of investigation as well, focusing attention then on the need to
understand epigenetics, environmental modifications, and posttranscriptional
and translational processes and their contribution to complex mental
conditions.

References

Abelson, J. F., Kwan, K. Y., O’Roak, B. J., Baek, D. Y., Stillman, A. A., Morgan, T. M.,
et al. (2005). Sequence variants in SLITRK1 are associated with Tourette’s syndrome.
Science, 310(5746), 317–320.
Alarcon, M., Abrahams, B. S., Stone, J. L., Duvall, J. A., Perederiy, J. V., Bomar, J. M., et al.
(2008). Linkage, association, and gene-expression analyses identify CNTNAP2 as an
autism-susceptibility gene. American Journal of Human Genetics, 82(1), 150–159.
Altshuler, D., & Daly, M. (2007). Guilt beyond a reasonable doubt. Nature Genetics,
39(7), 813–815.
Arking, D. E., Cutler, D. J., Brune, C. W., Teslovich, T. M., West, K., Ikeda, M., et al. (2008).
A common genetic variant in the neurexin superfamily member CNTNAP2 increases
familial risk of autism. American Journal of Human Genetics, 82(1), 160–164.
Bailey, A., Le Couteur, A., Gottesman, I., Bolton, P., Simonoff, E., Yuzda, E., et al.
(1995). Autism as a strongly genetic disorder: Evidence from a British twin study.
Psychological Medicine, 25(1), 63–77.
Bakkaloglu, B., O’Roak, B. J., Louvi, A., Gupta, A. R., Abelson, J. F., Morgan, T. M.,
et al. (2008). Molecular cytogenetic analysis and resequencing of contactin associated
protein-like 2 in autism spectrum disorders. American Journal of Human Genetics,
82(1), 165–173.
Bilguvar, K., Yasuno, K., Niemela, M., Ruigrok, Y. M., von Und Zu Fraunberg, M., van
Duijn, C. M., et al. (2008). Susceptibility loci for intracranial aneurysm in European
and Japanese populations. Nature Genetics, 40(12), 1472–1477.
Blasi, F., Bacchelli, E., Pesaresi, G., Carone, S., Bailey, A. J., & Maestrini, E. (2006).
Absence of coding mutations in the X-linked genes neuroligin 3 and neuroligin 4 in
individuals with autism from the IMGSAC collection. American Journal of Medical
Genetics B Neuropsychiatric Genetics, 141B(3), 220–221.
Botstein, D., & Risch, N. (2003). Discovering genotypes underlying human pheno-
types: Past successes for Mendelian disease, future approaches for complex disease.
Nature Genetics, 33(Suppl.), 228–237.
Brzustowicz, L. M., Hodgkinson, K. A., Chow, E. W., Honer, W. G., & Bassett, A. S.
(2000). Location of a major susceptibility locus for familial schizophrenia on
chromosome 1q21-q22. Science, 288(5466), 678–682.
Bucan, M., Abrahams, B. S., Wang, K., Glessner, J. T., Herman, E. I., Sonnenblick, L. I.,
et al. (2009). Genome-wide analyses of exonic copy number variants in a family-based
study point to novel autism susceptibility genes. PLoS Genetics, 5(6), e1000536.
Bugge, M., Bruun-Petersen, G., Brondum-Nielsen, K., Friedrich, U., Hansen, J.,
Jensen, G., et al. (2000). Disease associated balanced chromosome rearrangements:
A resource for large scale genotype–phenotype delineation in man. Journal of
Medical Genetics, 37(11), 858–865.
Campbell, D. B., Sutcliffe, J. S., Ebert, P. J., Militerni, R., Bravaccio, C., Trillo, S., et al.
(2006). A genetic variant that disrupts MET transcription is associated with autism.
Proceedings of the National Academy of Sciences USA, 103(45), 16834–16839.
Cantor, R. M., Yoon, J. L., Furr, J., & Lajonchere, C. M. (2007). Paternal age and autism
are associated in a family-based sample. Molecular Psychiatry, 12(5), 419–421.
Chakravarti, A. (1999). Population genetics—making sense out of sequence. Nature
Genetics, 21(1 Suppl.), 56–60.
Chih, B., Afridi, S. K., Clark, L., & Scheiffele, P. (2004). Disorder-associated mutations
lead to functional inactivation of neuroligins. Human Molecular Genetics, 13(14),
1471–1477.
Chih, B., Engelman, H., & Scheiffele, P. (2005). Control of excitatory and inhibitory
synapse formation by neuroligins. Science, 307(5713), 1324–1328.
Chih, B., Gollan, L., & Scheiffele, P. (2006). Alternative splicing controls selective
trans-synaptic interactions of the neuroligin–neurexin complex. Neuron, 51(2),
171–178.
Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., et al. (2009).
Genetic diagnosis by whole exome capture and massively parallel DNA sequencing.
Proceedings of the National Academy of Sciences USA, 106(45), 19096–19101.
Cohen, J. C., Kiss, R. S., Pertsemlidis, A., Marcel, Y. L., McPherson, R., & Hobbs, H.
H. (2004). Multiple rare alleles contribute to low plasma levels of HDL cholesterol.
Science, 305(5685), 869–872.
Constantino, J. N., Lajonchere, C., Lutz, M., Gray, T., Abbacchi, A., McKenna, K., et al.
(2006). Autistic social impairment in the siblings of children with pervasive devel-
opmental disorders. American Journal of Psychiatry, 163(2), 294–296.
Constantino, J. N., & Todd, R. D. (2005). Intergenerational transmission of subthres-
hold autistic traits in the general population. Biological Psychiatry, 57(6), 655–660.
Crow, J. F. (2000). The origins, patterns and implications of human spontaneous
mutation. Nature Reviews Genetics, 1(1), 40–47.
Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, T. J., & Lander, E. S. (2001). High-
resolution haplotype structure in the human genome. Nature Genetics, 29(2), 229–232.
Dave, B. J., & Sanger, W. G. (2007). Role of cytogenetics and molecular cytogenetics in
the diagnosis of genetic imbalances. Seminars in Pediatric Neurology, 14(1), 2–6.
Durand, C. M., Betancur, C., Boeckers, T. M., Bockmann, J., Chaste, P., Fauchereau,
F., et al. (2007). Mutations in the gene encoding the synaptic scaffolding protein
SHANK3 are associated with autism spectrum disorders. Nature Genetics, 39(1),
25–27.
Fu, Y. H., Kuhl, D. P., Pizzuti, A., Pieretti, M., Sutcliffe, J. S., Richards, S., et al.
(1991). Variation of the CGG repeat at the fragile X site results in genetic instability:
Resolution of the Sherman paradox. Cell, 67(6), 1047–1058.
Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., et al.
(2002). The structure of haplotype blocks in the human genome. Science, 296(5576),
2225–2229.
Glessner, J. T., Wang, K., Cai, G., Korvatska, O., Kim, C. E., Wood, S., et al. (2009).
Autism genome-wide copy number variation reveals ubiquitin and neuronal genes.
Nature, 459(7246), 569–573.
Gupta, A. R., & State, M. W. (2007). Recent advances in the genetics of autism.
Biological Psychiatry, 61(4), 429–437.
Hakonarson, H., Grant, S. F., Bradfield, J. P., Marchand, L., Kim, C. E., Glessner, J. T.,
et al. (2007). A genome-wide association study identifies KIAA0350 as a type 1
diabetes gene. Nature, 448(7153), 591–594.
Hirschhorn, J. N., & Altshuler, D. (2002). Once and again—issues surrounding repli-
cation in genetic association studies. Journal of Clinical Endocrinology and
Metabolism, 87(10), 4438–4441.
Hirschhorn, J. N., & Daly, M. J. (2005). Genome-wide association studies for common
diseases and complex traits. Nature Reviews Genetics, 6(2), 95–108.
Hirschhorn, J. N., Lohmueller, K., Byrne, E., & Hirschhorn, K. (2002). A comprehen-
sive review of genetic association studies. Genetics in Medicine, 4(2), 45–61.
Iafrate, A. J., Feuk, L., Rivera, M. N., Listewnik, M. L., Donahoe, P. K., Qi, Y., et al.
(2004). Detection of large-scale variation in the human genome. Nature Genetics,
36(9), 949–951.
International HapMap Consortium. (2005). A haplotype map of the human genome.
Nature, 437(7063), 1299–1320.
International Human Genome Sequencing Consortium. (2004). Finishing the euchro-
matic sequence of the human genome. Nature, 431(7011), 931–945.
Jamain, S., Quach, H., Betancur, C., Rastam, M., Colineaux, C., Gillberg, I. C., et al.
(2003). Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4
are associated with autism. Nature Genetics, 34(1), 27–29.
Ji, W., Foo, J. N., O’Roak, B. J., Zhao, H., Larson, M. G., Simon, D. B., et al. (2008).
Rare independent mutations in renal salt handling genes contribute to blood
pressure variation. Nature Genetics, 40(5), 592–599.
Kim, H. G., Kishikawa, S., Higgins, A. W., Seong, I. S., Donovan, D. J., Shen, Y., et al.
(2008). Disruption of neurexin 1 associated with autism spectrum disorder.
American Journal of Human Genetics, 82(1), 199–207.
Kleinjan, D. J., & van Heyningen, V. (1998). Position effect in human genetic disease.
Human Molecular Genetics, 7(10), 1611–1618.
Kumar, R. A., KaraMohamed, S., Sudi, J., Conrad, D. F., Brune, C., Badner, J. A., et al.
(2008). Recurrent 16p11.2 microdeletions in autism. Human Molecular Genetics,
17(4), 628–638.
Lander, E., & Kruglyak, L. (1995). Genetic dissection of complex traits: Guidelines for
interpreting and reporting linkage results. Nature Genetics, 11(3), 241–247.
Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., et al. (2001).
Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921.
Laumonnier, F., Bonnet-Brilhault, F., Gomot, M., Blanc, R., David, A., Moizard, M. P.,
et al. (2004). X-linked mental retardation and autism are associated with a mutation
in the NLGN4 gene, a member of the neuroligin family. American Journal of Human
Genetics, 74(3), 552–557.
Lise, M. F., & El-Husseini, A. (2006). The neuroligin and neurexin families: From
structure to function at the synapse. Cellular and Molecular Life Sciences, 63(16),
1833–1849.
Lubs, H. A. (1969). A marker X chromosome. American Journal of Human Genetics,
21(3), 231–244.
Ma, D., Salyakina, D., Jaworski, J. M., Konidari, I., Whitehead, P. L., Andersen, A. N.,
et al. (2009). A genome-wide association study of autism reveals a common novel
risk locus at 5p14.1. Annals of Human Genetics, 73(Pt 3), 263–273.
Macarov, M., Zeigler, M., Newman, J. P., Strich, D., Sury, V., Tennenbaum, A., et al.
(2007). Deletions of VCX-A and NLGN4: A variable phenotype including normal
intellect. Journal of Intellectual Disability Research, 51(Pt 5), 329–333.
Marshall, C. R., Noor, A., Vincent, J. B., Lionel, A. C., Feuk, L., Skaug, J., et al. (2008).
Structural variation of chromosomes in autism spectrum disorder. American Journal
of Human Genetics, 82(2), 477–488.
McClellan, J. M., Susser, E., & King, M. C. (2007). Schizophrenia: A common disease
caused by multiple rare alleles. British Journal of Psychiatry, 190, 194–199.
McMahon, F. J., Akula, N., Schulze, T. G., Muglia, P., Tozzi, F., Detera-Wadleigh, S.
D., et al. (2010). Meta-analysis of genome-wide association data identifies a risk
locus for major mood disorders on 3p21.1. Nature Genetics, 42(2), 128–131.
McPherson, J. D., Marra, M., Hillier, L., Waterston, R. H., Chinwalla, A., Wallis, J.,
et al. (2001). A physical map of the human genome. Nature, 409(6822), 934–941.
Millar, J. K., Wilson-Annan, J. C., Anderson, S., Christie, S., Taylor, M. S., Semple, C.
A., et al. (2000). Disruption of two novel genes by a translocation co-segregating
with schizophrenia. Human Molecular Genetics, 9(9), 1415–1423.
Moessner, R., Marshall, C. R., Sutcliffe, J. S., Skaug, J., Pinto, D., Vincent, J., et al.
(2007). Contribution of SHANK3 mutations to autism spectrum disorder. American
Journal of Human Genetics, 81(6), 1289–1297.
Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., et al.
(2009). Targeted capture and massively parallel sequencing of 12 human exomes.
Nature, 461(7261), 272–276.
Noonan, J. P. (2009). Regulatory DNAs and the evolution of human development.
Current Opinion in Genetics and Development, 19(6), 557–564.
O’Roak, B. J., & State, M. W. (2008). Autism genetics: Strategies, challenges, and
opportunities. Autism Research, 1(1), 4–17.
Poliak, S., Gollan, L., Martinez, R., Custer, A., Einheber, S., Salzer, J. L., et al. (1999).
Caspr2, a new member of the neurexin superfamily, is localized at the juxtaparanodes
of myelinated axons and associates with K+ channels. Neuron, 24(4), 1037–1047.
Prabhakar, S., Visel, A., Akiyama, J. A., Shoukry, M., Lewis, K. D., Holt, A., et al.
(2008). Human-specific gain of function in a developmental enhancer. Science,
321(5894), 1346–1350.
Pritchard, J. K. (2001). Are rare variants responsible for susceptibility to complex
diseases? American Journal of Human Genetics, 69(1), 124–137.
Psychiatric GWAS Consortium Steering Committee. (2009). A framework for inter-
preting genome-wide association studies of psychiatric disorders. Molecular
Psychiatry, 14(1), 10–17.
Redon, R., Ishikawa, S., Fitch, K. R., Feuk, L., Perry, G. H., Andrews, T. D., et al. (2006).
Global variation in copy number in the human genome. Nature, 444(7118), 444–454.
Reichenberg, A., Gross, R., Weiser, M., Bresnahan, M., Silverman, J., Harlap, S., et al.
(2006). Advancing paternal age and autism. Archives of General Psychiatry, 63(9),
1026–1032.
Risch, N. (1990). Linkage strategies for genetically complex traits. I. Multilocus
models. American Journal of Human Genetics, 46(2), 222–228.
Risch, N., & Merikangas, K. (1996). The future of genetic studies of complex human
diseases. Science, 273(5281), 1516–1517.
Saxena, R., Voight, B. F., Lyssenko, V., Burtt, N. P., de Bakker, P. I., Chen, H., et al.
(2007). Genome-wide association analysis identifies loci for type 2 diabetes and
triglyceride levels. Science, 316(5829), 1331–1336.
Scott, L. J., Mohlke, K. L., Bonnycastle, L. L., Willer, C. J., Li, Y., Duren, W. L., et al.
(2007). A genome-wide association study of type 2 diabetes in Finns detects multiple
susceptibility variants. Science, 316(5829), 1341–1345.
Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., et al. (2007).
Strong association of de novo copy number mutations with autism. Science,
316(5823), 445–449.
Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., et al. (2004). Large-
scale copy number polymorphism in the human genome. Science, 305(5683), 525–528.
Sharp, A. J., Cheng, Z., & Eichler, E. E. (2006). Structural variation of the human
genome. Annual Review of Genomics and Human Genetics, 7, 407–442.
State, M. W. (2006). A surprising METamorphosis: Autism genetics finds a common
functional variant. Proceedings of the National Academy of Sciences USA, 103(45),
16621–16622.
State, M. W., Greally, J. M., Cuker, A., Bowers, P. N., Henegariu, O., Morgan, T. M.,
et al. (2003). Epigenetic abnormalities associated with a chromosome 18(q21-q22)
inversion and a Gilles de la Tourette syndrome phenotype. Proceedings of the National
Academy of Sciences USA, 100(8), 4684–4689.
Strauss, K. A., Puffenberger, E. G., Huentelman, M. J., Gottlieb, S., Dobrin, S. E.,
Parod, J. M., et al. (2006). Recessive symptomatic focal epilepsy and mutant
contactin-associated protein-like 2. New England Journal of Medicine, 354(13),
1370–1377.
Sutherland, G. R. (1977). Fragile sites on human chromosomes: Demonstration of
their dependence on the type of tissue culture medium. Science, 197(4300), 265–266.
Szatmari, P., Paterson, A. D., Zwaigenbaum, L., Roberts, W., Brian, J., Liu, X. Q., et al.
(2007). Mapping autism risk loci using genetic linkage and chromosomal rearrange-
ments. Nature Genetics, 39(3), 319–328.
Tabuchi, K., Blundell, J., Etherton, M. R., Hammer, R. E., Liu, X., Powell, C. M., et al.
(2007). A neuroligin-3 mutation implicated in autism increases inhibitory synaptic
transmission in mice. Science, 318(5847), 71–76.
Varoqueaux, F., Aramuni, G., Rawson, R. L., Mohrmann, R., Missler, M., Gottmann,
K., et al. (2006). Neuroligins determine synapse maturation and function. Neuron,
51(6), 741–754.
Veenstra-Vanderweele, J., Christian, S. L., & Cook, E. H., Jr. (2004). Autism as a
paradigmatic complex genetic disorder. Annual Review of Genomics and Human
Genetics, 5, 379–405.
Veenstra-VanderWeele, J., & Cook, E. H., Jr. (2004). Molecular genetics of autism
spectrum disorder. Molecular Psychiatry, 9(9), 819–832.
Verkerk, A. J., Pieretti, M., Sutcliffe, J. S., Fu, Y. H., Kuhl, D. P., Pizzuti, A., et al.
(1991). Identification of a gene (FMR-1) containing a CGG repeat coincident with a
breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell,
65(5), 905–914.
Vorstman, J. A., Staal, W. G., van Daalen, E., van Engeland, H., Hochstenbach, P. F., &
Franke, L. (2006). Identification of novel autism candidate regions through analysis
of reported cytogenetic abnormalities associated with autism. Molecular Psychiatry,
11(1), 18–28.
Wassink, T. H., Piven, J., & Patil, S. R. (2001). Chromosomal abnormalities in a clinic
sample of individuals with autistic disorder. Psychiatric Genetics, 11(2), 57–63.
Weiss, L. A., Arking, D. E., Daly, M. J., & Chakravarti, A. (2009). A genome-wide linkage
and association scan reveals novel loci for autism. Nature, 461(7265), 802–808.
Weiss, L. A., Shen, Y., Korn, J. M., Arking, D. E., Miller, D. T., Fossdal, R., et al. (2008).
Association between microdeletion and microduplication at 16p11.2 and autism.
New England Journal of Medicine, 358(7), 667–675.
Zeggini, E., & McCarthy, M. I. (2007). Identifying susceptibility variants for type 2
diabetes. Methods in Molecular Biology, 376, 235–250.
Zoghbi, H. Y. (2003). Postnatal neurodevelopmental disorders: Meeting at the synapse?
Science, 302(5646), 826–830.
Zondervan, K. T., & Cardon, L. R. (2004). The complex interplay among factors that
influence allelic association. Nature Reviews Genetics, 5(2), 89–100.
part iii

Causal Thinking in Psychiatry
11

Causal Thinking in Developmental Disorders


e. jane costello and adrian angold1

In this chapter we (1) lay out a definition of development as it relates to
psychopathology; (2) make the case that nearly all psychiatric disorders are
‘‘developmental’’; and (3) examine, with some illustrations, methods from
developmental research that can help to identify causal mechanisms leading
to mental illness.

What Do We Mean by Development?

The philosopher Ernst Nagel (1957, p. 15) defined development in a way that
links it to both benign and pathological outcomes:

The concept of development involves two essential components: the
notion of a system possessing a definite structure and a definite set
of pre-existing capacities; and the notion of a sequential set of changes
in the system, yielding relatively permanent but novel increments not
only in its structure but in its modes of operation.

As summarized by Leon Eisenberg (1977, p. 220), "the process of develop-
ment is the crucial link between genetic determinants and environmental
variables, between individual psychology and sociology." It is characteristic
of such systems that they consist of feedback and feedforward loops of vary-
ing complexity. Organism and environment are mutually constraining, how-
ever, with the result that developmental pathways show relatively high levels
of canalization (Angoff, 1988; Cairns, Gariépy, & Hood, 1990; Gottlieb &
Willoughby, 2006; Greenough, 1991; McGue, 1989; Plomin, DeFries, &
Loehlin, 1977; Scarr & McCartney, 1983).

1. Sections of this chapter are based in part on Costello (2008) and Costello & Angold (2006).


Like individual ‘‘normal’’ development, diseases have inherent develop-
mental processes of their own—processes that obey certain laws and follow
certain stages even as they destroy the individual in whom they develop (Hay
& Angold, 1993). A developmental approach to disease asks what happens
when developmental processes embodied in pathogenesis collide with the
process of ‘‘normal’’ human development.
The progression seen in chronic diseases (among which we categorize
most psychiatric disorders) has much in common with this view of develop-
ment. It is "structured" by the nature of the transformation of the organism
that begins the process, and in general, it follows a reasonably regular course,
although with wide variations in rate. Furthermore, there is hierarchical
integration as a disease develops. Each stage in the progress of a given
disease builds on the previous stages, and many of the manifestations of
earlier stages are "integrated" into later symptomatology. For example, con-
sider the well-established path to substance abuse (Kandel & Davies, 1982):

Beer or wine → Cigarettes or hard liquor → Marijuana → Other illicit drugs

It is characteristic of this pathway that the number of individuals at each
level becomes smaller but that those at the higher levels continue to show
behaviors characteristic of the earlier stages. Having described such a path-
way, the task is to understand the process by which it is established and to
invent preventive strategies appropriate to the various stages of the develop-
mental pathway. Such strategies must be appropriate to the developmental
stage of both the individual at risk and the disorder. This developmental
process is probabilistic, not determined (Gottlieb & Willoughby, 2006); never-
theless, studying the interplay or interaction among risk factors, developmen-
tal processes, and disease processes can yield causal insights.

What Can Timing of Risk Exposure Tell Us About the Causes of Psychiatric Disorders?

The pathways leading to psychiatric disorders begin early in life. It is becom-
ing increasingly clear that most adults with psychiatric disorders report
onset in the first two decades of life (Insel & Fenton, 2005), even when
they are using retrospective recall over several decades. However, the origins
of a disorder may be even earlier. Twenty years ago, in their seminal work
Statistical Methods in Cancer Research, Breslow and Day (1987, p. 2) pointed
out that

Most chronic diseases are the results of a process extending over dec-
ades, and many of the events occurring in this period play a substantial
role. As in the study of physical growth, of mental and hormonal develop-
ment, and in the process of aging, the essential feature is that changes
over time are followed at the individual level.

The methods developed by cancer and cardiovascular epidemiologists to
explore causal relationships in such diseases can, we believe, provide at
least a useful starting place for thinking about psychiatric disorders.
We can narrow down the range of causal links between risk exposure and
disease by understanding more about different aspects of the timing of risk
exposure in relation to developmental stage. Age at first exposure, time since
first exposure, duration of exposure, and intensity of exposure are all inter-
related aspects of timing that may have different implications for causality.
Age at first exposure has been studied most intensively of all the aspects
of risk over time in developmental psychopathology because of the theoretical
importance attached to early experiences in Freudian and other psychody-
namic models of development. For example, researchers investigating the
role of attachment in children’s development have concentrated on the very
early months and years of life as the crucial period during which the inability
to form one or more relationships may have damaging effects that last into
childhood and perhaps even into adulthood (Sroufe, 1988). The critical date
of risk onset appears to occur after 6 months, but the duration of the risk
period is not yet clear.
Timing of exposure has received less attention in studies of developmental
psychopathology. Brown and Harris (1978), in their work on the social origins
of depression, argued that women who lost their mothers in the first decade
of life were more vulnerable as adults to depressive episodes in the face of
severe life events. The study supporting this hypothesis had several limita-
tions, including retrospective design, and did not address the question of
whether these women were also at greater risk of depressive episodes
during later childhood and adolescence. It is not clear whether the crucial
factor was the length of time since mother’s death, the age of the child at the
time of exposure, or some combination of the two.
An example of the importance of timing of exposure to a protective factor
comes from the study of Alzheimer disease. Risk of Alzheimer disease is
reduced in people who have been exposed to nonsteroidal anti-inflammatory
drugs (NSAIDs) but only if that exposure occurred several years before the
age at which Alzheimer begins to frequently appear in the population
(Breitner, 2007). A causal inference is that whatever NSAIDs do to prevent
Alzheimer disease, they work on some prodromal aspects of the disease.

Duration of exposure to poverty was examined by Offord and colleagues
(1992) in their repeated surveys of a representative sample of children in
Ontario. They showed that children whose families were living below the
poverty level at two measurement points were at increased risk of behavioral
disorders compared with children whose families were living below the pov-
erty level on either one occasion only or never. Using prospective longitudinal
data from the Dunedin, New Zealand, longitudinal study, Moffitt (1990)
found that children identified at age 13 as both delinquent and hyperactive
had experienced significantly more family adversity (poverty, poor maternal
education, and poor maternal mental health) consistently from the age of 7 than children
who were only delinquent or only hyperactive at age 13.
Data from the longitudinal Great Smoky Mountains study were used to
study the role of duration of exposure to a protective factor. A representative
population sample of 1,420 rural children aged 9–13 at intake were given
repeated psychiatric assessments beginning in 1993. One-quarter of the
sample were American Indian and the rest predominantly White. In 1996
a casino opening on an Indian reservation gave every American Indian an
income supplement that increased annually. The younger children in the
sample were exposed to this increase in family income for longer than
those in the oldest cohort, most of whom left home not long after the
income supplement began. We examined the effect of this increase in
family income on drug abuse at age 21. Drug use and abuse were signifi-
cantly lower in those who had the longest exposure to the increased family
income than in either of the two older cohorts. They were also significantly
lower than in the youngest White cohort, who had not received any family
income supplement. We concluded that 3 or 4 years of exposure to the
protective effects of increased family income were needed for it to have a
long-term effect at age 21.
Intensity of exposure to lead provides an example of a definite dose–
response relationship (Needleman & Bellinger, 1991). Another aspect of
intensity is the number of different risk factors to which a child is exposed.
Sameroff & Seifer (1995), Rutter (1988), and others have pointed out that
children exposed to one risk factor are at increased risk of exposure to others
(e.g., father not in the home and poverty) and that the dose–response rela-
tionship to an increasing number of different risk factors is not a simple
linear one. Most children appear to be able to cope with a single adverse
circumstance, but rates of psychopathology rise sharply in children exposed
to several adverse circumstances or events (Seifer, Sameroff, Baldwin, &
Baldwin, 1989).
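The claim that the dose–response relationship to cumulative risk is not simply linear can be examined directly in data of this kind. The minimal sketch below (in Python, on simulated data; the names n_adversities and disorder are hypothetical stand-ins rather than variables from any study cited here) compares a logistic model that enters the count of adversities as a linear term with one that enters it as a set of categories, and uses a likelihood-ratio test to ask whether the categorical, threshold-like model fits better.

# Minimal sketch: is the dose-response to cumulative risk linear or threshold-like?
# `disorder` and `n_adversities` are hypothetical variable names; data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
n_adversities = rng.integers(0, 5, size=n)
# Simulated so that risk jumps sharply only at two or more adversities
p = np.where(n_adversities >= 2, 0.25, 0.05)
df = pd.DataFrame({"n_adversities": n_adversities,
                   "disorder": rng.binomial(1, p)})

linear = smf.logit("disorder ~ n_adversities", data=df).fit(disp=False)
categorical = smf.logit("disorder ~ C(n_adversities)", data=df).fit(disp=False)

# Likelihood-ratio test: does the categorical (threshold-like) model fit better?
lr_stat = 2 * (categorical.llf - linear.llf)
extra_df = categorical.df_model - linear.df_model
print("LR =", round(lr_stat, 2), " p =", round(stats.chi2.sf(lr_stat, extra_df), 4))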
These examples of different relationships between time and etiology are
far from exhaustive, but they indicate how we might use developmental
studies in different ways to look at etiology.

What Can ‘‘Normal’’ Development Tell Us About the Causes of Psychiatric Disorders?

There are two main streams of psychological research: one into ‘‘norms’’ of
human development and behavior and the other into differences among
individuals and groups. Developmental psychopathology aims to integrate
the two (Cicchetti, 1984). The more we understand about normal develop-
ment in the general population, the more we can learn about the causes of
pathology in the minority of the population who have disorders.
Human development is marked by stages or turning points at which change
in one or more systems occurs quite rapidly, inducing a qualitative difference
in capacity (Pickles & Hill, 2006). Here, we consider what we can learn about
the causes of psychiatric disorders from what developmental science has
learned about two key developmental turning points: the period before and
immediately after birth and the long process that leads to sexual maturation.
Prenatal and perinatal development can carry risk for psychopathology later
in life. Several lines of research suggest that intrauterine growth retardation
creates risk for a range of psychiatric outcomes at different developmental
stages, depending on the timing of exposure in relation to time-specific vul-
nerabilities of the developing organism (Barker, 2004). Low birth weight has
been implicated in risk for schizophrenia (Nilsson et al., 2005; Silverton,
Mednick, Schulsinger, Parnas, & Harrington, 1988), attention-deficit/hyperac-
tivity disorder (ADHD) (Botting, Powls, Cooke, & Marlow, 1997; Breslau et al.,
1996; Breslau & Chilcoat, 2000; Pharoah, Stevenson, Cooke, & Stevenson,
1994; Szatmari, Saigal, Rosenbaum, Campbell, & King, 1990), and eating dis-
orders (Favaro, Tenconi, & Santonastaso, 2006). Since 1990, studies have been
published both supporting (Botting et al., 1997; Frost, Reinherz, Pakiz-Camras,
Giaconia, & Lefkowitz, 1999; Gale & Martyn, 2004; Gardner et al., 2004; Patton,
Coffey, Carlin, Olsson, & Morley, 2004; Pharoah, Stevenson, Cooke, &
Stevenson, 1994; Weisglas-Kuperus, Koot, Baerts, Fetter, & Sauer, 1993) and
disconfirming (Buka, Tsuang, & Lipsitt, 1993; Cooke, 2004; Jablensky, Morgan,
Zubrick, Bower, & Yellachich, 2005; Osler, Nordentoft, & Nybo Andersen, 2005;
Szatmari et al., 1990) the idea that low birth weight predicts depression.
The problem is moving beyond correlation to causal explanations. Clearly,
experimental assignment to a high-risk perinatal environment is not ethical
for human research. In a longitudinal study, we tried to narrow down the
range of possible causal explanations by testing two competing hypotheses to
explain why the incidence of depression increases dramatically in girls, but
not boys, when they are about age 13. A simple bivariate analysis showed that
depression was much more common (38.1% vs. 8.4%) in girls who had
weighed less than 2,500 g at birth than in other girls. One hypothesis
states that low birth weight is one of a range of risk factors that could lead
to depression, including other perinatal risk factors, childhood factors, and
adolescent factors. A second hypothesis, the fetal origins hypothesis (Barker,
2003), posits that low birth weight is a marker for poor intrauterine condi-
tions for growth and development. These provoke adjustments in fetal phy-
siological development, with long-term consequences for function and health
(Bateson et al., 2004). According to this model, there is ‘‘a mismatch between
physiologic capacities established in early development and the environments
in which they later must function’’ (Worthman & Kuzara, 2005, p. 98).
Adjustments by the fetus to suboptimal conditions may maximize chances
for survival during gestation and early development but at a deferred cost if
such adjustments leave the individual less prepared to deal with conditions
encountered later in life. In this case, effects of low birth weight might be
latent until the system encounters adversities that strain its capacity to adapt.
The stress threshold may be lower than one that would trigger illness in
individuals of normal birth weight.
Low birth weight was just one of a number of potential stress factors yet
continued to predict adolescent depression independently when a wide range
of potential confounders from the perinatal period, childhood, and adoles-
cence were included in the model. If low birth weight were merely one of a
cluster of generic risk factors for psychopathology, then it should predict
other disorders as well as depression and do so throughout childhood
and adolescence in both boys and girls. In fact, it predicted only depression,
only in adolescence, and only in girls. Additionally, it did not act like just one
more risk factor. In the absence of other adversities, the rate of female
adolescent depression was zero in both normal– and low–birth weight
girls. However, 30% of low–birth weight girls exposed to a single adversity
had an episode of adolescent depression compared with 5% of normal–birth
weight girls, and the difference in girls with two adversities was even more
marked (84% vs. 20%). Low birth weight acted more like a potentiator of
other risk factors than a separate adversity. This argues for the fetal program-
ming hypothesis. Low birth weight has been implicated in other psychiatric
disorders at different developmental stages (e.g., ADHD in 6-year-old boys
[Breslau et al., 1996]), but studies have not distinguished among competing
causal hypotheses.
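Stated statistically, the potentiator interpretation predicts an interaction: the effect of adversity on depression should be larger among low–birth weight girls than among girls of normal birth weight. The sketch below shows, under stated assumptions, how such an interaction term could be tested; the data are simulated to echo the pattern described above, and the names low_bw, n_adversities, and depression are hypothetical rather than taken from the study.

# Minimal sketch of the "potentiator" (interaction) idea with hypothetical names:
# low_bw (0/1), n_adversities (count), depression (0/1). Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1500
low_bw = rng.binomial(1, 0.08, size=n)
n_adversities = rng.integers(0, 3, size=n)
# Simulated to echo the text: adversity matters far more when low_bw == 1
logit_p = -4 + 0.3 * n_adversities + 0.2 * low_bw + 1.5 * low_bw * n_adversities
df = pd.DataFrame({"low_bw": low_bw,
                   "n_adversities": n_adversities,
                   "depression": rng.binomial(1, 1 / (1 + np.exp(-logit_p)))})

# Potentiation would show up as a significant low_bw:n_adversities interaction
model = smf.logit("depression ~ low_bw * n_adversities", data=df).fit(disp=False)
print(model.summary().tables[1])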
A review of animal models for causal thinking is beyond the scope of this
chapter, but we should note that the perinatal period is one for which animal
research has been particularly illuminating about the causes of later psycho-
pathology. Two sets of data have been especially fruitful: the work of Meaney
and colleagues (Champagne & Meaney, 2006) with rats and that of Suomi
and colleagues (Champoux et al., 2002) with monkeys, on interactions
between characteristics of the infant and of the mother in predicting devel-
opmental competence.

Puberty has emerged as another important developmental stage in relation
to several psychiatric disorders. The term puberty encompasses changes in
multiple indices of adolescent development, including increases in several
gonadal and steroidal hormones, height, weight, body fat, body hair, breast
and genitalia development, powers of abstract thinking, and family and peer
expectations and behavior. It can also occur at the same time as major social
changes such as moving to high school (Simmons & Blyth, 1992). We may be
concerned with either linear effects (e.g., increased risk as development pro-
ceeds) or nonlinear effects of various kinds (e.g., the onset of menses for girls or
crossing some threshold level of sex steroids [Angold, Costello, & Worthman,
1999; Tschann et al., 1994]). Of course, both the physiological and the social
impacts of puberty may interact and may vary by gender. Thus, the relationship
between puberty and psychopathology may vary widely depending on which
aspect of puberty is causally significant for which disorder.
We examined this by pitting hormonal, morphological, and social–psycho-
logical markers of puberty against one another as predictors of adolescent
depression, anxiety disorders, conduct disorder, and alcohol use and abuse in
a longitudinal study. In the case of depression, high levels of estrogen and
testosterone predicted adolescent depression in girls. On the other hand,
higher levels of testosterone were associated with nonaggressive antisocial
behavior in boys in deviant peer groups but with positive leadership roles in
boys who did not associate with deviant peers. There is also growing evidence
indicating that both girls and boys who go through puberty earlier than their
peers are at increased risk for emotional and behavioral problems, especially
if they have unsupportive backgrounds or engage in early sexual intercourse
(Ge, Brody, Conger, & Murry, 2002; Ge, Conger, & Elder, 1996; Magnusson,
Stattin, & Allen, 1985; Moffitt, Caspi, Belsky, & Silva, 1992; Kaltiala-Heino,
Kosunen, & Rimpela, 2003; Kaltiala-Heino, Marttunen, Rantanen, & Rimpela,
2003). The emergent picture is that multiple components of puberty have a
variety of sex-differentiated effects on different forms of psychopathology.
Thus, an understanding of the norms of development can help us to get
beyond description and move along the pathway toward a better understand-
ing of the causes of psychiatric disorders. It is also worth emphasizing that
studying early developmental processes is important for disorders that spread
far into adulthood like depression and drug abuse.

Methods for Causal Research in Developmental Psychopathology

In this section we will not discuss formal experiments with randomized
assignment, which can be enormously helpful for causal research but are
rarely feasible in longitudinal studies, especially those that require represen-
tative population samples or high-risk community samples. Instead, we will
discuss some recent examples of the use of quasi-experimental methods that
capitalize on the longitudinal strengths of developmental research.
There is much discussion of whether and to what extent epidemiological
research can establish causes. The reason that this matters so much is, of
course, the danger of confounding. Confounding due to the presence of one
or more common causes of the risk factor and the outcome distorts the
impact of a risk factor on the probability of disease. We discuss some methods
used to reduce, if not to eliminate, this risk. Note that in formal experimental
designs, there may be characteristics of setting or time that interfere with the
simple logic of the experiment. For example, it is notoriously difficult to study
blood pressure in laboratory settings because of the ‘‘white coat’’ phenom-
enon that plays havoc with ‘‘resting’’ blood pressure in the presence of nurses
and doctors. Additionally, even when a causal hypothesis is supported in
laboratory studies, its effect size needs to be estimated in the real world.
A second problem with causal research in the context of developmental
psychopathology is that there is no such thing as ‘‘a’’ single cause.
Developmental epidemiologists have used a range of terms to describe what
they are looking for: Examples include component causes (Rothman, 1976),
mechanisms (Rutter, 1994), and pathways (Pickles & Hill, 2006).

Quasi-Experiments
What distinguishes quasi-experiments from randomized experiments is that
in the former case we cannot be sure that group assignment is free of bias.
In other respects (e.g., the selection of the intervention, the measures admi-
nistered, the timing of measurement), the two designs may be close to
identical. However, the difference—inability to use random assignment—
can threaten the validity of causal conclusions based on the results (as dis-
cussed earlier). We describe three such strategies used to test whether and
how traumatic events cause psychiatric disorders. (In the following diagrams,
O = observation, X = event, T = time).

Sample Compared Postevent to a Population Norm

If data have already been collected before the event or intervention, it may be
possible to set up a pre- vs. post-, exposed vs. not exposed design that comes
close to random assignment. However, it is unlikely that there will have been
an opportunity to collect ‘‘before’’ measures on those to whom the (typically
unforeseen) event will occur, with the result that the most common form of
quasi-experiment following an unexpected catastrophe is the following:

                     T1      T2      T3
Sample                        X       O
Population norm      O

For example, Hoven and colleagues (2005) compared children from New
York during the September 11, 2001 (9/11), attack on the Twin Towers to a
representative population sample from nearby Stamford, Connecticut, who
had been assessed with the same instruments just before 9/11, as well as to
other community samples. The New York children assessed 6 months after
9/11 had higher rates of most diagnoses.
This design is critically dependent on the comparability of the postevent
sample and the sample on which the measures were normed since otherwise
any differences found might be the result of preexisting differences rather
than the event. Therefore, although this design is often the only one avail-
able, it tends to be the weakest of the various quasi-experiments.

Dose–Response Measures of Exposure to an Event

Sometimes it is possible to use a dose–response strategy to test hypotheses
about whether an exposure causes an outcome, rendering the following
design:

             T1      T2
Sample a      X       O
Sample b      X       O

For example, the same researchers (Hoven et al., 2005) divided New York
City into three areas at different geographical distances from the site of the
World Trade Center and sampled children attending schools in each area, to
test the hypothesis that physical distance from the event reduced the risk of
psychiatric disorder; this finding would support a causal relationship between
the event and the disorder. They found high rates of mental disorder
throughout the study area but significantly lower rates in children who
went to school in the area closest to the site of the attack. This took the
researchers by surprise; their post hoc explanation was that the extent of
social support and mental health care following 9/11 prevented the harm
that the event might have caused.
Next, they measured personal and family exposure to the attack and com-
pared children who had family members involved in the attack to those who
were geographically close but had no personal involvement. They found, as
predicted, that both personal and family involvement increased the risk of a
mental disorder but that involvement of a family member was the stronger
risk factor even when the children were physically distant from the site.
These aspects of the design of this study carry more weight than the one
described earlier because they incorporate stronger and more theory-based
design characteristics. However, the designs lack a pretest; therefore, we
cannot rule out the possibility that these groups were different before the
event.

Different Groups Exposed and Not Exposed, Both Tested Before and After Exposure

This design has the potential to come closest to a randomized design because
the same subjects are studied both before and after an event that occurred in
one group but not the other:

             T1      T2      T3
Sample a      O               O
Sample b      O       X       O

However, if sample a and sample b were not randomly assigned from the
same subject pool, the researcher must convince the reader that there were
no differences between the two groups before the event that could potentially
confound the causal relationship.
For example, in a longitudinal study of development across the transition
to adulthood, we interviewed a representative sample of young people every 1
or 2 years since 1993. Subjects were interviewed each year on a date as close
as possible to their birthday. Thus, in 2001, when the participants were aged
19 and 21, about two-thirds of them had been interviewed when, on 9/11, the
Twin Towers and the Pentagon were struck. We continued to interview the
remaining subjects until the end of the year (Costello, Erkanli, Keeler, &
Angold, 2004), but the world facing these young people was a very different
one from that in which we had interviewed the first group of participants; for
example, there was talk of a national draft, which would directly affect this
age group.
The strength of this design is critically dependent on the comparability of
the groups interviewed before and after the event. In this case we had 8 years
of interviews with the participants before 2001. We compared the before-9/11
and the after-9/11 groups on a wide range of factors and were able to demon-
strate that each was a random subsample of the main sample. Thus, we had
a quasi-experiment that was equivalent to randomly assigning subjects who
had experienced vs. those who had not experienced 9/11. We predicted that,
even though the participants were living 500 miles away from where the
events occurred, this ‘‘distant trauma’’ (Terr et al., 1999) would increase
levels of anxiety and possibly, in this age group, alcohol and drug abuse.
We also hypothesized that the potential for military conscription might
further increase anxiety levels, especially in males. We were wrong on both
counts. There was no increase in levels of anxiety. Women interviewed after
9/11 reported higher levels of drug use in general, and cannabis in particular,
with rates of reported use approaching twice the pre-9/11 level. Conversely,
men interviewed after 9/11 were less likely to report substance abuse, and
use of all drugs was lower.
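As noted above, the strength of this design depends on demonstrating that the groups interviewed before and after the event were comparable on characteristics measured before it. A minimal sketch of such a balance check, using standardized mean differences on simulated data with hypothetical column names (age, family_income, prior_anxiety_score), is given below; it illustrates the general procedure only and does not reproduce the study's actual comparisons.

# Minimal sketch of a covariate balance check between quasi-experimental groups
# (interviewed before vs. after an event). Column names are hypothetical.
import numpy as np
import pandas as pd

def standardized_mean_difference(a: pd.Series, b: pd.Series) -> float:
    """Difference in group means scaled by the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return float((a.mean() - b.mean()) / pooled_sd)

rng = np.random.default_rng(2)
n = 900
df = pd.DataFrame({
    "after_event": rng.binomial(1, 1 / 3, size=n),  # about one-third interviewed after
    "age": rng.normal(20, 1, size=n),
    "family_income": rng.normal(35000, 8000, size=n),
    "prior_anxiety_score": rng.normal(5, 2, size=n),
})

before = df[df["after_event"] == 0]
after = df[df["after_event"] == 1]
for col in ["age", "family_income", "prior_anxiety_score"]:
    smd = standardized_mean_difference(before[col], after[col])
    # An absolute SMD below roughly 0.1 is a common rule-of-thumb benchmark for balance
    print(f"{col}: SMD = {smd:.3f}")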
The examples of quasi-experimental studies described here suggest that
such designs can be quite effective at discounting previously held beliefs but
that they are open to the risk of post hoc interpretations (as in the post-9/11
examples). Finally, as Shadish, Cook, and Campbell (2002) point out, ‘‘they
can undermine the likelihood of doing even better studies.’’

Natural Experiments
Natural experiments are gifts to the researcher; they are situations that could
not have been planned or proposed but do what a randomized experiment
does. That is, they assign participants to one exposure or another without
bias and hold all other variables constant while manipulating the risk factor
of interest. Sometimes the unbiased assignment is created by events, as when
one group of families in our longitudinal study received an income supple-
ment while others did not, where race (American Indian vs. Anglo) was the
sole criterion (Costello, Compton, Keeler, & Angold, 2003). In this case, we
had 4 years of assessments of children’s psychiatric status before and after
the introduction of the income supplement and, thus, could compare the
children’s behavior before and after the intervention in both groups. The
years of measurement before the event enabled us to rule out the potential
confounding of ethnicity with the children’s emotional and behavioral
symptoms.
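One common way to formalize a before-and-after comparison in exposed and unexposed groups is a difference-in-differences model, in which the group-by-period interaction estimates the effect of the intervention. The sketch below illustrates that approach on simulated repeated-measures data; difference-in-differences is offered here only as an illustrative technique, not as the analysis reported by Costello, Compton, Keeler, and Angold (2003), and all variable names are hypothetical.

# Minimal difference-in-differences sketch for a natural experiment in which one
# group begins receiving an income supplement at a known wave. Names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_children, n_waves = 400, 8
child = np.repeat(np.arange(n_children), n_waves)
wave = np.tile(np.arange(n_waves), n_children)
supplement = np.repeat(rng.binomial(1, 0.25, size=n_children), n_waves)
post = (wave >= 4).astype(int)  # the supplement is introduced at wave 4
# Simulated symptom scores that fall once the supplement begins
symptoms = (5.0 + 1.0 * supplement - 1.5 * supplement * post
            + rng.normal(0, 1.5, size=n_children * n_waves))
df = pd.DataFrame({"child": child, "supplement": supplement,
                   "post": post, "symptoms": symptoms})

# The supplement:post coefficient is the difference-in-differences estimate;
# clustering standard errors by child acknowledges the repeated measures.
did = smf.ols("symptoms ~ supplement * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["child"]})
print(did.params["supplement:post"], did.bse["supplement:post"])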
A tremendously important possibility for natural experimentation occurs
when genes and environment can be separated. Such naturally occurring
situations provided the foundation for the rise of genetic epidemiology,
which ‘‘focuses on the familial, and in particular genetic, determinants of
disease and the joint effects of genes and non-genetic determinants’’ (Burton,
Tobin, & Hopper, 2005, p. 941). Several researchers have made ingenious use
of the fact that ‘‘people take their genes with them when they move from one
country to another, but often the migration entails a radical change in life-
style’’ (Rutter, Pickles, Murray, & Eaves, 2001, p. 310). If a comparison group
is available from both the old and the new countries, a natural experiment
exists of the following form:

              T1      T2
Sample a1      X       O
Sample a2              O
Sample b               O

Sample a1, who migrated, can be compared both with sample a2, who stayed
home, and with sample b, who grew up in the new country. If a1 and a2 are
more similar than a1 and b, this suggests that the pathology being measured is
more strongly influenced by the genetic similarity of the two groups of the same
race/ethnicity than by the different environments in which the two groups
now live.
strong environmental effect. For example, Verhulst and colleagues compared
Turkish adolescents in Holland with Turkish adolescents in Turkey and Dutch
adolescents in Holland on a self-report measure of child psychopathology
(Janssen et al., 2004). They found that the immigrant youth reported more
anxious, depressed, and withdrawn symptoms than the Dutch youth but
more delinquency, attentional problems, and somatic problems than the
Turkish youth in Turkey. This suggests that Turkish adolescents are, in general,
more prone to emotional symptoms than Dutch adolescents but that migration
caused some behavioral problems not seen at home. Interestingly, a follow-up
study when the two samples living in Holland were in their 20s showed that
differences between the two groups shrank significantly, largely because the
immigrants’ mental health improved more than did that of the native Dutch.
There are, of course, important caveats to be considered before causal
conclusions can be drawn from migrant designs: Why did people migrate?
Are they representative of the nonmigrants at home? So long as these issues
are carefully considered, however, migrant designs can be very helpful in
pulling apart entangled component causes.

Prevention Trials as Natural Experiments


Trials of treatment or prevention programs also test causal hypotheses.
Treatment trials, which tend to take place in academic medical settings
with highly selected samples, can rarely be used as the basis of general
causal inference; but population-based prevention trials may approximate
experimental conditions. Unfortunately, community-based prevention trials
are often too expensive or limited in scope to address developmental issues.
One example of a theory-driven prevention trial with an etiological, devel-
opmental message is Fast Track (Bierman et al., 1992). This school-based
intervention with teachers, parents, and children tested the theory that ‘‘early
starters’’ (i.e., children who show conduct problems early in childhood) tend
to increase in aggressive behavior over time and to persist in antisocial
behavior longer than other antisocial children (Moffitt, 1993). The interven-
tion had positive effects 4 years later, and mediational analyses supported
specific causal pathways. For example, improvements in parenting skills
affected the child’s behavior at home but not at school, while improvements
in social cognition about peers affected deviant peer associations.
Additionally, children whose prosocial behavior in the classroom improved
had improved ratings in classroom sociometric assessments (Bierman et al.,
2002). It would benefit causal research greatly if prevention trials were, like
Fast Track, specific about their causal theories and rigorous in testing them.

Approaches to Data Analysis for Testing Causal Models in Developmental Psychopathology

Traditionally, researchers have tried to deal with the problem of confounding
by controlling for potential confounders while using regression models of
various types. More recently, new methods for etiological inference in epide-
miological research have been introduced for both cross-sectional and long-
itudinal data (e.g., Robins, Hernan, & Brumback, 2000; Rosenbaum & Rubin,
1985; Rothman & Greenland, 2005). The underlying principle is to use
inverse probability weighting to create conditions that approximate a rando-
mized experiment (i.e., those exposed to the risk factor of interest are inter-
changeable with those not exposed [Hernan & Robins, 2006]).
We compared two analytic approaches to examine the hypothesis that
growing up in a single-parent household increases risk for conduct disorder
against the alternative that other factors associated with a mother’s being a
single parent (e.g., being a teen parent, leaving school without graduating,
psychiatric or drug problems, criminal record) confound the relationship
between having a single parent and developing a conduct disorder. Using a
traditional regression analysis, having a single parent remained a marginally
significant predictor after controlling for the potential confounders. Using the
alternative (g-estimation) approach, single parenting no longer exerted a
causal effect on child conduct disorder.
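To make the contrast between the two analytic strategies concrete, the sketch below compares an ordinary covariate-adjusted logistic regression with the inverse-probability-weighting idea described above, applied to simulated data in which the confounders, rather than single parenthood itself, generate the association. All variable names are hypothetical, and the sketch uses inverse probability of treatment weighting rather than the g-estimation procedure used in the comparison just described.

# Minimal sketch contrasting covariate-adjusted regression with inverse probability
# of treatment weighting (IPTW). All variable names are hypothetical and the data
# are simulated; the comparison described in the text used g-estimation, not shown here.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 3000
teen_parent = rng.binomial(1, 0.15, size=n)
parent_problems = rng.binomial(1, 0.20, size=n)
# Confounders drive both the exposure (single parenting) and the outcome
p_single = 1 / (1 + np.exp(-(-2.0 + 1.2 * teen_parent + 0.8 * parent_problems)))
single_parent = rng.binomial(1, p_single)
p_cd = 1 / (1 + np.exp(-(-3.0 + 1.0 * teen_parent + 1.0 * parent_problems)))
conduct_disorder = rng.binomial(1, p_cd)  # simulated: no true single-parent effect
df = pd.DataFrame({"single_parent": single_parent, "teen_parent": teen_parent,
                   "parent_problems": parent_problems,
                   "conduct_disorder": conduct_disorder})

# 1) Traditional approach: put the measured confounders into the outcome model
adjusted = smf.logit("conduct_disorder ~ single_parent + teen_parent + parent_problems",
                     data=df).fit(disp=False)

# 2) IPTW: model the exposure, weight each child by 1 / P(observed exposure),
#    then compare weighted outcome rates across exposure groups
ps = smf.logit("single_parent ~ teen_parent + parent_problems",
               data=df).fit(disp=False).predict(df)
weights = np.where(df["single_parent"] == 1, 1 / ps, 1 / (1 - ps))
exposed = (df["single_parent"] == 1).to_numpy()
outcome = df["conduct_disorder"].to_numpy()
rate_exposed = np.average(outcome[exposed], weights=weights[exposed])
rate_unexposed = np.average(outcome[~exposed], weights=weights[~exposed])

print("Covariate-adjusted odds ratio:", float(np.exp(adjusted.params["single_parent"])))
print("IPTW-weighted risk difference:", rate_exposed - rate_unexposed)

In data generated this way, both approaches should recover the null effect built into the simulation; the weighting strategy becomes especially valuable when exposure varies over time and is itself affected by earlier outcomes, the situation that marginal structural models (Robins, Hernan, & Brumback, 2000) were designed to handle.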

Conclusions

Observation, categorization, pattern recognition, hypothesis testing, causal
thinking: These are the stages through which a science tends to progress
as it advances in knowledge and rigor (Feist, 2006). Developmental research
starts from description and pattern recognition, but it can use those observa-
tions to test hypotheses and ask causal questions. Developmental psycho-
pathology is helped enormously in this task by its access to 100 years of
theory-driven research in normal development, a corpus of knowledge that
psychiatry has yet to exploit in full (Cicchetti, 2006). As research into the
causes of psychiatric disorders advances, the importance of a developmental
approach to every type of disorder (not just those seen in early childhood) will
become even more evident and the value of longitudinal data to answer
causal questions will increase.

References

Angoff, W. H. (1988). The nature–nurture debate, aptitudes, and group differences.
American Psychologist, 43(9), 713–720.
Angold, A., Costello, E. J., & Worthman, C. M. (1999). Pubertal changes in hormone
levels and depression in girls. Psychological Medicine, 29(5), 1043–1053.
Barker, D. (2003). The developmental origins of adult disease. European Journal of
Epidemiology, 18(8), 733–736.
Barker, D. (2004). The developmental origins of well-being. Royal Society, 359(1449),
1359–1366.
Bateson, P., Barker, D., Clutton-Brock, T., Deb, D., D’Udine, B., Foley, R. A., et al.
(2004). Developmental plasticity and human health. Nature, 430(6998), 419–421.
Bierman, K. L., Coie, J. D., Dodge, K. A., Greenberg, M. T., Lochman, J. E., &
McMahon, R. J. (1992). A developmental and clinical model for the prevention of
conduct disorder: The FAST track program. Developmental Psychopathology, 4(4),
509–527.
Bierman, K. L., Coie, J. D., Dodge, K. A., Greenberg, M. T., Lochman, J. E., McMahon,
R. J., et al.; Conduct Problems Prevention Research Group. (2002). Using the Fast
Track randomized prevention trial to test the early-starter model of the development
of serious conduct problems. Development and Psychopathology, 14(4), 925–943.
Botting, N., Powls, A., Cooke, R. W. I., & Marlow, N. (1997). Attention deficit hyper-
activity disorders and other psychiatric outcomes in very low birthweight children at
12 years. Journal of Child Psychology and Psychiatry, 38(8), 931–941.
Breitner, J. (2007). Prevention of Alzheimer’s disease: Principles and prospects. In
M. Tsuang, W. S. Stone, & M. J. Lyons (Eds.), Recognition and prevention of major
mental and substance use disorders (pp. 319–329). Arlington, VA: American Psychiatric
Publishing.
Breslau, N., Brown, G. G., DelDotto, J. E., Kumar, S., Ezhuthachan, S., Andreski, P.,
et al. (1996). Psychiatric sequelae of low birth weight at 6 years of age. Journal of
Abnormal Child Psychology, 24(3), 385–400.
Breslau, N., & Chilcoat, H. D. (2000). Psychiatric sequelae of low birth weight at 11
years of age. Biological Psychiatry, 47(11), 1005–1011.
Breslow, N. E., & Day, N. E. (1987). Statistical methods in cancer research: Vol. II. The
design and analysis of cohort studies (IARC Scientific Publication 82). Lyon:
International Agency for Research on Cancer.
Brown, G. W., & Harris, T. O. (1978). The social origins of depression: A study of
psychiatric disorder in women. New York: Free Press.
Buka, S. L., Tsuang, M., & Lipsitt, L. (1993). Pregnancy/delivery complications and
psychiatric diagnosis: A prospective study. Archives of General Psychiatry, 50(2),
151–156.
Burton, P. R., Tobin, M. D., & Hopper, J. L. (2005). Key concepts in genetic epide-
miology. Lancet, 366(9489), 941.
Cairns, R. B., Gariépy, J. L., & Hood, K. E. (1990). Development, microevolution, and
social behavior. Psychological Review, 97(1), 49–65.
Champagne, F. A., & Meaney, M. J. (2006). Stress during gestation alters postpartum
maternal care and the development of the offspring in a rodent model. Biological
Psychiatry, 59(12), 1227–1235.
Champoux, M., Bennett, A., Shannon, C., Higley, J. D., Lesch, K. P., & Suomi, S. J.
(2002). Serotonin transporter gene polymorphism, differential early rearing, and
behavior in rhesus monkey neonates. Molecular Psychiatry, 7(10), 1058–1063.
Cicchetti, D. (1984). The emergence of developmental psychopathology. Child
Development, 55(1), 1–7.
Cicchetti, D. (2006). Development and psychopathology. In D. Cicchetti & D. J. Cohen
(Eds.), Developmental psychopathology (2nd ed., Vol. 1, pp. 1–23). Hoboken, NJ: John
Wiley & Sons.
Cooke, R. W. (2004). Health, lifestyle, and quality of life for young adults born very
preterm. Archives of Disease in Childhood, 89(3), 201–206.
Costello, E. J. (2008). Using epidemiological and longitudinal approaches to study
causal hypotheses. In M. Rutter (Ed.), Rutter’s child and adolescent psychiatry (pp.
58–70). Oxford: Blackwell Scientific.
Costello, E. J., & Angold, A. (2006). Developmental epidemiology. In D. Cicchetti & D.
Cohen (Eds.), Theory and method (2nd ed., Vol. 1, pp. 41–75). Hoboken, NJ: John
Wiley & Sons.
Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between
poverty and psychopathology: A natural experiment. Journal of the American Medical
Association, 290(15), 2023–2029.
Costello, E. J., Erkanli, A., Keeler, G., & Angold, A. (2004). Distant trauma: A prospec-
tive study of the effects of 9/11 on rural youth. Applied Developmental Science, 8(4),
211–220.
Eisenberg, L. (1977). Development as a unifying concept in psychiatry. British Journal
of Psychiatry, 131, 225–237.
Favaro, A., Tenconi, E., & Santonastaso, P. (2006). Perinatal factors and the risk of
developing anorexia nervosa and bulimia nervosa. Archives of General Psychiatry,
63(1), 82–88.
Feist, G. J. (2006). The psychology of science and the origins of the scientific mind. New
Haven, CT: Yale University Press.
Frost, A. K., Reinherz, H. Z., Pakiz-Camras, B., Giaconia, R. M., & Lefkowitz, E. S.
(1999). Risk factors for depressive symptoms in late adolescence: A longitudinal
community study. American Journal of Orthopsychiatry, 69(3), 370–381.
Gale, C. R., & Martyn, C. N. (2004). Birth weight and later risk of depression in a
national birth cohort. British Journal of Psychiatry, 184, 28–33.
Gardner, F., Johnson, A., Yudkin, P., Bowler, U., Hockley, C., Mutch, L., et al. (2004).
Behavioral and emotional adjustment of teenagers in mainstream school who were
born before 29 weeks’ gestation. Pediatrics, 114(3), 676–682.
Ge, X., Brody, G., Conger, R., & Murry, V. (2002). Contextual amplification of pubertal
transition effects on deviant peer affiliation and externalizing behavior among
African American children. Developmental Psychology, 38(1), 42–54.
Ge, X., Conger, R. D., & Elder, G. H. (1996). Coming of age too early: Pubertal
influences on girls’ vulnerability to psychological distress. Child Development,
67(6), 3386–3400.
Gottlieb, G., & Willoughby, M. (2006). Probabilistic epigenesis of psychopathology. In
D. Cicchetti & D. Cohen (Eds.), Developmental psychopathology: Theory and method
(2nd ed., Vol. 1, pp. 673–700). Hoboken, NJ: John Wiley & Sons.
Greenough, W. T. (1991). Experience as a component of normal development:
Evolutionary considerations. Developmental Psychopathology, 27(1), 14–17.
Hay, D. F., & Angold, A. (1993). Introduction: Precursors and causes in development
and pathogenesis. In D. F. Hay & A. Angold (Eds.), Precursors and causes in devel-
opment and psychopathology (pp. 1–21). Chichester: John Wiley & Sons.
Hernan, M. A., & Robins, J. M. (2006). Estimating causal effects from epidemiological
data. Journal of Epidemiology and Community Health, 60(7), 578–586.
Hoven, C. W., Duarte, C. S., Lucas, C. P., Wu, P., Mandell, D. J., Goodwin, R. D., et al.
(2005). Psychopathology among New York City public school children 6 months
after September 11. Archives of General Psychiatry, 62(5), 545–552.
Insel, T. R., & Fenton, W. S. (2005). Psychiatric epidemiology: It’s not just about
counting anymore. Archives of General Psychiatry, 62(6), 590–592.
Jablensky, A., Morgan, V., Zubrick, S. R., Bower, C., & Yellachich, L.-A. (2005).
Pregnancy, delivery, and neonatal complications in population cohort of women
with schizophrenia and major affective disorders. American Journal of Psychiatry,
162(1), 79–91.
Janssen, M. M., Verhulst, F., Bengi-Arslan, L., Erol, N., Salter, C., & Crijnen, A. M.
(2004). Comparison of self-reported emotional and behavioral problems in Turkish
immigrant, Dutch and Turkish adolescents. Social Psychiatry and Psychiatric
Epidemiology, 39(2), 133–140.
Kaltiala-Heino, R., Kosunen, E., & Rimpela, M. (2003). Pubertal timing, sexual beha-
viour and self-reported depression in middle adolescence. Journal of Adolescence,
26(5), 531–545.
Kaltiala-Heino, R., Marttunen, M., Rantanen, P., & Rimpela, M. (2003). Early puberty
is associated with mental health problems in middle adolescence. Social Science &
Medicine, 57(6), 1055–1064.
Kandel, D. B., & Davies, M. (1982). Epidemiology of depressive mood in adolescents:
An empirical study. Archives of General Psychiatry, 39(10), 1205–1212.
Magnusson, D., Stattin, H., & Allen, V. L. (1985). Differential maturation among girls
and its relation to social adjustment: A longitudinal perspective. Stockholm: University
of Stockholm.
McGue, M. (1989). Nature–nurture and intelligence. Nature, 340, 507–508.
Moffitt, T. E. (1990). Juvenile delinquency and attention deficit disorder: Boys’ devel-
opmental trajectories from age 3 to age 15. Child Development, 61(3), 893–910.
Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior:
A developmental taxonomy. Psychological Review, 100(4), 674–701.
Moffitt, T. E., Caspi, A., Belsky, J., & Silva, P. A. (1992). Childhood experience and the
onset of menarche: A test of a sociobiological model. Child Development, 63(1), 47–58.
Nagel, E. (1957). Determinism and development. In D. B. Harris (Ed.), The concept of
development (pp. 15–26). Minneapolis: University of Minnesota Press.
Needleman, H. L., & Bellinger, D. (1991). The health effects of low level exposure to
lead. Annual Review of Public Health, 12, 111–140.
Nilsson, E., Stalberg, G., Lichtenstein, P., Cnattingius, S., Olausson, P. O., & Hultman,
C. M. (2005). Fetal growth restriction and schizophrenia: A Swedish twin study.
Twin Research & Human Genetics, 8(4), 402–408.
Offord, D. R., Boyle, M. H., Racine, Y. A., Fleming, J. E., Cadman, D. T., Blum, H. M.,
et al. (1992). Outcome, prognosis, and risk in a longitudinal follow-up study. Journal
of the American Academy of Child and Adolescent Psychiatry, 31(5), 916–923.
Osler, M., Nordentoft, M., & Nybo Andersen, A.-M. (2005). Birth dimensions and risk
of depression in adulthood: Cohort study of Danish men born in 1953. British
Journal of Psychiatry, 186, 400–403.
Patton, G. C., Coffey, C., Carlin, J. B., Olsson, C. A., & Morley, R. (2004). Prematurity at
birth and adolescent depressive disorder. British Journal of Psychiatry, 184, 446–447.
Pharoah, P. O. D., Stevenson, C. J., Cooke, R. W. I., & Stevenson, R. C. (1994).
Prevalence of behaviour disorders in low birthweight infants. Archives of Disease in
Childhood, 70, 271–274.
Pickles, A., & Hill, J. (2006). Developmental pathways. In D. Cicchetti & D. Cohen
(Eds.), Developmental psychopathology: Theory and method (2nd ed., Vol. 1, pp.
211–243). Hoboken, NJ: John Wiley & Sons.
Plomin, R., DeFries, J., & Loehlin, J. (1977). Genotype–environment interaction and
correlation in the analysis of human behavior. Psychological Bulletin, 84(2), 309–322.
Robins, J., Hernan, M., & Brumback, B. (2000). Marginal structural models and causal
inference in epidemiology. Epidemiology, 11(5), 550–560.
Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multi-
variate matched sampling methods that incorporate the propensity score. American
Statistician, 39(1), 33–38.
Rothman, K. J. (1976). Reviews and commentary: Causes. American Journal of
Epidemiology, 104(6), 587–592.
Rothman, K. J., & Greenland, S. (2005). Causation and causal inference in epidemiol-
ogy. American Journal of Public Health, 95(Suppl. 1), S144–S150.
Rutter, M. (1988). Studies of psychosocial risk: The power of longitudinal data. New York:
Cambridge University Press.
Rutter, M. (1994). Concepts of causation, tests of causal mechanisms, and implications
for intervention. In A. C. Petersen & J. T. Mortimer (Eds.), Youth unemployment and
society (Vol. 13, pp. 147–171). New York: Cambridge University Press.
Rutter, M., Pickles, A., Murray, R., & Eaves, L. (2001). Testing hypotheses on specific
environmental causal effects on behavior. Psychological Bulletin, 127(3), 291–324.
Sameroff, A., & Seifer, R. (1995). Accumulation of environmental risk and child mental
health (Vol. 31). New York: Garland Publishing.
Scarr, S., & McCartney, K. (1983). How people make their own environments: A theory
of genotype–environment effects. Child Development, 54(2), 424–435.
Seifer, R., Sameroff, A. J., Baldwin, C. P., & Baldwin, A. (1989, April). Risk and pro-
tective factors between 4 and 13 years of age. Paper presented at the annual meeting
of the Society for Research in Child Development, San Francisco, CA.
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental
designs for generalized causal inference. Boston: Houghton Mifflin.
Silverton, L., Mednick, S. A., Schulsinger, F., Parnas, J., & Harrington, M. E. (1988).
Genetic risk for schizophrenia, birthweight, and cerebral ventricular enlargement.
Journal of Abnormal Psychology, 97(4), 496–498.
Simmons, R. G., & Blyth, D. A. (1992). Moving into adolescence: The impact of
pubertal change and school context. In P. H. Rossi, M. Useem, & J. D. Wright
(Eds.), Social institutions and social change (pp. 366–403). New York: Aldine de
Gruyter.
Sroufe, L. A. (1988). The role of infant–caregiver attachment in development. In
J. Belsky & T. Nezworski (Eds.), Clinical Implications of Attachment (pp. 18–38).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Szatmari, P., Saigal, S., Rosenbaum, P., Campbell, D., & King, S. (1990). Psychiatric
disorders at five years among children with birthweights <1000g: A regional per-
spective. Developmental Medicine and Child Neurology, 32(11), 954–962.
Terr, L., Bloch, D., Michel, B., Shi, H., Reinhardt, J., & Metayer, S. (1999). Children’s
symptoms in the wake of Challenger: A field study of distant-traumatic effects, an
outline of related conditions. American Journal of Psychiatry, 156(10), 1536–1544.
Tschann, J. M., Adler, N. E., Irwin, C. E., Jr., Millstein, S. G., Turner, R. A., & Kegeles,
S. M. (1994). Initiation of substance use in early adolescence: The roles of pubertal
timing and emotional distress. Health Psychology, 13(4), 326–333.
Weisglas-Kuperus, N., Koot, H. M., Baerts, W., Fetter, W. P., & Sauer, P. J. (1993).
Behaviour problems of very low birthweight children. Developmental Medicine and
Child Neurology, 35(5), 406–416.
Worthman, C. M., & Kuzara, J. (2005). Life history and the early origins of health
differentials. American Journal of Human Biology, 17(1), 95–112.
12

Causes of Posttraumatic Stress Disorder


naomi breslau

The definition of posttraumatic stress disorder (PTSD) in the Diagnostic and
Statistical Manual of Mental Disorders, 3rd edition (DSM-III), and in subse-
quent DSM editions is based on a conceptual model that brackets traumatic
events from other stressful experiences and PTSD from other responses to
stress and links the two causally. The connection between traumatic experi-
ences and a specific mental disorder has become part of the general dis-
course. PTSD provides a cultural template of the human response to war,
violence, disaster, or very bad personal experiences.
The DSM-III revolutionized American psychiatry. The manual’s editors
wanted a symptom-based, descriptive classification and generally rejected
any reference to causal theories about mental processes. PTSD was an excep-
tion to the rule of creating a classification that is ‘‘atheoretical with regard to
etiology or pathophysiological process’’ (American Psychiatric Association,
1980 p. 7), but the exception was not noted anywhere in the manual.1 Not
only did the PTSD definition include an etiological event, but it incorporated
a theory, an underlying process, that connects the syndrome’s diagnostic
features (McNally, 2003; Young, 1995).
In 1994, the American Psychiatric Association published the fourth edi-
tion of the DSM. The definition of PTSD, which had already undergone some
revisions in DSM-IIIR, maintained the syndrome’s description but changed
materially the stressor criterion. The range of events was widened, and the
emphasis shifted to the subjective experience of victims. The list of ‘‘typical’’
traumas in the DSM-IV left no doubt that the intent was to enlarge the
variety of experiences that can be used to diagnose PTSD beyond the initial
conception of directly experienced, life-threatening events such as combat,

1. The DSM-III did contain explicit exceptions for disorders with known etiology or pathophysio-
logical process. For example, it stated that in organic mental disorders organic factors have
been identified. It is hardly necessary to point out the difference between these mental dis-
orders and PTSD with respect to causal assumptions.


natural disaster, rape, and other assault. Persons who learned about a threat
to the physical integrity of another person or about a traumatic event experi-
enced by a friend could be considered victims. A novel form of PTSD took
shape following the 9/11 terrorist attacks, when the entire population of the
United States was considered to have been affected by a ‘‘distant’’ trauma,
produced chiefly by viewing television coverage. Weeks after the attacks,
researchers conducted telephone surveys and detected a rise in the prevalence
of PTSD and major depression as well as a ‘‘dose–response’’ relationship
between television viewing time and symptoms. Furthermore, a rise of new
9/11-related PTSD cases was reported among those who viewed televised
images on the 1-year anniversary of the events (Bernstein et al., 2007).
Some commentators criticized the ‘‘conceptual bracket creep’’ on several
counts, including the concern that it produces a heterogeneous and ‘‘diluted’’
population of cases, making it far more difficult to detect and characterize
pathological alterations in PTSD (McNally, 2003).
This chapter examines three themes in research on PTSD. They concern
essential features of the disorder that inform, intersect, and complicate one
another. The first concerns the internal logic of PTSD in the DSM that
captures the way a traumatic experience is linked to the clinical syndrome
through memory. The second concerns risk factors and diathesis. The third
concerns comorbidity, both bivariate associations of trauma and PTSD with
specific disorders and multivariate approaches to underlying liabilities across
a wide range of psychiatric disorders.

The Inner Logic of PTSD

The definition of PTSD in the DSM is based on a conceptual model that
brackets traumatic events from less severe stressors and links the traumatic
events causally with a specific syndrome. The syndrome is defined by three
clusters: (1) persistent reexperiencing of the trauma (e.g., intrusive recollec-
tions), (2) persistent avoidance of stimuli associated with the trauma and
emotional numbing (e.g., diminished interest in significant activities), and
(3) persistent symptoms of increased arousal (e.g., insomnia, concentration
problems). The definition assumes an underlying psychological process. Its
core elements are the recurrence in the present of the traumatic past (i.e.,
traumatic memory) and the co-occurrence (in alternating phases) of the char-
acteristic clinical features. The trauma is reexperienced in intrusive and dis-
tressing recollections, experiences associated with increased arousal. The
victim adapts by avoiding situations that stimulate painful reexperiencing.
The presence of an underlying process is implicit in the description of
PTSD in DSM-III. Unless the event was encoded in memory, intrusive
recollections, flashbacks, and physiological reactivity to reminders could not
occur. The majority of clinicians and researchers familiar with the literature
on posttraumatic syndromes understood the classification’s subtext and
accepted the presence of an underlying process, notwithstanding DSM-III’s
editorial rejection of theories concerning causation. A second group also
understood the subtext but interpreted it as a survival of psychoanalytic
reasoning and opposed the inclusion of PTSD in DSM-III for this reason.
A third group interpreted the PTSD symptom list as being consistent with
the DSM-III editorial position. They believed that PTSD diagnosis is based on
the presence of the indicated features and that presumptions about connec-
tions among symptoms are not justified.
The centrality of traumatic memory implies a causal order: The memory
of the event causes the onset of the syndrome. Diagnosing a case of PTSD
requires a clinical inference regarding a causal connection between an iden-
tifiable past event and the subsequent onset of symptoms. However, this is
not the only way that the event and the syndrome can be connected. A person
with current unexplained symptoms may attribute a connection with a past
event after gaining some new knowledge that allows him to reassess the
original experience in a new light (Young, 2001). When the process begins
with current symptoms, it proceeds from symptoms through a search for a
past event that can qualify as a cause.
A PTSD diagnosis that depends on distinguishing between these two causal
possibilities would have been a problem. There are no tools for making such a
distinction. There were none in 1980 and there are none today. It should be
noted that the problem of distinguishing between these two causal directions
is entirely different from the problem of finding objective information as to
whether or not the reported event occurred. Objective information (e.g., from
official records) that could corroborate the occurrence of a reported event is
incapable of confirming that the event, when it occurred, created a distressing
memory and that the memory caused the onset of symptoms.
The DSM definition of PTSD requires only two elements, the typical clinical
syndrome and an ‘‘identifiable stressor.’’ Once both are found, the correct link
with traumatic memory, that is, that the memory of the event preceded and
caused the onset of the syndrome, is assumed (Young, 2001). PTSD specialists
have described how the clinician together with the patient ‘‘can translate current
symptoms into disavowed traumatic memories’’ and how by doing that ‘‘both
clinician and patient will gain compelling respect for the disorder’’ (Lindy, Green,
& Grace, 1987) (p. 272). Although DSM-III circumvented the memory problem
by not insisting on establishing its causal priority, it maintained a formal inner
logic by linking criterion symptoms with the stressor. Specifically, recurrent
nightmares and intrusive thoughts must mirror the event; avoidance must be
of situations that recall the event and precipitate distress. In themselves, the
criterion symptoms are nonspecific; they characterize other anxiety disorders
and depression and are used in the definition of these other disorders. It is
their co-occurrence and their connection with the stressor that transform these
diagnostically ambiguous symptoms into a distinct DSM disorder that is PTSD.
The disorder’s inner logic has important implications for how PTSD cases are
ascertained. The use of a symptom list, with a specified cutoff score or as a
continuum that represents varying degrees of PTSD severity, is a departure
from the DSM construct.
The problem of whether symptoms followed or instead preceded the trau-
matic memory has not disappeared. Although not addressed in the DSM
definition, it is a central point in litigation and compensation claims. How
can one decide whether the traumatic event, as portrayed by an individual
patient, is the cause of the patient’s distress and not the reverse? Is there a
conspicuous clinical feature that could distinguish between the two alterna-
tives? The length of time that separates the onset of symptoms from the
event might be such a feature: The longer the time interval, the greater
the suspicion that the causal explanation is reversed, that is, from symptoms
to a traumatic event. The intent in DSM-III was that ‘‘there would be tem-
porally close juxtaposition between the stressor and the development of
symptoms’’ (Andreasen, 2004) (p. 1322). Research in military and civilian
samples supports that expectation; in most cases, symptoms begin within
days of the event (Andrews, Brewin, Philpott, & Stewart, 2007; Breslau,
Davis, Andreski, & Peterson, 1991; Jones & Wessely, 2005).
A time lag from stressor to onset of symptoms was suspect from the start.
In a volume entitled Psychiatric Diagnosis, Third Edition (1984), Goodwin and
Guze comment on what was then a new condition in DSM-III called
‘‘PTSD.’’ They describe the political context in which PTSD was adopted
and how it had been vigorously promoted by advocate groups on behalf of
Vietnam veterans. They focused specifically on a subtype of PTSD called
‘‘delayed onset,’’ which entitled many veterans to compensation for service-
connected injury, although symptoms first appeared many years after mili-
tary discharge. Goodwin and Guze comment, ‘‘Rarely before had so many
claimants presented themselves to psychiatric examiners having read printed
checklists describing the diagnostic feature of the disorder for which they
sought compensation’’ (p. 82).

Risk Factors and Diathesis


The Problem of Risk Factors in PTSD
The bracketing of extreme stressors in the DSM-III definition of PTSD
implied that, unlike more ordinary stressful life events, the causal effects
of extreme stressors are independent of personal vulnerabilities. PTSD was
said to be a normal response to an abnormal event. It was conceived as
‘‘normal’’ not merely because it was believed to be the norm in a statistical
sense but also as produced directly (naturally) by the stressor. The role of
stressors in PTSD was compared to ‘‘the role of force in producing a broken
leg’’ (Andreasen, 1980). A 1985 comprehensive review of PTSD concluded
that ‘‘the nature and intensity of the stressor is the primary etiological
factor in individual differences in response to stress’’ (Green, Lindy, &
Grace, 1985). Therapists and veterans’ advocates rejected any suggestions
that predispositions played a part. A leading psychiatrist-advocate wrote in
1985 that the ‘‘predisposition theory’’ has no standing at all among expert
clinicians who treat war veterans or rape victims and survivors of civilian
disasters and that ‘‘the predisposition theory is an instance of blaming the
victim’’ (Blank, 1985).
Based on the available literature, Breslau and Davis (1987) suggested that
emotional disturbance is not a direct consequence of ‘‘extreme’’ stressors and
that, in regard to extreme stressors, social and individual factors may modify
the response in the same way that they modify the response to ordinary
stressful life events. Their argument was restated in 1995 by Yehuda and
McFarlane in an article on the conflict between current knowledge about
PTSD and its original conceptual basis. By the ‘‘original conception’’
Yehuda and McFarlane refer to the DSM-III PTSD concept of a ‘‘natural
response’’ to extraordinary events that did not depend on individual vulner-
ability. By ‘‘current knowledge’’ they refer to the epidemiological evidence
about the relative rarity of PTSD among trauma victims and the associations
of PTSD with risk factors other than trauma intensity. They note that, ‘‘even
among those who are exposed to very severe and prolonged trauma [they
refer to prisoners of war and Holocaust survivors], there is usually a sub-
stantial number of individuals who do not develop PTSD or other psychiatric
illnesses’’ (p. 1708). Far from being a normal response to an abnormal stres-
sor, it is an abnormal response observed only in some people. A person’s
response to a stressor is determined not by the stressor but by ‘‘interaction’’
between the stressor and the victim’s risk factors.
What is the epidemiological evidence that has accumulated since 1980?
Studies of general population samples in the United States show a range of
estimates of the probability of PTSD given exposure to trauma (Table 12.1).
The heterogeneity across studies reflects differences in the way in which
trauma-exposed persons were identified. Longer lists of stressors, which
include events of lesser magnitude, added in DSM-IV, yield higher propor-
tions of exposed persons but lower percentages of exposed persons meeting
PTSD criteria. Estimates of PTSD associated with specific event types are
more consistent across studies.

Table 12.1 Lifetime Prevalence of Exposure and PTSD (Rate/100)

                            Exposure          PTSD
                            M      F          M      F
Breslau et al. (1991)       43.0   36.7       6.0    11.3
Norris (1992)               73.6   64.8       —      —
Resnick et al. (1993)       —      69.0       —      12.3
Kessler et al. (1995)       60.7   51.2       5.0    10.4
Breslau et al. (1997)       —      40.0       —      13.8
Stein et al. (1997)         81.3   74.2       —      —
Breslau et al. (1998)       92.2   87.1       6.2    13.0
Breslau et al. (2004b)      87.2   78.4       6.3    7.9
Kessler et al. (2005)       —      —          3.6    9.7

Table 12.2 Conditional Probability of PTSD Across Specific Traumas: Estimates From
Two Population Surveys

                                      Detroit Area(a)     Baltimore(b)
                                      n       % PTSD      n       % PTSD
Assaultive violence                   286     20.9        304     15.1
Rape                                  32      49.0        39      46.2
Shot/stabbed                          21      15.4        64      9.4
Sexual assault other than rape        27      23.7        38      29.0
Mugged/threatened with weapon         138     8.0         123     4.1
Badly beaten up                       53      31.9        30      13.3
Other injury or shock                 633     6.1         287     6.6
Serious car crash                     168     2.3         50      10.0
Other serious accident                87      16.8        17      5.9
Natural disaster                      109     3.8         20      0.0
Witnessed killing/serious injury      183     7.3         149     5.4
Discovered dead body                  45      0.2         19      5.3
Learning about others                 564     2.2         238     2.9
Sudden unexpected death               474     14.3        543     9.0
Any trauma                            1,957   9.2         1,372   8.8

(a) Breslau et al. (1998). (b) Breslau et al. (2004a).

Table 12.2 displays conditional probabilities from two epidemiological studies
that used the DSM-IV definition. In both studies, stressors grouped under
‘‘assaultive violence’’ were associated with the highest probability of PTSD
(15% and 21%) and learning about a trauma experienced by a close friend or
relative was associated with the lowest probability (2.2% and 2.9%). The risk
from any qualifying trauma was <10% (8.8% and 9.2%). Clearly, even trauma
types that have the highest PTSD risk leave the majority of victims unaffected
by the disorder.
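The conditional probabilities in Table 12.2 are simple proportions: of the respondents who reported a given trauma type, the fraction who met PTSD criteria. A minimal illustration in Python follows; the PTSD case counts are back-calculated from the published percentages purely for illustration and are not the survey data.

    # Illustrative counts only: the exposed n comes from Table 12.2 (Detroit);
    # the PTSD case counts are back-calculated from the published percentages.
    exposed = {"assaultive violence": 286, "learning about others": 564}
    ptsd_cases = {"assaultive violence": 60, "learning about others": 12}

    for trauma, n in exposed.items():
        p_ptsd_given_exposure = ptsd_cases[trauma] / n  # conditional probability
        print(f"{trauma}: P(PTSD | exposure) = {p_ptsd_given_exposure:.1%}")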

PTSD among American Vietnam veterans was estimated in the National
Vietnam Veterans Readjustment Survey (NVVRS), a representative sample of
veterans (Kulka et al., 1990). The lifetime prevalence of DSM-IIIR PTSD in
male Vietnam theater veterans was 30.6%. A recent revisit of the survey
adjusted the lifetime estimate to 18.7% (Dohrenwend et al., 2006).
As to stressor severity, the evidence from civilian studies has been weaker
than from veterans’ studies (Brewin, Andrews, & Valentine, 2000). Additionally,
in the two population samples we surveyed using DSM-IV, we found
that the higher PTSD risk associated with assaultive violence (vs. other event
types) was observed only in females (Breslau et al., 1998; Breslau, Wilcox,
Storr, Lucia, & Anthony, 2004b) (Table 12.3).

Research on Risk Factors


Studies of risk factors have examined lists of variables that included socio-
demographic factors together with personality traits and biographical events.
The NVVRS included race, family religion, family socioeconomic status, edu-
cational attainment, marital status, child abuse, childhood behavioral pro-
blems, family mental-health problems, and history of mental-health
problems (Kulka et al., 1990). A meta-analysis of a long list of risk factors
for PTSD discovered heterogeneity between civilian and veteran studies as
well as across methods (Brewin et al., 2000). However, three risk factors were
uniform across populations and methods: psychiatric history, family psychia-
tric history, and early adversity (Brewin et al., 2000). Although the effect of
each individual risk factor examined in the meta-analysis was relatively
small, their sum might outweigh the impact of trauma severity (Brewin
et al., 2000, p. 756).
Reports of high prevalence of PTSD among prisoners of war (50% or even
higher in some subgroups) have not examined predispositions (Engdahl,
Dikel, Eberly, & Blank, 1997; Goldstein, van Kammen, Shelly, Miller, & van
Kammen, 1987).

Table 12.3 Conditional Probability of PTSD: Sex-Specific Comparisons

                                  Detroit Area(a)      Baltimore(b)
                                  M (%)    F (%)       M (%)    F (%)
Assaultive violence               6.0      35.7        7.1      23.5
Excluding rape/sexual assault     6.0      32.3        4.7      12.7
Other injury or shock             6.6      5.4         7.9      5.2
Learning about others             1.4      3.2         2.8      3.1
Sudden unexpected death           12.6     16.2        9.2      8.8

(a) Breslau et al. (1998). (b) Breslau et al. (2004a).

Little research effort has been devoted to the question of the contribution
of predispositions to the psychiatric outcomes of extreme stressors or across
stressors’ magnitude. Does exposure to severe stressors override the effects
of predispositions on posttrauma psychiatric disturbance? An
exception is a 1981 publication by Helzer, based on data from a follow-up
study of Vietnam veterans conducted in the early 1970s, that sheds light on
this issue. Helzer (1981) examined the effects of antecedent factors (failure to
graduate from high school, drug use) on depression across different levels of
combat stress (measured by number of combat events). He reported that the
influence of these antecedents was marked (and statistically significant) only
at high levels of combat stress. A dose–response relationship between levels
of combat exposure and psychopathology, reported in that study, was not
accompanied by a corresponding decrease in the impact of predispositions.
The opposite pattern was observed.
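Helzer’s finding is a question of effect modification: the antecedent factors mattered mainly at high levels of combat exposure. A schematic sketch in Python of such a test (simulated data and hypothetical variable names, not Helzer’s analysis) fits a logistic model with a product term; a positive interaction coefficient means the antecedent’s effect grows with the level of exposure.

    # Simulated data for illustration only (not Helzer's data).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 2000
    df = pd.DataFrame({
        "antecedent": rng.integers(0, 2, n),   # e.g., preexisting vulnerability (0/1)
        "combat": rng.integers(0, 4, n),       # number of combat events (0-3)
    })
    # Build in an antecedent effect that is marked only at high exposure.
    logit_p = (-2.0 + 0.3 * df["combat"] + 0.1 * df["antecedent"]
               + 0.6 * df["antecedent"] * df["combat"])
    df["depression"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

    # The product term (antecedent:combat) tests effect modification.
    model = smf.logit("depression ~ antecedent * combat", data=df).fit(disp=False)
    print(np.exp(model.params))  # odds ratios, including the interaction term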
A meta-analysis of risk factors for PTSD (or PTSD symptoms) by Ozer,
Best, Lipsey, and Weiss (2003) concluded that ‘‘peritraumatic’’ responses (e.g.,
dissociation as the immediate response to the stressor) count the most. There
is a conceptual problem in this analysis (and similar studies concerning
negative appraisal as a risk factor, e.g., Ehlers & Clark, 2000) in that peritrau-
matic responses and appraisal might themselves be aspects of the outcome
we wish to explain. Dissociations, negative appraisal, and PTSD are likely to
be manifestations of the same psychological process or consequences of a
common vulnerability.

Prior Trauma as Risk Factor


A frequently replicated epidemiological finding is the enhanced probability of
PTSD in exposed persons who had experienced prior traumatic events.
Studies of Vietnam veterans and general population samples have reported
higher rates of prior trauma (including childhood maltreatment) among
exposed persons who succumbed to PTSD than among exposed persons
who did not (Bremner, Southwick, Johnson, Yehuda, & Charney, 1993;
Breslau, Chilcoat, Kessler, & Davis, 1999; Galea et al., 2002; Yehuda,
Resnick, Schmeidler, Yang, & Pitman, 1998). The finding has been inter-
preted as supporting a ‘‘sensitization’’ process, that is, greater responsiveness
to subsequent stressors (Post & Weiss, 1998). This interpretation further
highlights stressors as a cause of PTSD. Now stressors play two roles:
They cause PTSD directly and, through a separate causal pathway, increase
the vulnerability to PTSD in the future.
The evidence on prior trauma comes almost exclusively from cross-sectional
studies and retrospective reports, a limitation that is generally acknowledged
in the literature. A major limitation that has been overlooked is the failure to
assess how persons had responded to the prior trauma—specifically, whether or
not they had developed PTSD in response to the prior trauma. Consequently,
it is unclear whether prior trauma per se or, instead, prior PTSD predicts an
elevated risk for PTSD following a subsequent trauma. Evidence that pre-
viously exposed persons are at increased risk for PTSD only if their prior
trauma resulted in PTSD would not support the hypothesis that exposure to
traumatic events increases the risk of (i.e., sensitizes to) the PTSD effects of
a subsequent trauma, transforming persons with ‘‘normal’’ reactions to stres-
sors into persons susceptible to PTSD. It might suggest that trauma preci-
pitates PTSD in persons with preexisting susceptibility that had already been
present before the prior trauma occurred. Evidence that personal vulnerabil-
ities, chiefly neuroticism, history of major depression and anxiety disorders,
and family history of psychiatric disorders, increase the risk for PTSD has
been consistently reported. There also is evidence that personal vulnerabil-
ities might be stronger predictors of psychiatric response to traumatic events
than trauma severity, especially in civilian samples.
We recently examined this question in our longitudinal epidemiological
study of young adults (Breslau, Peterson, & Schultz, 2008). At baseline and at
three reassessments over the following 10 years, respondents were asked
about the occurrence of traumatic events and PTSD. Data from one follow-
up assessment or more were available on 990 respondents (98.3% of the
initial panel). Exposure to trauma and PTSD measured at baseline and at
the 5-year follow-up were used to predict new exposure and PTSD during the
respective subsequent periods: from baseline to the 5-year assessment and
from the baseline and 5-year assessments to the 10-year assessment.
Preexisting major depression and any anxiety disorder were included as cov-
ariates to control for their effects (Table 12.4).

Table 12.4 Prior Trauma and PTSD and the Subsequent Occurrence of Trauma and
PTSD (n = 990)

                         First Follow-Up Period            Second Follow-Up Period
                         n     Exposed   PTSD Among        n     Exposed   PTSD Among
                               (%)       Exposed (%)             (%)       Exposed (%)
Prior PTSD               92    42.4      18.0               105   60.0     19.1
Prior trauma/no PTSD     294   33.3      12.2               386   41.5     6.3
No prior trauma          604   24.0      8.3                419   27.4     6.1

From Breslau et al. (2008).

Table 12.5 Relative Risk for PTSD Following Exposure to Trauma Associated With
Prior Trauma, Prior PTSD, and Covariates

Variable                          Bivariate Estimates,     Multivariable Model,
                                  OR (95% CI)              aOR (95% CI)
Prior PTSD                        3.01 (1.52, 5.97)*       2.68 (1.33, 5.41)*
Prior trauma/no PTSD              1.24 (0.65, 2.36)        1.22 (0.64, 2.34)
Female (vs. male)                 2.51 (1.25, 5.06)*       1.94 (0.93, 4.07)
White (vs. Black)                 0.61 (0.33, 1.12)        0.60 (0.32, 1.11)
College education (less)          0.81 (0.41, 1.59)        1.00 (0.51, 1.96)
Preexisting major depression      2.72 (1.49, 4.99)*       2.09 (1.71, 3.75)*
Preexisting anxiety               2.35 (1.32, 4.20)*       1.65 (0.91, 2.97)

From bivariate and multivariable generalized estimating equation (GEE) multinomial
regressions. Each of the bivariate models and the multivariable model includes a term
for time interval (suppressed). From Breslau et al. (2008).
*p < 0.05.

In this adjusted model the relative risk for PTSD following exposure to
traumatic events in subsequent periods was significantly higher among trauma
victims with PTSD in the preceding periods than in trauma victims who had not
succumbed to PTSD. Odds ratios were 2.68 (95% confidence interval 1.33–5.41)
and 1.22 (95% confidence interval 0.64–2.34), respectively, adjusted for sex,
race, education, preexisting major depression and anxiety disorders, and time
of assessment (Table 12.5). We concluded that there was no support in these
data for the idea that traumatic events experienced in the past lurk inside,
waiting to shape reactions to future traumatic events. The findings suggest
that preexisting susceptibility to a pathological response to stressors
accounts for the PTSD response to both the prior trauma and the subsequent
trauma.
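The note to Table 12.5 describes generalized estimating equation (GEE) regressions fit to repeated follow-up intervals. Purely as a sketch of that kind of model (the published analysis was multinomial; the simplified version below is a binary-outcome GEE, and the input file and column names are hypothetical), in Python:

    # Person-period data: one row per respondent per follow-up interval, with
    # indicators for prior PTSD, prior trauma without PTSD, covariates, and an
    # interval identifier. File and column names are hypothetical.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("person_periods.csv")

    model = smf.gee(
        "ptsd ~ prior_ptsd + prior_trauma_no_ptsd + female + white + college"
        " + preexisting_mdd + preexisting_anxiety + C(interval)",
        groups="respondent_id",
        data=df,
        family=sm.families.Binomial(),
        cov_struct=sm.cov_struct.Exchangeable(),
    )
    result = model.fit()

    # Exponentiated coefficients are (adjusted) odds ratios with 95% CIs.
    odds_ratios = np.exp(result.params)
    ci = np.exp(result.conf_int())
    ci.columns = ["2.5%", "97.5%"]
    print(pd.concat([odds_ratios.rename("OR"), ci], axis=1))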
Our results had been foreshadowed by a 1987 Israeli study of acute combat
stress reaction (CSR) among soldiers in the 1982 Lebanon War (Solomon,
Mikulincer, & Jakob, 1987). The authors reported that, compared to new
recruits who had not fought in a previous war, CSR occurred more frequently
among Lebanon War soldiers who had experienced CSR in a previous war, but not
among soldiers who had fought in a previous war without experiencing CSR. The
authors concluded that knowledge of the outcome of prior
combat was essential for predicting soldiers’ response to subsequent combat.
Soldiers who suffered CSR in a previous war might have had preexisting vul-
nerability that also accounted for their increased risk of CSR during the sub-
sequent war. Soldiers who had fought in a previous war but had not
experienced CSR had a lower rate of CSR during the subsequent war than
new recruits who had no war experience. It is tempting to interpret this obser-
vation as evidence of ‘‘inoculation.’’ However, the new recruits included sol-
diers who would have had CSR had they fought in a prior war. This undetected
‘‘vulnerable’’ subset would push up the rate of CSR in the group of new recruits
as a whole.
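A back-of-the-envelope calculation makes this compositional argument concrete. With invented numbers (the proportions below are assumptions, not estimates from the Israeli study), veterans of a prior war who did not develop CSR are a selected, low-vulnerability group, so their CSR rate in a subsequent war falls below that of unselected new recruits even when prior combat confers no protection at all:

    # Toy numbers (not study data) illustrating the composition argument.
    p_vulnerable = 0.20          # assumed prevalence of vulnerability in any cohort
    p_csr_if_vulnerable = 0.60   # assumed CSR risk in combat, given vulnerability
    p_csr_if_resilient = 0.05    # assumed CSR risk in combat, otherwise

    # New recruits: an unselected mix of vulnerable and resilient soldiers.
    rate_new_recruits = (p_vulnerable * p_csr_if_vulnerable
                         + (1 - p_vulnerable) * p_csr_if_resilient)

    # Veterans without prior CSR: mostly resilient, because most vulnerable
    # soldiers already developed CSR in the earlier war and left this group.
    p_vuln_given_no_prior_csr = (
        p_vulnerable * (1 - p_csr_if_vulnerable)
        / (p_vulnerable * (1 - p_csr_if_vulnerable)
           + (1 - p_vulnerable) * (1 - p_csr_if_resilient))
    )
    rate_veterans_no_csr = (p_vuln_given_no_prior_csr * p_csr_if_vulnerable
                            + (1 - p_vuln_given_no_prior_csr) * p_csr_if_resilient)

    print(f"CSR rate, new recruits:           {rate_new_recruits:.1%}")      # 16.0%
    print(f"CSR rate, veterans w/o prior CSR: {rate_veterans_no_csr:.1%}")   # about 10%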

Summarizing their vast research on how genetic and environmental factors
combine to influence the risk of depression in adulthood, Kendler and
Prescott (2006) observe that environmental risk factors have time-limited
effects. ‘‘Although we are sure this is an oversimplification it does seem
that what causes temporally stable liability to depression comes from our
genes, whereas environmental factors create the large but brief spikes in
risk that induces episodes in vulnerable individuals’’ (p. 343). They make a
similar observation for antisocial behavior. It is reasonable to ask, Could this
time-limited effect apply also to traumatic events and PTSD? I do not wish to
ignore the caveat that Kendler and Prescott issued about oversimplification.
They might have had in mind childhood events, such as childhood sexual
abuse. However, with respect to these distant events, they have shown that
they are entangled with one another and with genetic factors influencing
depression and that the path to a recent major depression episode is indirect,
through its effects on lifetime trauma, conduct disorder, and recent stressful
life events.

Intelligence
Studies in Vietnam veterans have reported associations between intelligence
test scores and the risk for PTSD (Macklin et al., 1998; McNally & Shin, 1995;
Pitman, Orr, Lowenhagen, Macklin, & Altman, 1991). Evidence on the role of
intelligence in children’s psychiatric response to adversity was reported for a
range of disorders and for PTSD (Fergusson & Lynskey, 1996; Silva et al.,
2000). Several articles published in 2006 and 2007 reported on cognitive
ability measured in early childhood and subsequent PTSD in general popula-
tion samples (Breslau, Lucia, & Alvarado, 2006; Koenen, Moffitt, Poulton,
Martin, & Caspi, 2007; Storr, Ialongo, Anthony, & Breslau, 2007) and on
Vietnam veterans from the twin registry for whom predeployment test
scores were available (Kremen et al., 2007). Some of the studies found that
the decrease in risk was conferred specifically by high IQ rather than graded
across the full range of IQ.
For example, we found that age 6 Wechsler Intelligence Scale for Children–
Revised IQ >115 was associated with a lower risk of subsequent exposure to
trauma and, among those exposed, a markedly lower risk of PTSD (adjusted
odds ratio = 0.21) (Breslau et al., 2006). Similarly, Gilbertson et al. (2006)
reported that above average cognitive functions protect from chronic PTSD
and that those with PTSD had average, rather than below average, cognitive
function. The mean IQ of PTSD veterans and their monozygotic twin broth-
ers was 105, whereas the mean IQ of non-PTSD combat veterans and their
monozygotic twin brothers was 118.
These studies perform two tasks. First, they dispel the notion that IQ
deficits observed among patients with PTSD reflect stress-induced
neurotoxicity, the primary hypothesis in earlier PTSD studies (Bremner, 1999;
Sapolsky, Uno, Rebert, & Finch, 1990). Observed cross-sectional associations
with IQ do not reflect the effects of psychological trauma but are more likely
to reflect preexisting differences. Second, these studies suggest that high IQ
plays a protective role. The mechanisms by which high IQ deters the PTSD
effects of trauma are unclear. Gilbertson et al. (2006) suggested that high IQ
signals a general capacity to effectively and flexibly manipulate verbal infor-
mation and, thus, a capacity to place traumatic experiences into meaningful
concepts, which may reduce negative emotional impact (p. 493).

Neuroticism
Neuroticism is a personality trait that at the high end is a disposition to
respond to stress with negative affect, depression, and anxiety and at the
low end manifests as emotional stability and ‘‘normality.’’ An early study
that called attention to neuroticism’s salience in the psychiatric response to
traumatic experiences reported on the survivors of the 1983 Australian bush-
fires. In contrast with the expectation that the intensity of the stressor would
be the primary cause, neuroticism and history of predisaster disturbances
emerged as stronger predictors of morbidity (McFarlane, 1988, 1989).
Studies of Vietnam combat veterans reported that PTSD and PTSD symptoms
were correlated with neuroticism (Casella & Motta, 1990; Hyer et al., 1994;
Talbert, Braswell, Albrecht, Hyer, & Boudewyns, 1993). In a general popula-
tion sample of young adults, neuroticism predicted both exposure to traumatic
events and PTSD after exposure, controlling for other risk factors (Breslau,
Davis, & Andreski, 1995; Breslau et al., 1991). In most of the studies neuroti-
cism was measured after the trauma, but three studies measured neuroticism
prior to the trauma and reported an association between neuroticism and
PTSD or postdisaster disturbance (Alexander & Wells, 1991; Engelhard, van
den Hout, & Kindt, 2003; Parslow, Jorm, & Christensen, 2006). Recently, pro-
spective studies have reported that anxious/depressed mood, anxiety disorders,
and difficult temperament measured in childhood predicted subsequent PTSD
(Breslau et al., 2006; Koenen et al., 2007; Storr et al., 2007).
Research on neuroticism demonstrated connections with neurophysiologi-
cal substrates, in particular, the lability of the autonomic nervous system.
There is evidence supporting heritability and stability from childhood to
adulthood. Genetic control of neuroticism has been reported in numerous
studies since the 1970s. Molecular genetics studies, using both association
and linkage methods, identified gene regions that are likely to influence
variation in neuroticism (Fullerton et al., 2003; Lesch et al., 1996). A recent
meta-analysis concluded that there is a strong association between a seroto-
nin transporter promoter polymorphism (5-HTTLPR) and neuroticism, when
neuroticism is measured by the NEO Personality Inventory (Sen, Burmeister,
& Ghosh, 2004). Analysis of the Virginia Twin Registry showed that neuroti-
cism was only minimally changed following major depression episodes
(Kendler & Prescott, 2006), suggesting that reported cross-sectional associa-
tions between neuroticism and disorders reflect primarily the effect of neu-
roticism. Kendler and Prescott also showed that genetic factors underlying
neuroticism are largely shared with those that influence the liability for
internalizing disorders, although PTSD was not included. There is evidence
that neuroticism contributes to psychopathology even more broadly, including
externalizing disorders and comorbidity between internalizing and externaliz-
ing disorders (Khan, Jacobson, Gardner, Prescott, & Kendler, 2005; Krueger &
Markon, 2001).
Placing neuroticism in a list of risk factors containing chiefly attributes of
aggregates might obscure the potential status of neuroticism as diathesis.
Neuroticism accounts for the process that is PTSD: the way in which a
stressor is perceived and appraised and the characteristic features of the
PTSD syndrome. As a propensity related to stress reactivity, high neuroticism
predicts the repetitions of the memory of the past trauma in the present
(ruminating and reexperiencing symptoms), the phobic avoidance, the dys-
phoria, and associated sleep and concentration problems that characterize
PTSD. While it maps onto the characteristic features of PTSD, it simultaneously
connects PTSD with the main body of neurobiological science.

Comorbidity

Studies of general population samples have confirmed earlier observations
from clinical samples and samples of Vietnam veterans that persons diag-
nosed with PTSD have high rates of other psychiatric disorders. One expla-
nation that has been proposed is that stressors that cause PTSD also cause
other disorders. Because in PTSD there is always an identified etiological
stressor, it is logical to ask, Could not the same stressor also have caused
the comorbid disorder via a separate and distinct pathway (Yehuda,
McFarlane, & Shalev, 1998)?
The hypothesis that stressors increase the risk for other disorders, inde-
pendent of their PTSD effects, would be supported by evidence of elevated
incidence of other disorders in trauma victims who did not succumb to PTSD
relative to persons who were not exposed to trauma. Conversely, evidence of
an increased risk for the subsequent onset of major depression or substance-
use disorders only in victims with PTSD, relative to persons who did not
experience trauma, would suggest that PTSD might be the cause or, alter-
natively, that the two disorders share a common underlying vulnerability.

The uniqueness of PTSD in the DSM has methodological implications for
evaluating this question. Because a link with a stressor is required for PTSD,
the risk of PTSD in trauma victims is measured by a conditional probability
(probability among those exposed to a stressor). In contrast, the risk for other
psychiatric disorders following trauma is measured by a ratio of risks, or a
relative risk, which is the standard epidemiological method for evaluating a
suspected cause. The definition of depression or substance-use disorder
requires no link with a stressor, and the risk for these disorders in trauma
victims is evaluated relative to an unexposed reference group. Additionally,
because we are interested in whether exposure per se, independent of PTSD,
causes another disorder, victims are separated into two subsets, those with
and those without PTSD.
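The contrast can be made concrete with toy counts in Python (loosely patterned on the percentages in Table 12.6, not the actual data): PTSD risk is a conditional probability among the exposed, whereas the risk of a comorbid disorder is a relative risk against the unexposed, estimated separately for exposed persons with and without PTSD.

    # Toy counts (not study data) illustrating the two measures described above.
    exposed_with_ptsd = 100
    exposed_no_ptsd = 900
    unexposed = 1000
    new_mdd = {"ptsd": 38, "exposed_no_ptsd": 180, "unexposed": 170}  # hypothetical onsets

    # PTSD risk: conditional probability among the exposed.
    p_ptsd_given_exposure = exposed_with_ptsd / (exposed_with_ptsd + exposed_no_ptsd)

    # Major depression risk: relative to an unexposed reference group.
    risk_unexposed = new_mdd["unexposed"] / unexposed
    rr_ptsd = (new_mdd["ptsd"] / exposed_with_ptsd) / risk_unexposed
    rr_exposed_no_ptsd = (new_mdd["exposed_no_ptsd"] / exposed_no_ptsd) / risk_unexposed

    print(f"P(PTSD | exposed)                            = {p_ptsd_given_exposure:.2f}")
    print(f"RR of MDD, exposed with PTSD vs. unexposed   = {rr_ptsd:.2f}")
    print(f"RR of MDD, exposed without PTSD vs. unexposed = {rr_exposed_no_ptsd:.2f}")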
In our longitudinal study of young adults, the adjusted odds ratio for the
first onset of major depression in persons with preexisting PTSD was 2.96
and in persons who were exposed to trauma but did not develop PTSD, 1.35
(not significant); the difference between the odds ratios was statistically sig-
nificant (Table 12.6). Neither PTSD nor history of trauma exposure increased
the risk for the onset of alcohol-use disorder. However, preexisting PTSD, but
not history of trauma exposure, increased the risk for the subsequent onset of
drug-use disorder. The difference between odds ratios associated with prior
PTSD and prior exposure was statistically significant. A recent report from
another longitudinal study on the incidence of drug-use disorders shows
similar results. An increased 1-year incidence of drug disorder was found
for persons with preexisting PTSD but not for persons with exposure alone
(Reed, Anthony, & Breslau, 2007).
To gain further understanding of the relationship between PTSD and
these other disorders, we estimated associations in the reverse direction.

Table 12.6 Incidence and Relative Risk for Other Disorders in 10-Year Follow-Up in
Detroit Area Study of Young Adults

                        PTSD                         Exposed/No PTSD              Not Exposed
                        %      aOR (95% CI)          %      aOR (95% CI)          %      aOR
Major depression(a,b)   38.5   2.96* (1.59–5.53)     19.5   1.35 (0.89–2.03)      17.1   —
Alcohol A/D             15.8   1.45 (0.67–3.17)      15.6   1.14 (0.71–1.85)      12.8   —
Drug A/D(a)             10.6   4.34* (1.63–11.53)    2.2    0.72 (0.25–2.05)      2.6    —

aOR, odds ratio adjusted for sex, race, and education; A/D, abuse or dependence.
(a) Comparisons of aORs between PTSD and exposure/no PTSD are significant at *p < 0.05.
Data on substance-use disorders are from Breslau et al. (2003). (b) Similar results on major
depression from the 5-year follow-up are in Breslau et al. (2000).

We examined whether preexisting major depression and substance-use dis-
orders increased the risk for new exposure to traumatic events or the condi-
tional risk for PTSD. What we found is that preexisting major depression
predicted an increased risk for subsequent exposure and for PTSD among
persons exposed to trauma. Drug-use disorder was associated with an
increased risk for neither subsequent exposure nor PTSD. Preexisting alcohol
disorder was associated with an increased risk for PTSD following exposure
but not for exposure (Table 12.7).
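The hazard ratios in Table 12.7 come from Cox proportional hazards models with time-dependent variables. As a hedged sketch of such a model in Python using the lifelines package (this is not the authors' code; the input file and column names are hypothetical), the data would be arranged in long format, one row per respondent per interval:

    # Hypothetical long-format data: respondent_id, start, stop, exposed_event
    # (1 if new trauma exposure occurred in the interval), plus time-dependent
    # covariates such as preexisting_mdd, drug_ad, alcohol_ad, female, white,
    # and college. All non-identifier columns are used as covariates.
    import pandas as pd
    from lifelines import CoxTimeVaryingFitter

    long_df = pd.read_csv("exposure_intervals.csv")

    ctv = CoxTimeVaryingFitter()
    ctv.fit(
        long_df,
        id_col="respondent_id",
        start_col="start",
        stop_col="stop",
        event_col="exposed_event",
    )
    # Exponentiated coefficients correspond to the adjusted hazard ratios
    # reported in the table.
    ctv.print_summary()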
The prospective results, taken together, suggest different explanations
across comorbid disorders. (1) The bidirectional relationship between major
depression and PTSD together with the evidence that preexisting major
depression increased the likelihood of subsequent exposure and PTSD follow-
ing exposure suggest a shared diathesis or shared environmental causes other
than the traumatic event (for which there was no evidence). (2) In contrast,
drug-use disorder, for which there was support for only one direction—from
PTSD to drug-disorder onset—might be a complication of PTSD.
A general conclusion that can be drawn is that persons who experienced
trauma and who did not develop PTSD (i.e., most of those exposed to trau-
matic events) are not at an elevated risk for major depression and drug-use
disorders compared with unexposed persons. The excess incidence of these
disorders in persons exposed to trauma is concentrated primarily in the small
subset of exposed persons with PTSD. Reports from other studies on lifetime
and current co-occurrence of other disorders with trauma and PTSD support
this generalization (Breslau, Chase, & Anthony, 2002; North & Pfefferbaum,
2002).
Another approach to comorbidity that illuminates etiology is the applica-
tion of quantitative models to large data sets on multiple disorders.

Table 12.7 Risk for Exposure to Trauma and PTSD by Preexisting Disorders

Preexisting            Exposure in Total Sample     PTSD in Exposed
Diagnoses              (n = 1,007)                  (n = 399)
                       HR (95% CI)                  HR (95% CI)
Major depression       2.0* (1.3–3.0)               3.7* (2.0–6.7)
Drug A/D               1.1 (0.7–1.7)                1.1 (0.5–2.7)
Alcohol A/D            1.1 (0.8–1.6)                2.1* (1.2–3.9)

HR, hazard ratio, adjusted for sex, race, and education, from eight Cox proportional hazards
models with time-dependent variables; CI, confidence interval.
From ‘‘Estimating post-traumatic stress disorder in the community: Lifetime perspective and the
impact of typical traumatic events,’’ by N. Breslau, E. L. Peterson, L. M. Poisson, L. R. Schultz, &
V. C. Lucia, 2004a, Psychological Medicine, 34(5), 889–898.

Specific psychiatric disorders are understood as manifestations of a small
number of liability factors. These latent factors explain comorbidity by
virtue of their impact on multiple disorders. Krueger and Markon (2006)
presented results
from a meta-analysis of data on 11 disorders from five population samples.
The best-fitting model comprises internalizing and externalizing liability fac-
tors (correlated at 0.50), and the internalizing factor splits into two separable
(but highly correlated) liabilities, labeled ‘‘distress’’ and ‘‘fear’’ (Figure 12.1).
Distress is the liability for depressive disorders and generalized anxiety dis-
order; fear is the liability for panic and phobic disorders. Similar models were
identified in previous studies by Krueger (1999), using phenotypic data, and
by Kendler, Prescott, Myers, and Neale (2003), using twin designs. PTSD was
not included in these analyses.
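The published models are confirmatory latent-variable analyses of diagnostic data. As a rough exploratory stand-in only (not a reproduction of the Krueger and Markon approach, which rests on tetrachoric correlations and confirmatory fitting), one can factor-analyze binary lifetime-diagnosis indicators in Python and inspect which disorders load together; the file and column names below are hypothetical.

    # Exploratory illustration only; a proper analysis of binary diagnoses would
    # use tetrachoric correlations or an item-response/CFA framework.
    import pandas as pd
    from sklearn.decomposition import FactorAnalysis

    diagnoses = pd.read_csv("lifetime_diagnoses.csv")  # 0/1 columns, one per disorder
    # e.g., columns: mdd, dysthymia, gad, ptsd, panic, social_phobia,
    #                specific_phobia, agoraphobia, alcohol_dep, drug_dep, antisocial

    fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
    fa.fit(diagnoses.values)

    loadings = pd.DataFrame(
        fa.components_.T,
        index=diagnoses.columns,
        columns=["factor_1", "factor_2"],
    )
    # Disorders with high loadings on the same factor are candidates for a
    # shared liability (e.g., an internalizing vs. an externalizing grouping).
    print(loadings.round(2))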
A factor analysis of multiple disorders that included PTSD by Slade and
Watson (2006) found that PTSD loaded highly on the distress factor (Figure
12.2). In an earlier analysis by Watson (2005), PTSD also loaded on the
distress factor together with depression and generalized anxiety disorder.
However, its affinity to the distress factor was weaker than the affinity of
those other disorders (Table 12.8). Watson cites evidence from an analysis of
data on Gulf War veterans, suggesting that PTSD’s affinity to the distress
liability is due to a dysphoria factor in PTSD, which combines symptoms of
emotional numbing with insomnia, irritability, and poor concentration that
are prevalent in depression and anxiety disorders (Simms, Watson, &
Doebbeling, 2002).

[Path diagram not reproduced here; it shows correlated internalizing and
externalizing liability factors, with the internalizing factor dividing into
distress and fear.]
Figure 12.1 Path diagram for best-fitting meta-analysis model. Used with permission
of ANNUAL REVIEWS, INC., from Annual Review of Clinical Psychology, article by
R. Krueger and K. E. Markon, volume 2, 2006; permission conveyed through
Copyright Clearance Center, Inc.

[Structural model not reproduced here; posttraumatic stress loads on the
distress facet of the internalizing factor, alongside major depression,
dysthymia, generalized anxiety, and neurasthenia.]
Figure 12.2 Best-fitting model of the structure of 10 DSM-IV disorders (Australian
NSMHWB) (N = 10,641). From Tim Slade and David Watson, ‘‘The structure of
common DSM-IV and ICD-10 mental disorders in the Australian general population,’’
Psychological Medicine, volume 36, issue 11, page 1597, 2006 © Cambridge Journals,
reproduced with permission.

Table 12.8 Factor Loadings of Lifetime DSM-III-R Diagnoses: NCS Data (n = 5,877)

Factor 1 Factor 2 Factor 3


Dysthymia .80 .01 –.13
Major depressive episode .75 .00 .02
Generalized anxiety disorder .62 –.01 .15
Posttraumatic stress disorder .40 .15 .17
Alcohol dependence –.02 .76 –.05
Antisocial personality disorder .01 .75 –.03
Drug dependence .01 .74 .04
Simple phobia –.02 –.04 .74
Agoraphobia .10 –.04 .68
Social phobia –.09 .08 .67
Panic disorder .31 –.06 .50
Bipolar disorder .33 .29 .29
Reproduced with permission from Watson, D. (2005). Rethinking the mood and anxiety disorders:
A quantitative hierarchical model for DSM-V. Journal of Abnormal Psychology, 114(4), 522–536.
Copyright © 2005 by the American Psychological Association. The use of APA information does
not imply endorsement by APA.

What might these underlying liabilities be? Krueger (1999) points out that
the two major spectra of disorders, internalizing and externalizing, are linked
in the literature to personality traits of neuroticism and disinhibition: neuro-
ticism to internalizing disorders and neuroticism in the presence of high
disinhibition to externalizing disorders. Based on our findings, we have sug-
gested the possibility of a common diathesis between PTSD and major
depression and proposed that it might be a mistake to regard PTSD and
major depression in ‘‘comorbid’’ cases as separate and distinct (Breslau,
Davis, Peterson, & Schultz, 2000). As to the relationship of PTSD with sub-
stance-use disorders, the evidence suggests a different explanation. If there
are common underlying liabilities, they are probably weaker. It is clear that
we cannot conclude that the association of PTSD with alcohol- or drug-use
disorder is environmental, with alcohol or drug involvement increasing the
probability of exposure to traumatic events and indirectly increasing the risk
for PTSD. Also, survival analysis with time-dependent covariates of the retro-
spective data gathered at baseline did not support a causal pathway from
substance-use disorders to exposure. A recent study on another sample repli-
cates these findings (Reed et al., 2007). The evidence in our prospective data
that PTSD increased the risk for drug-use disorders, especially prescription
medicines, if replicated, would provide a part of the explanation. PTSD and
substance-use disorder are probably connected by multiple pathways, includ-
ing a more complex shared liabilities pattern which involves both neuroticism
and disinhibition.

Conclusion

Findings from our prospective research help to rule out some of the potential
pathways that might account for PTSD comorbidity. Trauma-exposed persons
who did not succumb to PTSD (i.e., about 90% of those exposed) are not at a
markedly increased risk for other disorders. PTSD following exposure to
stressors might identify persons with preexisting liability to a range of dis-
orders. The findings do not support the idea that trauma caused PTSD in
some victims and major depression in others. They led us to conclude that
the two disorders might have a shared diathesis and that, when observed
together in ‘‘comorbid’’ cases, they are not distinct disorders with separate
etiologies. Multivariate analysis of psychiatric comorbidity illuminates etiol-
ogy by seeking to identify core processes underlying multiple disorders. The
liability constructs that emerge resemble personality traits linked to psycho-
pathology. Neuroticism is a liability for internalizing disorders, a spectrum
that contains PTSD. The construct of PTSD as a process with an inner logic
has close affinity to neuroticism. The core dimension in neuroticism is the
individual’s propensity to respond to stressors. At the low end, it describes a
normal response style. At the high end, it describes a predisposition for
emotional instability, negative affect, and lability of the autonomic nervous
system. Neuroticism is not merely a risk factor that, with other risk factors
that are attributes of aggregates, increases the probability of PTSD. It is better
conceived of as an underlying liability for PTSD and its association with other
disorders, including depression and substance-use disorders. The possibility
that another liability trait, extroversion (or disinhibition), comes into play in
the comorbidity trajectory of PTSD, particularly with respect to substance-use
disorders, fits within the general liability framework, in which the two traits
are moderately correlated. Drug-use disorder in PTSD cannot be assumed to
be external to this liability framework. It cannot be regarded as an environ-
mental factor that increases the likelihood of exposure to traumatic events,
indirectly raising the risk for PTSD. We found no evidence for this pathway
in two prospective studies.
What was left out in this examination of causes of PTSD is history and
society. For that, we now have rich accounts that trace trauma theories,
psychiatric observations, and policies, including the waxing and waning of
attention to social factors and individual predispositions in cross-cultural and
military psychiatry (Breslau, 2004, 2005; Jones & Wessely, 2005; Shephard,
2001; Young, 1995). Key factors accounting for variability in psychiatric
casualties during warfare were morale, group cohesion, and leadership.
Perhaps the most important and simplest historical lesson is contained
in Shephard’s comment that living in a robust and self-confident culture
helps.

References

Alexander, D. A., & Wells, A. (1991). Reactions of police officers to body-handling after a
major disaster. A before-and-after comparison. British Journal of Psychiatry, 159, 547–
555.
American Psychiatric Association. (1980). Diagnostic and statistical manual of mental
disorders (3rd ed.). Washington DC: Author.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental
disorders: DSM-IV (4th ed.). Washington DC: Author.
Andreasen, N. C. (1980). Post-traumatic stress disorder. In A. M. Freedman, H. I.
Kaplan, & B. J. Sadock (Eds.), Comprehensive textbook of psychiatry (3rd ed.).
Baltimore: Williams & Wilkins.
Andreasen, N. C. (2004). Acute and delayed posttraumatic stress disorders: A history
and some issues. American Journal of Psychiatry, 161(8), 1321–1323.
Andrews, B., Brewin, C. R., Philpott, R., & Stewart, L. (2007). Delayed-onset posttrau-
matic stress disorder: A systematic review of the evidence. American Journal of
Psychiatry, 164(9), 1319–1326.
Bernstein, K. T., Ahern, J., Tracy, M., Boscarino, J. A., Vlahov, D., & Galea, S. (2007).
Television watching and the risk of incident probable posttraumatic stress disorder:
A prospective evaluation. Journal of Nervous and Mental Disease, 195(1), 41–47.
Blank, A. S. (1985). Irrational reactions to post-traumatic stress disorder and Viet Nam
veterans. In S. Sonnenberg, A. S. Blank, Jr., & J. A. Talbott (Eds.), The trauma of
war: Stress and recovery in Viet Nam veterans (p. xxi). Washington DC: American
Psychiatric Press.
Bremner, J. D. (1999). Does stress damage the brain? Biological Psychiatry, 45(7),
797–805.
Bremner, J. D., Southwick, S. M., Johnson, D. R., Yehuda, R., & Charney, D. S. (1993).
Childhood physical abuse and combat-related posttraumatic stress disorder in
Vietnam veterans. American Journal of Psychiatry, 150(2), 235–239.
Breslau, J. (2004). Cultures of trauma: Anthropological views of posttraumatic stress
disorder in international health. Culture, Medicine and Psychiatry, 28(2), 113–126.
Breslau, J. (2005). Response to ‘‘Commentary: Deconstructing critiques on the inter-
nationalization of PTSD’’. Culture, Medicine and Psychiatry, 29(3), 371–376.
Breslau, N., Chase, G. A., & Anthony, J. C. (2002). The uniqueness of the DSM
definition of post-traumatic stress disorder: Implications for research. Psychological
Medicine, 32(4), 573–576.
Breslau, N., Chilcoat, H. D., Kessler, R. C., & Davis, G. C. (1999). Previous exposure to
trauma and PTSD effects of subsequent trauma: Results from the Detroit Area
Survey of Trauma. American Journal of Psychiatry, 156(6), 902–907.
Breslau, N., & Davis, G. C. (1987). Posttraumatic stress disorder. The stressor criterion.
Journal of Nervous and Mental Disease, 175(5), 255–264.
Breslau, N., Davis, G. C., & Andreski, P. (1995). Risk factors for PTSD-related trau-
matic events: A prospective analysis. American Journal of Psychiatry, 152(4), 529–535.
Breslau, N., Davis, G. C., Andreski, P., & Peterson, E. (1991). Traumatic events and
posttraumatic stress disorder in an urban population of young adults. Archives of
General Psychiatry, 48(3), 216–222.
Breslau, N., Davis, G. C., Peterson, E. L., & Schultz, L. (1997). Psychiatric sequelae of
posttraumatic stress disorder in women. Archives of General Psychiatry, 54(1), 81–87.
Breslau, N., Davis, G. C., Peterson, E. L., & Schultz, L. R. (2000). A second look at
comorbidity in victims of trauma: The posttraumatic stress disorder–major depres-
sion connection. Biological Psychiatry, 48(9), 902–909.
Breslau, N., Davis, G. C., & Schultz, L. (2003). Posttraumatic stress disorder and the inci-
dence of nicotine, alcohol and drug disorders in persons who have experienced
trauma. Archives of General Psychiatry, 60, 289–294.
Breslau, N., Kessler, R. C., Chilcoat, H. D., Schultz, L. R., Davis, G. C., & Andreski, P.
(1998). Trauma and posttraumatic stress disorder in the community: The 1996
Detroit Area Survey of Trauma. Archives of General Psychiatry, 55(7), 626–632.
Breslau, N., Lucia, V. C., & Alvarado, G. F. (2006). Intelligence and other predisposing
factors in exposure to trauma and posttraumatic stress disorder: A follow-up study at
age 17 years. Archives of General Psychiatry, 63(11), 1238–1245.
Breslau, N., Peterson, E., & Schultz, L. (2008). A second look at prior trauma and the
posttraumatic stress disorder-effects of subsequent trauma: A prospective epidemio-
logical study. Archives of General Psychiatry, 65(4), 431–437.
Breslau, N., Peterson, E. L., Poisson, L. M., Schultz, L. R., & Lucia, V. C. (2004a).
Estimating post-traumatic stress disorder in the community: Lifetime perspective
and the impact of typical traumatic events. Psychological Medicine, 34(5), 889–898.
Breslau, N., Wilcox, H. C., Storr, C. L., Lucia, V. C., & Anthony, J. C. (2004b). Trauma
exposure and posttraumatic stress disorder: A study of youths in urban America.
Journal of Urban Health, 81(4), 530–544.
Brewin, C. R., Andrews, B., & Valentine, J. D. (2000). Meta-analysis of risk factors for
posttraumatic stress disorder in trauma-exposed adults. Journal of Consulting and
Clinical Psychology, 68(5), 748–766.
Casella, L., & Motta, R. W. (1990). Comparison of characteristics of Vietnam
veterans with and without posttraumatic stress disorder. Psychological Reports,
67(2), 595–605.
Dohrenwend, B. P., Turner, J. B., Turse, N. A., Adams, B. G., Koenen, K. C., &
Marshall, R. (2006). The psychological risks of Vietnam for U.S. veterans: A revisit
with new data and methods. Science, 313(5789), 979–982.
Ehlers, A., & Clark, D. M. (2000). A cognitive model of posttraumatic stress disorder.
Behaviour Research and Therapy, 38(4), 319–345.
Engdahl, B., Dikel, T. N., Eberly, R., & Blank, A., Jr. (1997). Posttraumatic stress
disorder in a community group of former prisoners of war: A normative response
to severe trauma. American Journal of Psychiatry, 154(11), 1576–1581.
Engelhard, I. M., van den Hout, M. A., & Kindt, M. (2003). The relationship between
neuroticism, pre-traumatic stress and post-traumatic stress: A prospective study.
Personality and Individual Differences, 35, 381–388.
Fergusson, D. M., & Lynskey, M. T. (1996). Adolescent resiliency to family adversity.
Journal of Child Psychology and Psychiatry, 37(3), 281–292.
Fullerton, J., Cubin, M., Tiwari, H., Wang, C., Bomhra, A., Davidson, S., et al. (2003).
Linkage analysis of extremely discordant and concordant sibling pairs identifies
quantitative-trait loci that influence variation in the human personality trait neuroti-
cism. American Journal of Human Genetics, 72(4), 879–890.
Galea, S., Ahern, J., Resnick, H., Kilpatrick, D., Bucuvalas, M., Gold, J., et al. (2002).
Psychological sequelae of the September 11 terrorist attacks in New York City. New
England Journal of Medicine, 346(13), 982–987.
Gilbertson, M. W., Paulus, L. A., Williston, S. K., Gurvits, T. V., Lasko, N. B., Pitman,
R. K., et al. (2006). Neurocognitive function in monozygotic twins discordant for
combat exposure: Relationship to posttraumatic stress disorder. Journal of Abnormal
Psychology, 115(3), 484–495.
Goldstein, G., van Kammen, W., Shelly, C., Miller, D. J., & van Kammen, D. P. (1987).
Survivors of imprisonment in the Pacific theater during World War II. American
Journal of Psychiatry, 144(9), 1210–1213.
Goodwin, D. W., & Guze, S. B. (1984). Psychiatric diagnosis (3rd ed.). New York: Oxford
University Press.
Green, B. L., Lindy, J. D., & Grace, M. C. (1985). Posttraumatic stress disorder. Toward
DSM-IV. Journal of Nervous and Mental Disease, 173(7), 406–411.
Helzer, J. E. (1981). Methodological issues in the interpretations of the consequences
of extreme situations. In B. S. Dohrenwend & B. P. Dohrenwend (Eds.), Stressful life
events and their contexts: Monographs in psychosocial epidemiology (Vol. 2, pp. 108–129).
New York: Prodist.
Hyer, L., Braswell, L., Albrecht, B., Boyd, S., Boudewyns, P., & Talbert, S. (1994).
Relationship of NEO-PI to personality styles and severity of trauma in chronic
PTSD victims. Journal of Clinical Psychology, 50(5), 699–707.
Jones, E., & Wessely, S. (2005). War syndromes: The impact of culture on medically
unexplained symptoms. Medical History, 49(1), 55–78.
Kendler, K. S., & Prescott, C. A. (2006). Genes, environment, and psychopathology:
Understanding the causes of psychiatric and substance use disorders. New York:
Guilford Press.
Kendler, K. S., Prescott, C. A., Myers, J., & Neale, M. C. (2003). The structure of
genetic and environmental risk factors for common psychiatric and substance use
disorders in men and women. Archives of General Psychiatry, 60(9), 929–937.
Kessler, R. C., Chiu, W. T., Demler, O., Merikangas, K. R., & Walters, E. E. (2005).
Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the
National Comorbidity Survey Replication. Archives of General Psychiatry, 62(6),
617–627.
Kessler, R. C., Sonnega, A., Bromet, E., Hughes, M., & Nelson, C. B. (1995).
Posttraumatic stress disorder in the National Comorbidity Survey. Archives of
General Psychiatry, 52(12), 1048–1060.
Khan, A. A., Jacobson, K. C., Gardner, C. O., Prescott, C. A., & Kendler, K. S. (2005).
Personality and comorbidity of common psychiatric disorders. British Journal of
Psychiatry, 186, 190–196.
Koenen, K. C., Moffitt, T. E., Poulton, R., Martin, J., & Caspi, A. (2007). Early child-
hood factors associated with the development of post-traumatic stress disorder:
Results from a longitudinal birth cohort. Psychological Medicine, 37(2), 181–192.
Kremen, W. S., Koenen, K. C., Boake, C., Purcell, S., Eisen, S. A., Franz, C. E., et al.
(2007). Pretrauma cognitive ability and risk for posttraumatic stress disorder: A twin
study. Archives of General Psychiatry, 64(3), 361–368.
Krueger, R. F. (1999). The structure of common mental disorders. Archives of General
Psychiatry, 56(10), 921–926.
Krueger, R. F., & Markon, K. E. (2001). The higher-order structure of common DSM
mental disorders: Internalization, externalization, and their connections to person-
ality. Genetic and environmental relationships between normal and abnormal per-
sonality. Personality and Individual Differences, 30, 1245–1259.
Krueger, R. F., & Markon, K. E. (2006). Reinterpreting comorbidity: A model based
approach to understanding and classifying psychopathology. Annual Review of
Clinical Psychology, 2, 111–133.
Kulka, R. A., Schlenger, W. E., Fairbank, J. A., Hough, R. L., Jordan, B. K., Marmar, C.
R., et al. (1990). Trauma and the Vietnam War generation: Report of findings from the
National Vietnam Veterans Readjustment Study. New York: Brunner/Mazel.
Lesch, K. P., Bengel, D., Heils, A., Sabol, S. Z., Greenberg, B. D., Petri, S., et al.
(1996). Association of anxiety-related traits with a polymorphism in the serotonin
transporter gene regulatory region. Science, 274(5292), 1527–1531.
Lindy, J. D., Green, B. L., & Grace, M. C. (1987). The stressor criterion and posttrau-
matic stress disorder. Journal of Nervous and Mental Disease, 175(5), 269–272.
Macklin, M. L., Metzger, L. J., Litz, B. T., McNally, R. J., Lasko, N. B., Orr, S. P., et al.
(1998). Lower precombat intelligence is a risk factor for posttraumatic stress dis-
order. Journal of Consulting and Clinical Psychology, 66(2), 323–326.
McFarlane, A. C. (1988). The longitudinal course of posttraumatic morbidity. The
range of outcomes and their predictors. Journal of Nervous and Mental Disease, 176(1),
30–39.
McFarlane, A. C. (1989). The treatment of post-traumatic stress disorder. British
Journal of Medical Psychology, 62(Pt. 1), 81–90.
McNally, R. J. (2003). Progress and controversy in the study of posttraumatic stress
disorder. Annual Review of Psychology, 54, 229–252.
McNally, R. J., & Shin, L. M. (1995). Association of intelligence with severity of post-
traumatic stress disorder symptoms in Vietnam combat veterans. American Journal
of Psychiatry, 152(6), 936–938.
Norris, F. H. (1992). Epidemiology of trauma: Frequency and impact of different
potentially traumatic events on different demographic groups. Journal of
Consulting and Clinical Psychology, 60(3), 409–418.
North, C. S., & Pfefferbaum, B. (2002). Research on the mental health effects of
terrorism. Journal of the American Medical Association, 288(5), 633–636.
Ozer, E. J., Best, S. R., Lipsey, T. L., & Weiss, D. S. (2003). Predictors of posttraumatic
stress disorder and symptoms in adults: A meta-analysis. Psychological Bulletin,
129(1), 52–73.
Parslow, R. A., Jorm, A. F., & Christensen, H. (2006). Associations of pre-trauma
attributes and trauma exposure with screening positive for PTSD: Analysis of
a community-based study of 2,085 young adults. Psychological Medicine, 36(3),
387–395.
Pitman, R. K., Orr, S. P., Lowenhagen, M. J., Macklin, M. L., & Altman, B. (1991). Pre-
Vietnam contents of posttraumatic stress disorder veterans’ service medical and
personnel records. Comprehensive Psychiatry, 32(5), 416–422.
Post, R. M., & Weiss, S. R. (1998). Sensitization and kindling phenomena in mood,
anxiety, and obsessive–compulsive disorders: The role of serotonergic mechanisms
in illness progression. Biological Psychiatry, 44(3), 193–206.
Reed, P. L., Anthony, J. C., & Breslau, N. (2007). Incidence of drug problems in young
adults exposed to trauma and posttraumatic stress disorder: Do early life experiences
and predispositions matter? Archives of General Psychiatry, 64(12), 1435–1442.
Resnick, H. S., Kilpatrick, D. G., Dansky, B. S., Saunders, B. E., & Best, C. L. (1993).
Prevalence of civilian trauma and posttraumatic stress disorder in a representative
national sample of women. Journal of Consulting and Clinical Psychology, 61(6),
984–991.
Sapolsky, R. M., Uno, H., Rebert, C. S., & Finch, C. E. (1990). Hippocampal damage
associated with prolonged glucocorticoid exposure in primates. Journal of
Neuroscience, 10(9), 2897–2902.
Sen, S., Burmeister, M., & Ghosh, D. (2004). Meta-analysis of the association between
a serotonin transporter promoter polymorphism (5-HTTLPR) and anxiety-related
personality traits. American Journal of Medical Genetics B Neuropsychiatric Genetics,
127(1), 85–89.
Shephard, B. (2001). A war of nerves: Soldiers and psychiatrists in the twentieth century.
Cambridge, MA: Harvard University Press.
Silva, R. R., Alpert, M., Munoz, D. M., Singh, S., Matzner, F., & Dummit, S. (2000).
Stress and vulnerability to posttraumatic stress disorder in children and adolescents.
American Journal of Psychiatry, 157(8), 1229–1235.
Simms, L. J., Watson, D., & Doebbeling, B. N. (2002). Confirmatory factor analyses of
posttraumatic stress symptoms in deployed and nondeployed veterans of the Gulf
War. Journal of Abnormal Psychology, 111(4), 637–647.
Slade, T., & Watson, D. (2006). The structure of common DSM-IV and ICD-10 mental
disorders in the Australian general population. Psychological Medicine, 36(11),
1593–1600.
Solomon, Z., Mikulincer, M., & Jakob, B. R. (1987). Exposure to recurrent combat
stress: Combat stress reactions among Israeli soldiers in the Lebanon war.
Psychological Medicine, 17(2), 433–440.
320 Causality and Psychopathology

Stein, M. B., Walker, J. R., Hazen, A. L., & Forde, D. R. (1997). Full and partial
posttraumatic stress disorder: Findings from a community survey. American
Journal of Psychiatry, 154(8), 1114–1119.
Storr, C. L., Ialongo, N. S., Anthony, J. C., & Breslau, N. (2007). Childhood antecedents
of exposure to traumatic events and posttraumatic stress disorder. American Journal
of Psychiatry, 164(1), 119–125.
Talbert, F. S., Braswell, L. C., Albrecht, J. W., Hyer, L. A., & Boudewyns, P. A. (1993).
NEO-PI profiles in PTSD as a function of trauma level. Journal of Clinical Psychology,
49(5), 663–669.
Terr, L. C., Bloch, D. A., Michel, B. A., Shi, H., Reinhardt, J. A., & Metayer, S. (1999).
Children’s symptoms in the wake of Challenger: A field study of distant-traumatic
effects and an outline of related conditions. American Journal of Psychiatry, 156(10),
1536–1544.
Watson, D. (2005). Rethinking the mood and anxiety disorders: A quantitative hier-
archical model for DSM-V. Journal of Abnormal Psychology, 114(4), 522–536.
Yehuda, R., & McFarlane, A. C. (1995). Conflict between current knowledge about
posttraumatic stress disorder and its original conceptual basis. American Journal of
Psychiatry, 152(12), 1705–1713.
Yehuda, R., McFarlane, A. C., & Shalev, A. Y. (1998). Predicting the development of
posttraumatic stress disorder from the acute response to a traumatic event. Biological
Psychiatry, 44(12), 1305–1313.
Yehuda, R., Resnick, H. S., Schmeidler, J., Yang, R. K., & Pitman, R. K. (1998).
Predictors of cortisol and 3-methoxy-4-hydroxyphenylglycol responses in the acute
aftermath of rape. Biological Psychiatry, 43(11), 855–859.
Young, A. (1995). The harmony of illusions: Inventing post-traumatic stress disorder.
Princeton, NJ: Princeton University Press.
Young, A. (2001). Our traumatic neurosis and its brain. Science in Context, 14(4), 661–683.
Zammit, S., Allebeck, P., David, A. S., Dalman, C., Hemmingsson, T., Lundberg, I.,
et al. (2004). A longitudinal study of premorbid IQ score and risk of developing
schizophrenia, bipolar disorder, severe depression, and other nonaffective psychoses.
Archives of General Psychiatry, 61(4), 354–360.
Breslau N, Davis GC, Schultz L. (2003). Posttraumatic Stress Disorder and the inci-
dence of nicotine, alcohol and drug disorders in persons who have experienced
trauma. Archives of General Psychiatry, 60, 289–294.
13
Causal Thinking for Objective Psychiatric Diagnostic Criteria
A Programmatic Approach in Therapeutic Context
donald f. klein

Introduction: Ascribing Disease and Illness

Terms such as disorder, illness, disease, dysfunction, and deviance embody the pre-
conceptions of historical development (Klein, 1999). That individuals become ill
for no apparent reason, suffering from pain, dizziness, malaise, rash, wasting,
etc., has been known since prehistoric days. The recognition of illness led to the
social definition of the patient and the development of various treatment
institutions (e.g., nursing, medicine, surgery, quacks, and faith healers).
Illness is an involuntary affliction that justifies the sick, dependent role
(Parsons, 1951). That is, because the sick have involuntarily impaired func-
tioning, it is a reasonable social investment to exempt them (at least tem-
porarily) from normal responsibilities. Illness implies that something has
gone wrong. However, gaining exemption from civil or criminal responsibil-
ities is often desired. Therefore, if no objective criteria are available, an illness
claim can be viewed skeptically. By affirming involuntary affliction, diagnosis
immunizes the patient against charges of exploitative parasitism.
Therefore, illness may be considered a hybrid concept, with two compo-
nents: (1) the necessary inference that something has actually, involuntarily,
gone wrong (disease) and (2) the qualification that the result (illness) must be
sufficiently major, according to current social values, to ratify the sickness
exemption role. The latter component is related to the particular historical
stage, cultural traditions, and values. This concept has been exemplified by
the phrase ‘‘harmful dysfunction’’ (Wakefield, 1992).
However, this does not mean that the illness concept is arbitrary since the
inference that something has gone wrong is necessary. Beliefs as to just what
has gone wrong (e.g., demon possession, bad air, bacterial infection) as well
as the degree of manifested dysfunction that warrants the sick role reflect the
somewhat independent levels of scientific and social development (for further
reference, see Lewis, 1967).
How can we affirm that something has gone wrong if there is no objective
evidence? The common statistical definition of abnormality simply is
‘‘unusual.’’ Something is abnormal if it is rare. Although biological variability
ensures that someone is at an extreme, there is a strong presumption that
something has gone wrong if sufficiently extreme. For instance, hemoglobin
of 5 g/100 ml exceeds normal biological variation, indicating that something
has gone wrong. Therefore, infrequency (e.g., dextrocardia) usefully indicates
that something is probably wrong but is not sufficient (e.g., left-handedness)
or necessary (e.g., dental caries). A mysterious shift from well-being to pain
and manifest dysfunction strongly indicates that something has gone wrong.
That such distressing states may remit affirms that repair has somehow
occurred. What has gone wrong is a deviation from an implicit standard, for-
mulated by the evolutionary theory of adaptive functions and dysfunctions
(Millikan, 1993; Klein, 1993, 1999).
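A rough calculation makes the claim concrete. The reference values assumed
below (a population mean near 14 g/100 ml with a standard deviation of about
1.5 g/100 ml) are illustrative rather than taken from any source in this
chapter; the sketch simply shows how far a hemoglobin of 5 g/100 ml lies
below ordinary biological variation.

    from scipy.stats import norm

    # Assumed, illustrative reference values for hemoglobin (g/100 ml)
    mean_hb, sd_hb = 14.0, 1.5
    observed = 5.0

    z = (observed - mean_hb) / sd_hb   # standard score of the observed value
    tail = norm.cdf(z)                 # chance of a value this low under normal variation
    print(f"z = {z:.1f}, lower-tail probability = {tail:.1e}")
    # z = -6.0, lower-tail probability on the order of 1e-09: far outside ordinary variation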
Medical diagnosis was placed on a firmer footing by Sydenham in the
seventeenth century by the concept of syndromes, forms of ill health compar-
able to the types of animal and vegetable species in terms of symptoms and
course, for example, gout and rheumatoid arthritis. A symptom complex was
more than a concatenation of symptoms and signs. It implied some common
latent cause distinct from those supporting ordinary health, even if such
causes were unknown. Kraepelin made use of syndromes in distinguishing
dementia praecox from manic–depressive illness by initially arguing that the
different symptom complexes provided firm prognostic differences. ‘‘Points
of rarity’’ are not necessary to differentiate syndromes, and even if a latent
cause is entirely categorical, its manifestations may not evidence bimodality
(Murphy, 1964).
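Murphy's point, that a wholly categorical latent cause need not produce a
visibly bimodal distribution, can be illustrated numerically. The mixing
proportion and class separation below are assumed for illustration and are
not taken from Murphy (1964); when two latent classes of equal spread differ
by less than about two within-class standard deviations, their mixture shows
a single mode.

    import numpy as np
    from scipy.stats import norm

    # Assumed illustration: a categorical latent cause present in 10% of people,
    # shifting the observed trait by 1.5 within-class standard deviations.
    grid = np.linspace(-4.0, 6.0, 1001)
    density = 0.9 * norm.pdf(grid, 0.0, 1.0) + 0.1 * norm.pdf(grid, 1.5, 1.0)

    modes = sum(
        density[i] > density[i - 1] and density[i] > density[i + 1]
        for i in range(1, len(grid) - 1)
    )
    print("modes in the observed mixture:", modes)   # prints 1: no point of rarity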

The Search for an Ideal Diagnostic Entity

Since around the middle of the nineteenth century, disease has become
defined by objectively demonstrated etiology and pathology, thanks to crucial
discoveries made by scientists such as Pasteur and Virchow. Causal analysis
allowed diagnostic progression past simple syndromal definition by objec-
tively elucidating necessary etiologies. Evident examples can be found
among infectious diseases and avitaminoses.
This is not the case in psychiatry. Attempting to find objective differences
(biomarkers) between normal subjects and subjects with various psychiatric
syndromes has been the overriding focus of biological psychiatry. Beset by
lack of pathophysiological knowledge, measures of biochemical or physiolo-
gical variables are almost always the result of irrelevant artifactual differences
produced by prior or current treatment or secondary complications (e.g.,
avitaminosis, alcohol abuse, head trauma). Worse, syndromal nonspecificity
regularly occurs, even for familial genetic markers (e.g., catechol O-methyl-
transferase–inefficient alleles), which leads some to argue for syndromal
amalgamation; this falsely implies that these syndromes are offshoots of a
common causal pathophysiology. Thus, it is clear that psychiatric nosology
lags behind general medical nosology. Diagnostic and Statistical Manual of
Mental Disorders, third edition (DSM-III), disorders are syndromes defined by
the consensus of clinical scientists. It can be argued that these polythetic cate-
gories are never simply related to etiology, pathophysiology, or laboratory tests.

Syndromal Heterogeneity
Psychiatric disorders are largely familial. Twin, adoption, and related study
designs indicate that syndromal familiality is largely genetic (or due to gene-
by-environment interactions). Therefore, the tremendous advances in mole-
cular genetics and genomics raised hopes for objective diagnostic genetic
tests, revealed by such methods as linkage and association studies.
Unfortunately, increasing disappointment set in once molecular genetic
research focused on highly familial but non-Mendelian syndromes. Despite
remarkably significant statistical associations found in individual studies,
there have been repeated failures of replication (Riley & Kendler, 2006).
Further, as seen in the example of Huntington disease, gene identification
does not necessarily lead to advances in treatment and/or more complete
knowledge of pathophysiology.
The lack of replicability of many genomic studies as well as the low-
magnitude effect estimates reported highlight the central problem of syndro-
mal heterogeneity (Bodmer, 1981). Smoller and Tsuang (1998, p. 1152) clearly
state the problem and suggest a straightforward solution:

With recent advances in molecular genetics, the rate-limiting step in
identifying susceptibility genes for psychiatric disorders has become
phenotype definition. The success of psychiatric genetics may require
the development of a ‘‘genetic nosology’’ that can classify individuals in
terms of the heritable aspects of psychopathology.

However, this may not be feasible. As Crow (2007, p. 13) states, ‘‘Recent meta-
analyses have not identified consistent sites of linkage. The three largest studies
of schizophrenia fail to agree on a single locus . . . there is no replicable
support for any of the current candidate genes.’’ Although Crow’s remarks
address psychoses, the general nosological applicability is clear. Repeated
failures of replication highlight the difficulty of finding a consistent genetic
etiology for syndromes.
It is doubtful whether the fundamentally genocentric nosology proposed by
Smoller and Tsuang (1998) can be achieved. Nonetheless, their argument
that syndromal heterogeneity invalidates genetic linkage and association
studies is entirely reasonable. However, this sound observation has not
dissuaded syndrome linkage genomic efforts.

Replacing Syndromes With Endophenotypes


One recent popular strategy has been to move from syndromes to endophe-
notypes for gene linkage partners. Endophenotypes are held to be superior to
biomarkers by Gottesman and Gould (2003, p. 636):

Endophenotypes, measurable components unseen by the unaided eye
along the pathway between disease and distal genotype . . . represent
simpler clues to genetic underpinnings than the disease syndrome
itself, promoting the view that psychiatric diagnoses can be decom-
posed or deconstructed. . . . In addition to furthering genetic analysis,
endophenotypes can clarify classification and diagnosis.

However, Flint and Munafo (2006) criticize this optimistic assumption in their
detailed meta-analytic review of human and animal data. Endophenotypes
appear to be only on a par with genetic biomarkers, which blunts optimism.
These conclusions as well as prior assumptions of enhanced endopheno-
type utility for genetic analysis are limited by the paucity of studies
specifically addressing differential endophenotypic utility. Unfortunately,
Flint and Munafo’s critique has not, as yet, been widely discussed in the
literature.

Conceptual Problems With Endophenotypes


The use of endophenotypes to circumvent the difficulty of linking genes to
syndromes amounts to dismissing syndromes as meaningful. This presents
conceptual difficulties. For example, Braff, Freedman, Schork, and Gottesman
(2007) describe explicitly deconstructing syndromes into multiple indepen-
dent genetic abnormalities that engender multiple independent functional
abnormalities. However, if psychiatric syndromes are caused by the agglom-
eration of multiple small independent effects of many different genes, which
may or may not be phenotypically evident, as well as multiple interactions
with the fluctuating internal and external environment (Rutter, 2007), it is
surprising that there are any consistently evident symptom complexes at all.
However, certain syndromes (e.g., mania, melancholia, depressive disorder,
obsessive–compulsive disorder, and panic disorder) have been stereotypically
described for centuries, albeit under different labels, in many places and
languages.
Even more cogent, syndromal decomposition into a heap of independent
dysfunctions, each underlying a particular syndromal facet, is inconsistent
with total syndromal remission and recurrence. The observation of surprising
remissions, periods of apparent health, and recurrences was facilitated by the
long-term mental hospitals, where remission and discharge were notable
events. Further, since there was often only one available hospital, relapses
could be noted. Falret in 1854 (as reprinted in Pichot, 2006, p. 145) described
‘‘Circular insanity [Folie circulaire] . . . characterized by the successive and
regular reproduction of the manic state, the melancholic state, and a more or
less prolonged lucid interval.’’ This description anticipated Kraepelin’s more
inclusive concept of manic–depressive disorder. Both syndromal descriptions
emphasized periods of remission as an essential diagnostic element.
Remissions and relapses occur in many psychiatric and general medical
illnesses (gout, intermittent porphyria, etc.). Since it is highly improbable that
multiple independent causes should concertedly cease, the inference that
complex syndromes have multiple independent causes is implausible.
Those diverse syndromes are recognizable, familial, and extraordinarily dif-
ferent from ordinary health and behavior. This is consonant with Sydenham’s
hypothesis that each syndrome has a common underlying proximal cause,
even if there is no common distal genetic defect. However, the frequent
reliable recognition, since Sydenham, of quite distinct syndromes implies
that multiple small genetic contributions become manifest by taking different
routes to impairing, perhaps in several ways, a distinct evolved function—
which may generate a distinct syndrome (Klein & Stewart, 2004). The argu-
ment is not that all complex psychiatric presentations evidence periods of
total remission; rather, it is logically incorrect to assume that a symptom
complex must be due to a group of independent endophenotypes.
‘‘Comorbidity’’ suggests sequential and/or interactive causal processes.
However, the argument that one aspect of a complex syndrome is likely to
be the direct manifestation of an endophenotype is not logically supported
and unlikely to pay off. For instance, the sudden onset of an apparently
spontaneous panic attack causes an immediate flight to help. With the repeti-
tion of such attacks, chronic anticipatory anxiety often develops. Panic attacks
are often followed by avoidant and dependent measures, misleadingly
referred to as ‘‘agoraphobia.’’
Six weeks of imipramine treatment prevents spontaneous panic attacks.
However, chronic anticipatory anxiety and phobic avoidances remit more
slowly after successful exposure experiences. This sequence of panic attacks,
chronic anticipatory anxiety and phobic avoidance, as well as the relationship
to imipramine treatment and real-life exposure treatment, has been analyzed
using structural equation modeling with data from two experimental trials
(Klein, Ross, & Cohen, 1987).
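The kind of analysis referred to here can be sketched in miniature. The
following is a generic mediation-style path analysis on simulated data, not a
reconstruction of the Klein, Ross, and Cohen (1987) models; all variable
names and coefficients are assumed for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 300

    # Toy data generated under an assumed chain:
    # panic improvement -> less anticipatory anxiety -> less phobic avoidance
    panic = rng.normal(size=n)
    anxiety = 0.6 * panic + rng.normal(scale=0.8, size=n)
    avoidance = 0.7 * anxiety + 0.1 * panic + rng.normal(scale=0.8, size=n)

    def slopes(y, x):
        X = np.column_stack([np.ones(len(y)), x])        # add intercept
        return np.linalg.lstsq(X, y, rcond=None)[0][1:]  # regression slopes only

    a = slopes(anxiety, panic)                           # panic -> anxiety path
    b = slopes(avoidance, np.column_stack([anxiety, panic]))
    print("panic -> anxiety:", a.round(2))
    print("anxiety -> avoidance, panic -> avoidance (direct):", b.round(2))
    # The mediated effect of panic relief on avoidance is approximately a[0] * b[0].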
The variation in severity of anticipatory anxiety and phobic avoidance
following panic disorder is clear. The relative independence of panic attack
remissions and anxiety/phobic avoidance remissions indicates multiple pro-
cesses. Perhaps independent functions regulating chronic anxiety and/or
behavioral avoidance are rendered dysfunctional by recurrent panics.
Therefore, genetic linkage studies of familial panic disorders, whose pro-
bands are largely agoraphobic, may be linking numerous distal contributors
to several proximal dysfunctions. This may account for the largely inconsis-
tent and negative results of such studies (Fyer et al., 2006).

Brain Imaging and Pathophysiology


It is also hoped that the remarkable development of structural and functional
brain imaging may bring about nosological improvement by objectively
demonstrating specific brain dysfunctions. However, these technical triumphs
present substantial practical and inferential problems.
First, the statistical problems generated by relatively few individual sub-
jects, each providing an enormous number of nonindependent data points,
are both formidable and controversial. Thirion et al. (2007) suggest that 20–
27 subjects is the minimum sample size necessary for sensitive, reproducible
analyses of functional magnetic resonance imaging (fMRI) data. Studies of
this magnitude are quite rare because of daunting practical problems of
recruitment, screening, and expense. Therefore, the necessary positive repli-
cations are exceptional.
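The sample-size point can be conveyed with a generic power calculation. This
is not a reconstruction of Thirion et al.'s (2007) reproducibility analyses;
the standardized effect size (d = 0.5) and the voxel-wise threshold
(alpha = 0.001) below are assumptions chosen only to show how little power
small groups provide, and hence why isolated positive findings so rarely
replicate.

    from statsmodels.stats.power import TTestPower

    # Assumed values: a moderate one-sample group effect tested at a stringent
    # uncorrected voxel-wise threshold, as a stand-in for whole-brain analysis.
    analysis = TTestPower()
    for n in (10, 16, 20, 27, 40):
        p = analysis.power(effect_size=0.5, nobs=n, alpha=0.001, alternative="larger")
        print(f"n = {n:2d}  power = {p:.2f}")
    # Power remains modest for small groups, so a significant voxel found with
    # 10-16 subjects has little chance of reappearing in an independent sample.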
Second, the detection of a signal of increased perfusion is usually inter-
preted as increased brain activity; however, a distinction among neuronal
firing, synaptic but nonspiking activity, and dendrite pile or glial activity
remains unclear. The time window of functional imaging is much longer
than the time interval of neuronal processes, while the spatial resolution is
too coarse to delineate neuronal centers. Using complex, large anatomical
structures to identify distinct brain processes implies a doubtful uniformity
of function at micro levels.
Third, whether the functional implications of increased perfusion are of
brain excitation or inhibition (or other processes) has not yet been deter-
mined. An increase in brain activity might amount to stepping on either a
brake or an accelerator (or another unknown process). There is no necessary
parallel between increases in brain function and increases in psychological or
physiological functioning. Such relationships are hypotheses to be tested.
However, the correlational, naturalistic setting impairs causal assertions.
Finally, there are overriding inferential issues besides these technical and
developmental problems. The outpouring of studies comparing syndromes
with each other and with normal subjects has the same design structure as
comparative biomarker studies. Therefore, the same problems are present as
in biomarker studies (e.g., syndromal heterogeneity, artifact contamination,
causal ambiguity, and lack of specificity).
One justification for the nosological relevance of brain imaging argues
that techniques such as fMRI combined with activation paradigms identify
‘‘brain circuits recruited during specific processes . . . in healthy people. . . .
Understanding these processes in healthy people is a prerequisite for advan-
cing research in psychiatric disorders where these capacities are affected’’
(Kupfer, First, & Regier, 2002, p. 46). This sequence, often referred to as
‘‘translational research,’’ assumes that basic research delineating normal phy-
siology is a ‘‘prerequisite’’ for useful clinical research and improved care.

Therapeutics as a Guide to Pathophysiology

History provides many examples of how practice precedes and enables theory.
Artificial selection by culling unwanted hereditary traits and inbreeding
desired traits was essential to Darwin’s formulation of evolution by natural
selection. Remarkably, studies of pathology led to the discovery of unsus-
pected normal functions. Clinical studies of scurvy, beriberi, and pellagra
led to treatment with nutritional supplements, discovery of specific vitamins,
and discovery of enzymatic cofactors. The serendipitous observation of
cowpox-induced immunity to smallpox led to vaccination, while the study
of beverage contamination led to pasteurization. Germ theory, bacteriology,
immunology, and other evolved mechanisms of resistance to infection fol-
lowed. This list could be extended indefinitely. Empirical therapies often
illuminate dysfunctions, thus bringing unknown normal functions into sight.

Clinical Psychopharmacology and Pathophysiology


The psychopharmacological revolution was a clear case of unpredictable ther-
apeutic advance well before relevant pathophysiological knowledge. All major
psychotropic drug discoveries occurred when surprising clinical benefits were
serendipitously observed during treatment for other purposes.
The basic finding in the 1960s that psychotropic agents blocked both
neuronal receptors and synaptic neurotransmitter reuptake led to a continu-
ing explosion of discovery regarding neurotransmitters, synapses, receptors,
and neural transmission. It was apparent, however, that inhibiting reuptake
or receptor blockade may not be therapeutically important since some effec-
tive drugs did not show this prerequisite (McGrath & Klein, 1983). In any
case, since reuptake inhibition and receptor blockade are almost immediate
effects, they could only be the first dominoes, while remissions took weeks to
appear. Further, benefit ranged from negligible to remarkable, once again
indicating syndromal heterogeneity.
Chlorpromazine was a safe, presurgical antihistamine sedative whose anti-
psychotic properties were completely unsuspected. The antipsychotic action
of chlorpromazine contradicted the conventional psychogenic wisdom of psy-
chiatry and was greeted with profound skepticism, if not open derision. The
clinical trial by random assignment to concurrent placebo and putative treat-
ment groups was slowly adopted only after the late 1940s as a scientifically
necessary part of general pharmacology. Its rapid acceptance into psychophar-
macological trials in the 1950s, amplified by double-blind precautions, was in
part to deal with frequent claims that the evidence for pharmacotherapeutic
benefit was invalid or due to ‘‘chemical straitjackets.’’
The initial focus was to test the specificity and the somewhat ambiguously
defined efficacy of the medication, that is, its activity. Statistical superiority of
the randomized, double-blind evaluated medication under test to the parallel
placebo group, as measured by average outcome scale scores, established
specific drug activity. The 1962 Kefauver-Harris Food and Drug
Administration (FDA) amendment required the demonstration of acute
drug efficacy prior to marketing, without any stipulation concerning effect
size, translation into clinical benefit, determination of who would benefit, or
systematic attention to long-term maintenance of benefit or late-onset toxi-
cities. The pursuit of short-term statistical superiority to placebo became
industry’s search for the Holy Grail of marketability.
This effort incurred several persisting ambiguities. If 60% of those treated
with medication had substantial scale-measured improvements, while only
30% of those on placebo did so (assuming statistical significance), then the
drug was not causal for about half of those who seemed to have a direct drug
benefit. Identifying the individuals for whom specific beneficial drug action
occurred remained obscure. Therefore, attempts to determine how a drug
brought about specific benefits by studying those who got better while receiv-
ing the drug were handicapped by actually studying a causally heterogeneous
mixture.
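The arithmetic behind the statement that the drug was not causal for about
half can be made explicit. Under the usual assumption that the drug arm
contains the same share of placebo-type improvers as the placebo arm (an
assumption a parallel-groups trial cannot verify for individuals), the
fraction of drug-arm improvers for whom the drug was causal is
(60% - 30%) / 60%.

    # Improvement rates from the hypothetical trial discussed above
    p_drug, p_placebo = 0.60, 0.30

    # Assumption: the drug arm contains the same share of people who would have
    # improved anyway (placebo-type improvers) as the placebo arm.
    attributable = (p_drug - p_placebo) / p_drug
    print(f"drug-arm improvers for whom the drug was causal: {attributable:.0%}")
    # 50%: the other half of apparent drug responders would have improved anyway.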
Within this phenotypically homogeneous syndrome, suppose that the
medication benefit occurred in only about 30%. Was response variation due
to some irrelevant differences in drug pharmacokinetics, or might those 30%
have a different cause for their proximal pathophysiology? Since even an
apparently common syndrome can be induced by an array of possible
saboteurs, response to a particular medication may well depend on particular
causal paths.
An example is diabetes, recognized in antiquity as a syndrome of
polydipsia and polyuria; the sweetness of the urine distinguished diabetes
mellitus from diabetes insipidus. More recently, these subsyndromes could be
objectively distinguished by urinary sugar tests. Urinary sugar was then
found to be due to abnormally high blood sugar. In psychiatry, such
simple objective diagnostic facts are lacking. The attempt to understand the
causes of illness by studying ultimate distal genetic causes has faltered.
While genetic approaches to causal identification in psychiatry have fal-
tered, the observation that major psychotropic agents can specifically induce
remissions suggests an experimental approach to proximal pathophysiological
processes. However, this inference requires evidence that medications directly
ameliorate proximal pathophysiology, rather than yield benefit by nonspecific
compensation.
For example, gastric acidity is reduced by calcium carbonate to approxi-
mately normal in patients with hyperacidity, thus decreasing symptoms.
However, it also decreases gastric acidity below normal in subjects without
hyperacidity. This action is compensatory rather than ameliorative of the
hyperacidity’s cause. Conversely, aspirin substantially decreases fever but
does not lower normal body temperature. Consequently, it is considered an
antipyretic rather than a poikilothermic. Its benefit comes from its direct
effect on a particular febrile defense system. Therefore, normal homeothermy
is not affected.
Therefore, to distinguish compensatory drug benefits from those that
affect underlying pathophysiology, the study of drug effects on normal sub-
jects is telling. For instance, it has been known since the 1940s that amphe-
tamines are stimulants, increasing feelings of vigor, elation, arousal, and
positive mood in normals. One might reasonably expect that amphetamines
should benefit major clinical syndromal depression. However, clinical experi-
ence and placebo-controlled trials (Satel & Nelson, 1989) indicated no such
benefit.
It was noted that depressed mood, as occurs in the medically ill, often
improved with stimulants. This suggests that such depressed feelings lack
continuity with syndromal depression. When imipramine was reported to
benefit severe syndromal depression, it was assumed by those without
direct experience that it was a superstimulant. It was astonishing that
there was no resemblance to stimulants. Severely depressed patients slept
and ate better, and sudden remissions (often described by patients with
such phrases as ‘‘the veil has lifted’’) occurred only after weeks of treatment.
This observation led to unsystematic attempts to evaluate whether imipra-
mine elevated the mood of normal subjects, but these pilot efforts did not
show mood elevation. A properly controlled trial of chronic administration of
clinically appropriate antidepressant doses to normal subjects was not feasi-
ble given their unpleasant anticholinergic side effects. It was stated that a
battalion of Danish soldiers went through such a study without showing
mood effects, but these findings were not published.
Rapoport et al. (1980) studied enuresis. A placebo-controlled trial of lower
doses of tricyclics in enuretic, nondepressed children demonstrated enuresis
benefit but no effect on mood. In a 4-week placebo-controlled trial of 20 mg
of the selective serotonin reuptake inhibitor (SSRI) paroxetine, normal sub-
jects showed no evidence of mood elevation (Knutson et al., 1999). Similarly,
Adly, Straumanis, and Chesson (1992), in a placebo-controlled trial of fluox-
etine, demonstrated benefit for migraine but no antidepressant effect. While
other studies of antidepressants have found utility in pain management and
smoking cessation, among other benefits, induced elevated mood has not
been reported in any subject lacking a prior affective disorder.
These findings suggest that amphetamine’s effect on depressive feelings is
compensatory rather than due to interaction with an underlying affective
pathophysiology. Conversely, the mood effects of antidepressants required
the presence of pathophysiology, thus precluding mood effects in normals.
Given the complex pharmacodynamics, other effects in normal subjects,
such as the fast onset of decreased irritability, are not surprising but are
not associated with mood elevation.
Other studies of major psychotropic drugs, given for periods approximat-
ing those of clinical trials, have also failed to show that the remarkable
cognitive and affective effects manifest in patients have parallels in normal
subjects. These include studies of antipsychotic drugs (Dimascio, Havens, &
Klerman, 1963a, 1963b; de Visser, van der Post, Pieters, Cohen, & van
Gerven, 2001), lithium (Judd, Hubbard, Janowsky, Huey, & Attewell, 1979),
and a wide body of literature on SSRIs (Pace-Schott et al., 2001; Loubinoux
et al., 2005; Dumont, de Visser, Cohen, & van Gerven, 2005). Monoamine
oxidase inhibitors have been used to treat angina pectoris, but reports of
these clinical treatments similarly do not note any mood shifts (although
they have been noted in the treatment of chronic tuberculosis patients).
The syndromal depression status of the tuberculosis patients, however, is
unknown.
It is particularly telling that Rosenzweig et al. (2002, p. 10) found that
studies using standard psychometric tests ‘‘carried out in schizophrenic
patients have failed to demonstrate any consistent effects of typical or atypical
neuroleptics on psychomotor or cognitive functions.’’ Despite this finding,
these agents have major clinical benefits for patients regarding both psycho-
motor and cognitive symptomatology. This implies that their specific thera-
peutic effect is due to normalizing one or more pathophysiological processes
that are not present in normal individuals, rather than by an extension of
effects discernible in normal individuals.
All of these observations, although not entirely definitive, consistently
point to a lack of parallelism between the clinical benefits of major psycho-
tropic drugs in patients and their effects on normal subjects. This is con-
sonant with the view that these agents’ specific benefit is due to normalizing
pathophysiology rather than some nonspecific compensatory action, as may
be the case with sedatives and stimulants. A telling observation is that there
is no illicit street-market demand for the major psychotropic agents, but there
is a substantial demand for stimulants and sedatives.

The Import of Remission


That the benefits of major psychopharmacological agents are tightly tied to
their normalizing effects on underlying pathophysiology is also supported by
their remarkably specific effects in certain syndromes, for example, retarded
unipolar depressions, manic states, angry hyperactive paranoid states, panic
disorder, and psychotic states approximating bipolar disorder. Under drug
treatment, these syndromes frequently show a complete restitution to
normal premorbid status. Strikingly, the natural history of each of these
syndromes is also marked by episodes of apparently complete spontaneous
remission, a parallel that can hardly be accidental.
Specific drug-induced remission in these syndromes may be due to a
normalizing interaction with episodically dysfunctional cybernetic feedback
controls (e.g., reversible decreased negative feedback or pathologically
induced positive feedback) that engender remitting syndromes (Klein,
1964a, 1988; Klein, Gittelman, Quitkin, & Rifkin, 1980). Patients who benefit
from specific pharmacotherapy to remit may share a common pathophysiol-
ogy that differs from those refractory to such treatment as well as those who
respond during placebo treatment.
Reducing syndrome heterogeneity by relating baseline and historical char-
acteristics to specific treatment outcomes may yield useful practical clues to
treatment choice and the delineation of clinically meaningful subsyndromes,
as well as heuristic clues to the causation of pathophysiology. This is elabo-
rated in the following section.

The Vicissitudes of Pharmacological Dissection

There were early statistical attempts using multiple regression analysis to
contrast the effects of different drugs from each other and placebo so as to
identify subgroups with distinctive response patterns. These regularly failed
on replication. The pharmacological dissection approach has been most suc-
cessful when applied to hypothesized subsyndromes, when a distinctive ther-
apeutic response had been clinically (serendipitously) noted. Validation was
followed by controlled treatment trials—for example, panic disorder (distin-
guished from other anxiety disorders [Klein, 1964b]), agoraphobia (distin-
guished from other phobias [Zitrin, Klein, & Woerner, 1978]), atypical
depression (distinguished from other major depressions [Quitkin et al.,
1989]), and schizophrenia with childhood asociality (distinguished from schi-
zophrenic patients with superior antipsychotic benefit [Klein, 1967]).
Notably, these subsyndromal distinctions were powerfully reinforced by
differences in premorbid and early course. The frequent history of severe
childhood separation anxiety disorder in hospitalized adult patients with
panic disorder and agoraphobia prompted successful controlled clinical
trials of imipramine in such children. With the development of the SSRIs,
this therapeutic approach received widespread clinical acceptance, although
the controlled evidence is not uniformly positive.
However, studies reducing syndromal heterogeneity by pharmacological
dissection failed to thrive as the two major funding sources turned away.
The National Institute of Mental Health (NIMH) abandoned support for
placebo-controlled studies of marketed drugs, arguing that this was the
proper province of industry. The NIMH and academia were to focus on
basic processes. Unfortunately, industry did not pursue such diagnostically
informative studies since a statistically significant benefit indicated by an
average scale outcome difference between drug and placebo was the efficacy
requirement needed for the primary goal of FDA marketing approval. The
broadest possible diagnostic indication allows the broadest marketing. Since
profitability would drop by narrowing a broad syndrome to a subsyndrome,
this is not a priority for industry.
The unintended effect of the NIMH completely allocating placebo-
controlled trials of marketed agents to industry was to prevent the develop-
ment of clinical psychopharmacological approaches to improving nosology
and elucidating pathophysiology.

Splitting and Lumping


A distinction should be made between pharmacological dissection and phar-
macological amalgamation. Dissection is not a high-level inference. If tricyc-
lic antidepressants do not specifically benefit early-onset chronic atypical
depression but instead specifically benefit late-onset periodic atypical depres-
sion, then there are likely to be different pathophysiologies underlying these
symptomatically identical disorders (Stewart, McGrath, Quitkin, & Klein,
2007).
Because both enuresis and melancholia responded to imipramine, some
thought that enuresis was covert depression, an example of amalgamation.
Rapoport et al. (1980) and Jorgensen, Lober, Christiansen, and Gram (1980)
effectively dispelled this notion by noting that, in their enuretic samples,
there was no clinical depression or psychotropic response to imipramine.
The anti-enuretic effects were almost immediate, and the dose and blood
level necessary were far smaller than those required for depression. Since
imipramine had a variety of distinct pharmacodynamic effects, pharmacolo-
gical amalgamation is a questionable but testable hypothesis. In any case, it
is logically and factually irrelevant to the validity of pharmacological dissec-
tion. Although the clinically useful subsyndromes discerned by pharmacolo-
gical dissection have stimulated hypotheses about specific adaptive
dysfunctions, they do not thereby provide objective diagnostic criteria.

Refining Remission Heterogeneity


Among patients who remit during drug treatment, it is still not known if
they benefit from a specific pharmacological action or just get better. Chassan
(1967) addressed how one can tell whether a treatment intervention actually
worked in an individual patient. He recommended ‘‘intensive design,’’ that
is, repeating periods of intervening and nonintervening and evaluating
whether the benefit synchronized with the intervention. This would be an
alternative clinical trials design since if psychotropic drugs are discontinued
immediately after remission, relapse rates are usually high. It is only among
those whose specific benefit requires medication that double-blind placebo
substitution should incur relapse. Therefore, the alternative design is to initi-
ally openly treat all patients with the medication under study, titrating for the
individual’s optimal dose, until it is clear if the patient shows such a minor
response that he or she could not be a treatment responder. These patients
would leave the trial. Responders would be maintained on medication for a
period but then randomly assigned, double-blind, to either placebo or con-
tinued medication. All patients would be closely followed for defined signs of
worsening. A worsening rate higher in the placebo-substituted group than
the medication-maintained group would be evidence of medication efficacy.
Medication retreatment would be indicated for patients exhibiting signs of
worsening. Those who worsened on placebo and then improved on medica-
tion retreatment are likely specific drug responders. Those who switched to
placebo but continued to do well would be less likely to be specific medica-
tion responders. By sequentially repeating this process, nonspecific and spe-
cific responders would be progressively identified. Even without repetition,
specific drug responder identification would be substantially enhanced. This
design would define individuals who were very likely medication-specific
responders, likely nonspecific responders, and nonresponders. One concern
might be that this design would fail if a drug were curative; in that case,
worsening on placebo substitution in those remitted during drug treatment
should not occur. (We would be grateful for such a design failure.) Other
practical benefits are that all patients initially receive active treatment, which
will foster recruitment since often patients will not risk being assigned to
placebo. Also, patients will learn whether medication is necessary for them to
remit. This design has been used successfully (McGrath et al., 2000).
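The logic of the design can be illustrated with a minimal simulation. The
proportions and relapse probabilities below are assumptions chosen only to
show how randomized, blinded placebo substitution among initial responders
separates medication-specific from nonspecific improvement; nothing here is
drawn from McGrath et al. (2000).

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200  # patients who remitted during open, individually titrated treatment

    # Assumed mix among these responders: 60% medication-specific, 40% nonspecific.
    specific = rng.random(n) < 0.6

    # Randomize responders, double-blind, to placebo substitution or continued drug.
    on_placebo = rng.random(n) < 0.5

    # Assumed worsening probabilities over follow-up: specific responders relapse
    # mainly when switched to placebo (80%); nonspecific responders worsen at a
    # low background rate (10%) regardless of assignment.
    p_worse = np.where(specific & on_placebo, 0.80, 0.10)
    worsened = rng.random(n) < p_worse

    for arm, label in ((on_placebo, "placebo substitution"), (~on_placebo, "continued medication")):
        print(f"{label:22s} worsening rate = {worsened[arm].mean():.2f}")
    # Excess worsening under placebo substitution evidences a specific drug effect;
    # those who worsen on placebo and remit again on retreatment are the
    # candidates for classification as medication-specific responders.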

Combining Objective Measures With Intensive Design


Modifying the usual clinical trial by randomized, double-blind placebo sub-
stitution in putative responders (intensive design) allows for the isolation of a
clinically meaningful, experimentally defined subsyndrome (i.e., individuals
who require specific medication for both remission and relapse prevention).
Past experience indicates that medications that specifically induce remission,
when continued, also prevent relapse. This again affirms that their specific
therapeutic activity occurs by normalizing a dysfunction. Further, extending
the intensive clinical trial design by including objective baseline measures
can allow the isolation of objective diagnostic criteria for this subsyndrome.
Also, finding objective dependent outcome measures (in those who require
medication to improve) renders blinding unnecessary.
Heuristically, if such objective measures also normalize during the course
of syndrome remission, then they must be an integral part of the causation of
dysfunction rather than simply a correlate. This suggests that embedding
objective measures within intensive clinical trials of already known specific
therapeutic agents would allow for the discovery of objective, clinically rele-
vant, psychiatric diagnostic criteria. Even better, this experimental approach
can isolate causation both of dysfunction and of the specific medication
under study. Note that this requires a long-term programmatic approach
that substantially differs from and exceeds current National Institutes of
Health (NIH) road maps or DSM-V discussions.

Conclusion

Since the current American Psychiatric Association (APA) DSM is primarily a
diagnostic manual for practitioners, the threshold for including objective
findings should depend on clearly demonstrated practical value related to
differential diagnosis. This suggested approach to the objective investigation
of the pathophysiologies manifested as psychiatric syndromes requires
expensive, long-term, programmatic support. It is not likely to affect any
DSM for quite a while. The large difficulty is that neither NIH nor industry
nor the APA DSM process supports such study designs, especially of
marketed medications.
Achieving the necessary long-term support may depend on the realization
that current genomic and brain-imaging efforts are unlikely to succeed in
resolving nosological ambiguities because syndrome and genetic heterogene-
ity defeats group contrast and correlative studies. Our suggestion is to sub-
stantially diminish heterogeneity by objectively identifying specific
pharmacotherapeutic responders through intensive design. We argue that
major psychotropic drug effects depend on normalization of proximal patho-
physiology. Objective predictors of specific medication remission that are
also specifically treatment-responsive must be causally relevant to the under-
lying pathophysiology; these predictors are central clues to both pathophysiol-
ogy and drug response. Finally, studying known effective agents hastens
this goal. This is worth emphasis as it affords a strong basis for program-
matic support. Such specific objective signs would improve psychiatric differ-
ential diagnosis beyond both the current clinical consensus and biomarker
approaches.

References

Adly, C., Straumanis, J., & Chesson, A. (1992). Fluoxetine prophylaxis of migraine.
Headache, 32, 101–104.
Bodmer, W. F. (1981). Gene clusters, genome organization and complex phenotypes.
When the sequence is known, what will it mean? American Journal of Human
Genetics, 33, 664–682.
Braff, D. L., Freedman, R., Schork, N. J., & Gottesman, I. I. (2007). Deconstructing
schizophrenia: An overview of the use of endophenotypes in order to understand a
complex disorder. Schizophrenia Bulletin, 33, 21–32.
Chassan, J. B. (1967). Research design in clinical psychology and psychiatry. New York:
Appleton-Century-Crofts.
Crow, T. J. (2007). How and why genetic linkage has not solved the problem of
psychosis: Review and hypothesis. American Journal of Psychiatry, 164(1), 13–21.
de Visser, S. J., van der Post, J., Pieters, M. S., Cohen, A. F., & van Gerven, J. M.
(2001). Biomarkers for the effects of antipsychotic drugs in healthy volunteers.
British Journal of Clinical Pharmacology, 51, 119–132.
Dimascio, A., Havens, L. L., & Klerman, G. L. (1963a). The psychopharmacology of
phenothiazine compounds: A comparative study of the effects of chlorpromazine,
promethazine, trifluoperazine and perphenazine in normal males. I. Introduction,
aims and methods. Journal of Nervous and Mental Disease, 136, 15–28.
Dimascio, A., Havens, L. L., & Klerman, G. L. (1963b). The psychopharmacology of
phenothiazine compounds: A comparative study of the effects of chlorpromazine,
promethazine, trifluoperazine, and perphenazine in normal males. II. Results and
discussion. Journal of Nervous and Mental Disease, 136, 168–186.
Dumont, G. J., de Visser, S. J., Cohen, A. F., & van Gerven, J. M.; Biomarker Working
Group of the German Association for Applied Human Pharmacology. (2005).
Biomarkers for the effects of selective serotonin reuptake inhibitors (SSRIs) in
healthy subjects. British Journal of Clinical Pharmacology, 59, 495–510.
Flint, J., & Munafo, M. R. (2006). The endophenotype concept in psychiatric genetics.
Psychological Medicine, 37, 163–180.
Fyer, A. J., Hamilton, S. P., Durner, M., Haghighi, F., Heiman, G. A., Costa, R., et al.
(2006). A third-pass genome scan in panic disorder: Evidence for multiple suscept-
ibility loci. Biological Psychiatry, 60(4), 388–401.
Gottesman, I. I., & Gould, T. D. (2003). The endophenotype concept in psychiatry:
Etymology and strategic intentions. American Journal of Psychiatry, 160, 636–645.
Jorgensen, O. S., Lober, M., Christiansen, J., & Gram, L. F. (1980). Plasma concentra-
tion and clinical effect in imipramine treatment of childhood enuresis. Clinical
Pharmacokinetics, 5, 386–393.
Judd, L. L., Hubbard, B., Janowsky, D. S., Huey, L. Y., & Attewell, P. A. (1979). The
effect of lithium carbonate on affect, mood, and personality of normal subjects.
Archives of General Psychiatry, 36, 860–866.
Klein, D. F. (1964a). Behavioral effects of imipramine and phenothiazines:
Implications for a psychiatric pathogenic theory and theory of drug action. In J.
Wortis (Ed.), Recent advances in biological psychiatry (Vol. VII, pp. 273–287). New
York: Plenum Press.
Klein, D. F. (1964b). Delineation of two drug-responsive anxiety syndromes.
Psychopharmacologia, 5, 397–408.
Klein, D. F. (1967). Importance of psychiatric diagnosis in prediction of clinical drug
effects. Archives of General Psychiatry, 16(1), 118–126.
Klein, D. F. (1978). A proposed definition of mental illness. In R. Spitzer, D. F. Klein
(Eds.), Critical Issues in Psychiatric Diagnosis (pp. 41–71). New York: Raven Press.
Klein, D. F. (1988). Cybernetics, activation, and drug effects. Acta Psychiatrica
Scandinavica Supplementum, 341, 126–137.
Klein, D. F. (1993). False suffocation alarms, spontaneous panics, and related condi-
tions: An integrative hypothesis. Archives of General Psychiatry, 50, 306–317.
Klein, D. F. (1999). Harmful dysfunction, disorder, disease, illness, and evolution.
Journal of Abnormal Psychology, 108, 421–429.
Klein, D. F., Gittelman, R., Quitkin, F., & Rifkin, A. (Eds.). (1980). Diagnosis and drug
treatment of psychiatric disorders: Adults and children (2nd ed.). Baltimore, MD:
Williams & Wilkins.
Klein, D. F., Ross, D. C., & Cohen, P. (1987). Panic and avoidance in agoraphobia:
Application of PATH analysis to treatment studies. Archives of General Psychiatry,
44(3), 377–385.
Klein, D. F., & Stewart, J. (2004). Genes and environment: Nosology and psychiatry.
Neurotoxicity Research, 6(1), 11–15.
Knutson, B., Wolkowitz, O. M., Cole, S. W., Chan, T., Moore, E. A., Johnson, R. C.,
et al. (1999). Selective alteration of personality and social behavior by serotonergic
intervention. American Journal of Psychiatry, 155, 373–379.
Kupfer, D. J., First, M. B., & Regier, D. A. (Eds.). (2002). A research agenda for DSM-V.
Washington DC: American Psychiatric Association.
Lewis, A. (1967). The state of psychiatry: Essays and addresses. London: Routledge
and Kegan Paul.
Loubinoux, I., Tombari, D., Pariente, J., Gerdelat-Mas, A., Franceries, X., Cassol, E.,
et al. (2005). Modulation of behavior and cortical motor activity in healthy subjects
by a chronic administration of a serotonin enhancer. NeuroImage, 27, 299–313.
McGrath, P. J., & Klein, D. F. (1983). Heuristically important mood altering drugs. In
J. Angst (Ed.), The origins of depression: current concepts and approaches (pp. 331–349).
New York: Springer-Verlag.
McGrath, P. J., Stewart, J. W., Petkova, E., Quitkin, F. M., Amsterdam, J. D., Fawcett,
J., et al. (2000). Predictors of relapse during fluoxetine continuation or maintenance
treatment of major depression. Journal of Clinical Psychiatry, 61, 518–524.
Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian
defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.
Meehl, P. E. (1992). Factors and taxa, traits and types, difference of degree and differ-
ences in kind. Journal of Personality, 60, 117–174.
Murphy, E. A. (1964). One cause? Many causes? The argument from the bimodal
distribution. Journal of Chronic Diseases, 17, 301–324.
Pace-Schott, E. F., Gersh, T., Silvestri, R., Stickgold, R., Salzman, C., & Hobson, A. J.
(2001). SSRI treatment suppresses dream recall frequency but increases subjective
dream intensity in normal subjects. Journal of Sleep Research, 10, 129–142.
Parsons, T. (1951). The social system. New York: Free Press.
Pichot, P. (2006). Tracing the origins of bipolar disorder: From Falret to DSM-IV and
ICD-10. Journal of Affective Disorders, 96, 145–148.
Preter, M., & Klein, D. F. (2008). Panic, suffocation false alarms, separation anxiety
and endogenous opioids. Progress in Neuropsychopharmacology and Biological
Psychiatry, 32, 603–612.
Quitkin, F. M., McGrath, P. J., Stewart, J. W., Harrison, W., Wager, S. G., Nunes, E.,
et al. (1989). Phenelzine and imipramine in mood reactive depressives: Further
delineation of the syndrome of atypical depression. Archives of General Psychiatry,
46(9), 787–793.
Rapoport, J. L., Mikkelsen, E. J., Zavadil, A., Nee, L., Gruenau, C., Mendelson, W.,
et al. (1980). Childhood enuresis. II. Psychopathology, tricyclic concentration in
plasma, and antienuretic effect. Archives of General Psychiatry, 37, 1146–1152.
Riley, B., & Kendler, K. S. (2006). Molecular genetic studies of schizophrenia. European
Journal of Human Genetics, 14, 669–680.
Rosenzweig, P., Canal, M., Patat, A., Bergougnnan, L., Zieleniuk, I., & Bianchetti, G.
(2002). A review of the pharmacokinetics, tolerability and pharmacodynamics of
amisulpride in healthy volunteers. Human Psychopharmacology, 17, 1–13.
Rutter, M. (2007). Gene–environment interdependence. Developmental Science, 10, 12–18.
Satel, S. L., & Nelson, J. C. (1989). Stimulants in the treatment of depression: A critical
overview. Journal of Clinical Psychiatry, 50, 241–249.
Smoller, J. W., & Tsuang, M. T. (1998). Panic and phobic anxiety: Defining phenotypes
for genetic studies. American Journal of Psychiatry, 155, 1152–1162.
Stewart, J. W., McGrath, P. J., Quitkin, F. M., & Klein, D. F. (2007). Atypical depres-
sion: Current status and relevance to melancholia. Acta Psychiatrica Scandinavica
Supplementum, 433, 58–71.
Thirion, B., Pinel, P., Meriaux, S., Roche, A., Dehaene, S., & Poline, J. (2007). Analysis
of a large fMRI cohort: Statistical and methodological issues for group analyses.
NeuroImage, 35, 105–120.
Wakefield, J. C. (1992). Disorder as harmful dysfunction: A conceptual critique of
DSM-III-R’s definition of mental disorder. Psychological Review, 99(2), 232–247.
Zitrin, C. M., Klein, D. F., & Woerner, M. G. (1978). Behavior therapy, supportive
psychotherapy, imipramine, and phobias. Archives of General Psychiatry, 35(3),
307–316.
14
The Need for Dimensional Approaches in Discerning the Origins of Psychopathology
robert f. krueger and daniel goldman

Introduction

The 2008 meeting of the American Psychopathological Association was
framed by a very challenging topic: causality. Indeed, setting aside any pos-
sible application in understanding psychopathology, causality is a deep
concept—a fact that has kept philosophers gainfully employed for some
time now. One thing is clear, however, at least in the behavioral sciences:
If one wants to make credible causal claims, it helps to be able to directly
manipulate the variables of interest. Indeed, some would go so far as to say
that causality cannot be inferred without this kind of experimental manip-
ulation. Through manipulation, one can systematically vary a variable of
interest, while holding others constant, including the observational condi-
tions. Consider, for example, how this is conveyed to new students in the
behavioral sciences in a very useful text by Stanovich (2007). Stanovich (2007)
first reviews the classic observation that simply knowing that two things
(A and B) tend to occur together more often than one would expect by chance
(a correlation) is not enough evidence to conclude that those two things have
some sort of causal relationship (e.g., A causes B). To really claim that A causes
B, ‘‘the investigator manipulates the variable hypothesized to be the cause
and looks for an effect on the variable hypothesized to be the effect while
holding all other variables constant by control and randomization’’ (p. 102).
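A small simulation (with assumed values throughout) makes the contrast
concrete: when a common cause C drives both A and B, the two are correlated
even though neither causes the other, and the correlation vanishes once A is
set by randomization rather than left to covary with C.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 10_000

    # Observational world: a common cause C drives both A and B; A does not cause B.
    c = rng.normal(size=n)
    a_observed = 0.8 * c + rng.normal(scale=0.6, size=n)
    b = 0.8 * c + rng.normal(scale=0.6, size=n)
    print("observational corr(A, B):", round(np.corrcoef(a_observed, b)[0, 1], 2))

    # Experimental world: A is manipulated (randomized), severing its link to C.
    a_manipulated = rng.normal(size=n)
    print("experimental corr(A, B):", round(np.corrcoef(a_manipulated, b)[0, 1], 2))
    # Roughly 0.64 versus 0.00: the observed A-B correlation reflected C, not A -> B.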
The implications of this experimental perspective on causality for psycho-
pathology research are readily apparent: The situation is nearly hopeless, at
least in terms of getting at the original, antecedent, distal causes of psycho-
pathology. It is axiomatically unethical to manipulate variables to enhance the
likelihood of psychopathology; we cannot directly manipulate things to create
psychopathology in persons who do not already suffer from psychopathology.
This is not to say that, once psychopathology is present, experimental designs
are not fundamentally helpful in understanding the mechanisms underlying
its expression. Indeed, the discipline of experimental psychopathology is
founded on this premise, involving comparisons of the behaviors of persons
with psychopathology and persons without psychopathology under precisely
controlled conditions. Still, this approach discerns the mechanisms that are
disturbed in psychopathology once it is present, as opposed to the reasons
that those mechanisms became disturbed in the first place. In addition,
preventive interventions designed to remove putative causes of psychopathol-
ogy may be tested through experimental manipulations, but many major
putative causal factors cannot be removed in this manner in humans
(e.g., genes).
Given this state of affairs, can we ever figure out ‘‘what causes psycho-
pathology’’? In this chapter, we will argue that much can be learned about
the origins and nature of psychopathology without worrying too much about
claims of causality per se. Although much progress has been made in expli-
cating tractable models for causal inference (see Chapters 3, 4 and 11;
Pearl, 2000), our current thinking is that straightforward claims of causality
(e.g., A causes disorder B) are unlikely to map empirical realities in most of
psychopathology research. Moreover, this common-language understanding
of the word ‘‘cause,’’ as implying a two-variable system of cause and effect,
is almost certainly how the word would be interpreted if one were daring
enough to use it in reference to conditions that antedate the development of
psychopathology. This is in spite of the fact that statisticians and philoso-
phers acknowledge that causality needs to be understood in terms of
multiple, probabilistic causes (Holland, 1986).
Most of psychopathology is probably too multifactorial to be reducible to
straightforward, two-variable statements of cause and effect. For example,
aggregate genetic influences on any form of psychopathology summarize
the effects of numerous individual polymorphisms. Even if each of these
polymorphisms could be individually manipulated in humans while holding
the others constant (allowing causality in the experimental sense to be eval-
uated), each of these relationships is probabilistic and part of a larger system,
as opposed to deterministic and part of a two-variable (one gene causes one
disorder) system (Kendler, 2005). This multivariable situation at the genomic
level then intersects with the ways in which genetic effects are contingent on
environmental moderators (Rutter, 2006), thereby further reducing the
veridicality of straightforward causal claims.
Fortunately, considerations of the complexity of the mechanisms that are
likely to underlie psychopathology do not render it empirically intractable.
The process of discovering circumstances where psychopathology is more or
less likely is empirically tractable, and this endeavor can be enhanced by
evolving our conceptualization of psychopathology from an ‘‘either/or’’
issue to a matter of degree. In particular, if we cannot directly induce psy-
chopathology in people, we are left with naturally occurring patterns of cov-
ariation as bedrock empirical observations for our science. Our argument is
that these patterns will be more clearly revealed, and their implications for
ameliorating psychopathology will be more readily understood, if we evolve
our field to think of psychopathology as a matter of degree as opposed to a
matter of kind. We also argue that, rather than thinking in terms of specific
bivariate relations (e.g., gene A causes disorder B), it will be more generative
to embrace the multivariate complexity of both psychopathology and its
antecedents and work to model this complexity directly.
In the course of this chapter, we will make this argument by first reiter-
ating something that has been known for some time but that bears repeating:
If one wants to accurately discern how strongly two things are related, it is
helpful for statistical power to conceive of both things as continuous, dimen-
sional variables, as opposed to discrete variables (Cohen, 1983). We will then
turn from a discussion of statistical power to a discussion of the conundrums
that have emerged from psychopathology research using polythetic categori-
cal concepts: the interrelated problems of within-category heterogeneity and
comorbidity among categories. Putting together the statistical advantages of
dimensions with the conceptual conundrums generated by polythetic cate-
gories emphasizes the need to move toward a novel dimensional approach to
classifying psychopathology.
With this in mind, we will argue that the current zeitgeist is also optimal
for a smooth transition from more categorical to more dimensional ways of
thinking about psychopathology. We will then briefly describe some empirical
research on psychopathology from a dimensional perspective, ways of inte-
grating this research into the upcoming Diagnostic and Statistical Manual of
Mental Disorders, fifth edition (DSM-5), and ways of enhancing the DSM-5
with dimensional concepts to frame the collection of future data. We will
conclude with a discussion of how dimensional conceptualization is relevant
not only to psychopathology itself but also to conceptualization of the etiology
of psychopathology.

Enhancing the Power of Psychopathology Research With Dimensions

In scientific disciplines where direct manipulation of critical variables is
unethical or infeasible (e.g., psychopathology research, astrophysics), patterns
of covariation take on fundamental importance. For example, physicists
cannot induce or manipulate the formation of distant celestial bodies,
but they can observe the behavior of such bodies using tools like telescopes.
They can then apply formal mathematics to model the resulting data, thereby
instantiating scientific theories in empirical data. The situation is very similar
in psychopathology research. We cannot induce psychopathology in human
beings, but given a group of persons who differ in their psychopathology
status, we can see if psychopathology status covaries with other variables
(e.g., test performance, physiology, genes, family history, developmental
antecedents).
The definition of psychopathology status is a fundamental and historically
vexing issue. For much of the history of our discipline, such definitions were
highly chaotic because different investigators meant different things when
they used the same label. The solution to this problem has been to imple-
ment a system that solves this definitional problem by providing consensus
definitions that draw on the opinions of diverse experts. This is the diagnostic
system that originated with DSM-III and has continued forward in much the
same form in DSM-III-R and DSM-IV.
The modern DSMs have been indispensable in psychopathology research
because, to a large extent, we know what other researchers mean when they
say they are studying a specific DSM diagnosis. Nevertheless, the modern
DSMs embody an important assumption, namely, that all mental disorders
are an either/or matter. Each diagnostic construct described in recent DSMs
is a polythetic category. That is, for each diagnosis, multiple criteria are listed,
and a certain combination of criteria indicates membership in a category of
mental disorder, whereas not having those criteria indicates membership in
the complementary nondisordered group.
The practical needs for dichotomous categorical psychopathology labels
(e.g., for third-party payment purposes) have been acknowledged elsewhere
(First, 2005; Krueger & Markon, 2006b). However, if our goal is scientific—to
understand the origins and nature of psychopathology—dimensional psycho-
pathology constructs are indispensable (Helzer, Kraemer, & Krueger, 2006). A
fundamental reason for this is that dichotomous variables (e.g., presence vs.
absence of a mental disorder) contain less information than variables that can
take on more values (e.g., how much a research participant resembles a
mental disorder prototype on a multipoint scale) (Kraemer, Noda, &
O’Hara, 2004; MacCallum, Zhang, Preacher, & Rucker, 2002). This means
that many more research participants are needed to discern the correlates of
a dichotomous psychopathology construct, as opposed to a continuous psy-
chopathology construct. The literal ‘‘costs’’ of dichotomization can be quite
profound, if one thinks of the problem in terms of the finite amount of
money available for research on psychopathology. We can therefore achieve
greater research traction with less money by using dimensional constructs
because we do not need as many research participants to ask key research
questions.
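To make the cost concrete, the following minimal simulation sketch (in Python,
with an illustrative true correlation of .30 and a sample size of 100; these
values are not taken from any study cited here) compares the power to detect
an association when the outcome is analyzed as a continuous score versus when
it is dichotomized at an arbitrary threshold:

```python
# A minimal simulation sketch of Cohen's (1983) point: dichotomizing a
# continuous outcome attenuates the observed association and therefore reduces
# statistical power. The effect size, sample size, and diagnostic threshold
# below are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, r_true, n_sims, alpha = 100, 0.30, 2000, 0.05
hits_dim = hits_cat = 0

for _ in range(n_sims):
    x = rng.normal(size=n)                                        # a putative correlate
    y = r_true * x + np.sqrt(1 - r_true**2) * rng.normal(size=n)  # continuous liability
    y_cat = (y > np.quantile(y, 0.75)).astype(float)              # "diagnosis" above a cut

    _, p_dim = stats.pearsonr(x, y)      # continuous outcome
    _, p_cat = stats.pearsonr(x, y_cat)  # dichotomized outcome
    hits_dim += p_dim < alpha
    hits_cat += p_cat < alpha

print(f"power, continuous outcome  : {hits_dim / n_sims:.2f}")
print(f"power, dichotomized outcome: {hits_cat / n_sims:.2f}")
```

Under these illustrative settings, the dichotomized analysis detects the
association in a markedly smaller share of replications, which is the
practical meaning of the point made above.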

Limitations of Polythetic Categories: Heterogeneity and Comorbidity

In addition to limitations of a more statistical nature, other conceptual pro-
blems have emerged from trying to use polythetic-categorical psychopathol-
ogy constructs in research. The first of these problems is within-category
heterogeneity. For specific diagnostic categories, the modern DSMs list symp-
toms that define the categories, and various combinations of symptoms are
equally legitimate indicators of membership in the category. For example,
DSM-IV defined obsessive–compulsive personality disorder (OCPD) as con-
sisting of eight symptoms, with four of these symptoms needing to be pre-
sent to meet the criteria for diagnosis. As a result, two different persons can
both legitimately meet the criteria for OCPD in spite of having four entirely
different symptoms. Nevertheless, a group of persons meeting the criteria
for OCPD is meant to be interpreted as a homogeneous group, in the sense
that they have a single disorder that presumably has a coherent etiology,
course, etc.
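The arithmetic behind this heterogeneity is easy to verify. The short sketch
below involves no clinical data; it simply counts the distinct symptom
profiles that satisfy a rule of at least four of eight criteria, and the pairs
of diagnosable profiles that share no symptoms at all:

```python
# A back-of-the-envelope illustration of within-category heterogeneity under a
# polythetic rule: 8 criteria, any 4 or more sufficient for the diagnosis.
# Purely combinatorial; the numbers follow from the rule itself.
from itertools import combinations
from math import comb

n_criteria, threshold = 8, 4

# Distinct symptom profiles that meet the diagnostic rule
profiles = sum(comb(n_criteria, k) for k in range(threshold, n_criteria + 1))
print(f"distinct diagnosable symptom profiles: {profiles}")          # 163

# Pairs of diagnosable profiles with no symptoms in common (only possible
# when both persons have exactly four symptoms)
disjoint_pairs = sum(
    1
    for a in combinations(range(n_criteria), threshold)
    for b in combinations(range(n_criteria), threshold)
    if set(a).isdisjoint(b)
) // 2
print(f"diagnosable pairs sharing no symptoms: {disjoint_pairs}")    # 35
```

One hundred sixty-three distinct presentations receive the same label,
including 35 pairs of presentations with entirely nonoverlapping symptoms.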
A related problem is comorbidity, or the tendency for putatively distinct
categorical disorders to co-occur more than one would expect by chance.
Comorbidity among DSM-defined mental disorders is extensive and is not
limited to pairs of disorders. Indeed, the phenomenon is probably better
thought of as ‘‘multimorbidity’’ in the sense that two-variable patterns of
co-occurrence do not capture the extent of it. That is, persons meeting the
criteria for three or more diagnoses are not uncommon, and they carry much
of the social burden (e.g., diminished educational and occupational attain-
ment) of mental illness compared with persons who meet the criteria for only
one diagnosis (Krueger & Markon, 2006a). The resultant conceptual problem
for psychopathology research is clear. Much psychopathology research is
organized around single DSM-defined categories, but if the persons whom
we most need to understand are ‘‘multimorbid’’ and not well-conceptualized
in terms of single categories, then the typical research strategy is not identi-
fying the persons who actually carry the major social burden of mental
illness.

The Zeitgeist: The Time Is Right for Dimensions in Psychopathology Research

In sum, there are at least three major limitations of categorical psychopathol-
ogy constructs: (1) reduced statistical power, (2) heterogeneity within cate-
gories, and (3) comorbidity among categories. We are neither the first to
identify these problems nor the first to suggest that they signal the need
for new dimensional alternatives to categorical conceptions of psychopathol-
ogy (see Kupfer, 2005). The challenge is how to incorporate dimensional
concepts into psychopathology research without returning to the nosological
chaos that reigned before the modern DSMs were created. In our view, there
are two interrelated pathways to making psychopathology research more
dimensional. First, dimensional research on psychopathology can proceed
by drawing on, but not being wedded to, existing DSM constructs, and this
work could eventuate in an empirically based dimensional nosology. Second,
the DSM per se can be enhanced with dimensional concepts, thereby provid-
ing a uniform platform for further research. We will discuss both of these
pathways in turn.

Examples of Existing Dimensional Research

There are a number of recent examples of programmatic research on psy-
chopathology from a dimensional perspective, and this work can be roughly
divided into three themes: (1) research comparing dimensional and catego-
rical accounts of psychopathology, (2) research modeling comorbidity among
modern DSM constructs dimensionally, and (3) research modeling symptoms
within psychopathological categories dimensionally.

Comparing Dimensional and Categorical Models


Much of the historical discussion surrounding categorical and dimensional
accounts of psychopathology has been framed in terms of practical considera-
tions and disciplinary matters (e.g., clinicians need category labels to com-
municate, physicians are historically accustomed to using categorical
diagnostic concepts), but it is now possible to move this discussion into a
more empirical arena. Specifically, there are ways of asking if data are more
supportive of a specific model of psychopathology as it occurs in nature, as
well as ways of inquiring about hybrid conceptualizations involving both
categories and dimensions.
Seminal work in this area was pursued by Paul Meehl in his program of
research aimed at developing what he termed ‘‘taxometric’’ methods (Meehl,
1992). Taxometric methods are ways of using data to ask if the variables in the
data are indicative of a nonarbitrary and coherent category of persons. If such
a category can be discerned, this group is referred to as a ‘‘taxon.’’ These
methods have become popular in psychopathology research, and a recent
special section of the Journal of Abnormal Psychology was devoted to discuss-
ing these methods and their application (Cole, 2004). Although taxometric
methods have certain methodological strengths (e.g., the use of multiple
independent procedures or ''multiple epistemic pathways'' to converge on a
taxonic conjecture), they have evolved to some extent outside the realm of
mainstream statistics; some limitations of these approaches may be traced to
this separate evolutionary path. For example, rather than formally parameter-
izing the dimensional alternative to the taxonic model as a comparison
model, a dimensional structure is assumed when a taxonic model does not
provide a good account of the data.
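As a rough illustration of the logic of these procedures (and not a substitute
for the multiple-procedure approach used in practice), the sketch below
implements one simple taxometric curve, MAMBAC (mean above minus below a
cut), and applies it to simulated taxonic and simulated dimensional data; all
parameter values are arbitrary:

```python
# A compact sketch of the MAMBAC procedure from the taxometrics tradition.
# With a true taxon, the curve of mean differences tends to be hill-shaped,
# peaking at an interior cut; with dimensional data it tends to be flat or
# dish-shaped. Sample size, base rate, and group separation are illustrative.
import numpy as np

def mambac_curve(input_ind, output_ind, n_cuts=9, margin=50):
    """Sort cases on the input indicator; at each cut, return the difference
    between the output indicator's mean above and below the cut."""
    out = output_ind[np.argsort(input_ind)]
    cuts = np.linspace(margin, len(out) - margin, n_cuts).astype(int)
    return np.array([out[c:].mean() - out[:c].mean() for c in cuts])

rng = np.random.default_rng(4)
n = 2000

# Taxonic data: 20% of cases belong to a latent group shifted on both indicators
taxon = rng.random(n) < 0.20
x_tax = rng.normal(size=n) + 2.0 * taxon
y_tax = rng.normal(size=n) + 2.0 * taxon

# Dimensional data: both indicators reflect one continuous latent dimension
latent = rng.normal(size=n)
x_dim = latent + rng.normal(size=n)
y_dim = latent + rng.normal(size=n)

print("taxonic curve    :", np.round(mambac_curve(x_tax, y_tax), 2))
print("dimensional curve:", np.round(mambac_curve(x_dim, y_dim), 2))
# With a genuine taxon, the difference peaks at an interior cut (near the
# taxon boundary); with dimensional data, the curve is lowest in the middle
# and rises only toward the extremes.
```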
Other approaches have therefore been pursued that originate more within
the domain of traditional statistical inquiry, with its focus on explicit model
parameterization and using data derived from samples to derive inferences
about the situation in the broader population. Our recent work in this area
has focused on ways of comparing categorical and dimensional accounts of
the latent structure of a series of measured indicators of psychopathology via
the comparison of latent class and latent trait models. In a latent class model,
the observed data reflect the existence of a series of mutually exclusive groups
of persons. For example, a set of signs and symptoms might delineate dis-
tinct groups of persons characterized by specific and distinct profiles on the
measured indicators. In contrast, in a latent trait model, the same signs and
symptoms delineate an underlying dimension, such that people are charac-
terized by their position along that dimension, as opposed to being charac-
terized by their membership in discrete groups. Recent methodological work
has shown how these models can often be distinguished by their differential fit
to the same data (Lubke & Neale, 2006; Markon & Krueger, 2006). In addition,
we have applied these modeling comparisons to show that comorbidity among
DSM-defined disorders involving substance dependence and antisocial behavior
indicates a dimensionally organized ''externalizing spectrum,'' as opposed to
membership in discrete classes of mental disorder (Krueger, Markon, Patrick,
& Iacono, 2005; Markon & Krueger, 2005).
Muthén (2006) has discussed related modeling developments and articu-
lated models that represent hybrids of latent class and latent trait models. In
these hybrid models, both categorical and continuous latent variables are
accommodated, providing flexibility in terms of thinking of both categorical
and continuous aspects of the organization of mental disorders. The general
point is that categorical and continuous conceptions need no longer be adju-
dicated based on a priori preferences. Data can be brought to bear directly on
these issues.
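A minimal sketch of this kind of model comparison is given below. It simulates
continuous indicators from a single latent dimension and compares, by the
Bayesian information criterion (BIC), a one-factor (latent trait) model
against latent class (finite mixture) models. The sketch assumes scikit-learn
and treats indicators as continuous for simplicity; the studies cited above
model categorical indicators with specialized software.

```python
# A minimal, illustrative comparison of latent class and latent trait accounts
# of the same indicators by BIC. The data are simulated from a one-dimensional
# (trait) model, so the class models must approximate the continuum with
# discrete groups. Parameter counts and values are rough assumptions.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n, p = 1000, 6
theta = rng.normal(size=(n, 1))                       # latent severity dimension
loadings = rng.uniform(0.5, 1.0, size=(1, p))
X = theta @ loadings + rng.normal(scale=0.7, size=(n, p))   # observed indicators

# Latent trait (one-factor) model: BIC from the model log-likelihood
fa = FactorAnalysis(n_components=1).fit(X)
loglik = fa.score(X) * n                 # score() returns mean log-likelihood
k_fa = p + p + p                         # loadings + uniquenesses + means (approx.)
bic_fa = -2 * loglik + k_fa * np.log(n)

# Latent class (mixture) models with 2-4 classes
bic_lc = {g: GaussianMixture(n_components=g, covariance_type="diag",
                             random_state=0).fit(X).bic(X)
          for g in (2, 3, 4)}

print(f"BIC, 1-factor latent trait model: {bic_fa:.1f}")
for g, b in bic_lc.items():
    print(f"BIC, {g}-class latent class model: {b:.1f}")
```

With data generated from a continuum, the latent trait model will typically
achieve the lower BIC; the reverse pattern is expected when the generating
structure is a small number of well-separated groups.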

Modeling Comorbidity Among Existing Constructs Dimensionally

As described earlier, comorbidity has presented a vexing problem in psycho-
pathology research. One approach to the problem is to attempt to unravel the
meaning of comorbidity by fitting explicit quantitative models to comorbidity
data. Krueger and Markon (2006a) reviewed research in this vein and con-
cluded that the existing literature supports a liability-spectrum model of comor-
bidity among DSM-defined mental disorders commonly observed in the
population. In this model, comorbidity is neither an artifact nor a nuisance
but, instead, a predictable consequence of the involvement of shared liability
factors in multiple disorders. These shared factors are well-conceptualized as
continuous dimensions of personality that confer risk for the development of
psychopathology. In particular, a personality style characterized by emotional
instability confers risk for a broad internalizing spectrum of unipolar mood
and anxiety disorders; when this style is also accompanied by disinhibition,
there is elevated risk for a broad externalizing spectrum of substance-use and
antisocial-behavior disorders (Krueger, 2005).
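The following toy sketch illustrates this idea. It treats disorder indicators
as continuous liability scores for simplicity (real analyses model categorical
diagnoses) and assumes scikit-learn version 0.24 or later for the varimax
rotation; the disorder names, loadings, and liability correlation are
hypothetical:

```python
# An illustrative sketch of the liability-spectrum account of comorbidity:
# indicators co-occur because they share continuous liability dimensions.
# Two correlated liabilities ("internalizing" and "externalizing") generate
# six indicators, and a two-factor model is fit to recover the structure.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 2000
cov = [[1.0, 0.5], [0.5, 1.0]]            # internalizing-externalizing correlation
liab = rng.multivariate_normal([0.0, 0.0], cov, size=n)

names = ["depression", "GAD", "panic",               # internalizing indicators
         "alcohol dep.", "drug dep.", "antisocial"]  # externalizing indicators
spectrum = np.array([0, 0, 0, 1, 1, 1])
X = 0.8 * liab[:, spectrum] + 0.6 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
for name, row in zip(names, fa.components_.T):
    print(f"{name:<13} loadings: {np.round(row, 2)}")
# Indicators within a spectrum load on the same factor, reproducing the
# observed pattern of comorbidity from shared dimensional liabilities.
```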
This research has mostly been confined to disorders that are prevalent in
the general, community-dwelling population because epidemiological data
provide the needed diversity of assessed constructs and sample sizes. The
approach could be extended to samples (e.g., outpatient psychiatric) with a
greater density of other kinds of psychopathology, and other putative spectra
would likely be delineated as a result (e.g., a psychosis spectrum) (Wolf et al.,
1988).

Modeling Symptoms Within Existing Constructs Dimensionally

Although the DSM conceptualizes all mental disorders as categories, many
instruments for assessing DSM constructs allow for the collection of symptom-
level information. Data on symptoms can be analyzed in a dimensional
manner by using statistical techniques designed to model relationships
between symptoms and an underlying dimension; these are the aforemen-
tioned latent trait models, which are also known as item response theory
(IRT) models. The alcoholism literature is one place where this approach has
been used extensively in the last few years, although it has also been fruitful
in studying symptoms within other constructs (e.g., unipolar depression)
(Aggen, Neale, & Kendler, 2005). Earlier work on alcohol problems from a
categorical perspective (e.g., using latent class models) (Heath et al., 1994)
revealed latent groups that represented gradations on an underlying conti-
nuum. That is, the groups tended not to have unique profiles (e.g., a group
with primarily one type of symptom as opposed to another) but, rather, were
characterized by increasing probabilities of all types of symptoms. As a
result, investigators in this literature have recently turned to IRT models
because such models are well-suited to exactly this kind of dimensional
latent structure.
In typical IRT models, symptoms of psychopathology are modeled in
terms of two parameters: the strength of the relationship between the symp-
tom and the underlying dimension of psychopathology and the place along
the dimension where the symptom is most relevant (a concept that can be
understood as the severity of the symptom). These sorts of IRT models have
yielded some basic insights about the nature of alcohol problems. First, the
relatively good fit of IRT models indicates that alcohol problems are well
conceptualized as lying along a dimension, with the presence of more
severe symptoms (e.g., medical complications) indicating a higher probability
that less severe symptoms (e.g., heavy use) are also present (Krueger et al.,
2004; Saha, Chou, & Grant, 2006). Second, the arrangement of abuse and
dependence symptoms across the alcohol-problems dimension does not align
with the severity arrangement suggested in the DSM; some dependence
symptoms can be quite mild, whereas some abuse symptoms can be quite
severe (Langenbucher et al., 2004). These two basic insights provide an
important impetus for changing the diagnostic criteria for alcohol diagnoses
in DSM-5 to better map a continuum of severity as it occurs in nature
(see Martin, Chung, Kirisci, & Langenbucher, 2006).
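A small numerical sketch of the two IRT parameters described above is given
below; the two items and their parameter values are hypothetical rather than
estimates from the alcohol literature:

```python
# An illustrative two-parameter logistic (2PL) item response function: item
# discrimination (strength of the symptom-dimension relationship) and item
# location (the severity at which the symptom becomes likely). The two
# "symptoms" and their parameters are hypothetical.
import numpy as np

def two_pl(theta, a, b):
    """Probability of endorsing a symptom given latent severity theta,
    discrimination a, and location (severity) b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)   # points along a latent alcohol-problems dimension
items = {"heavy use (milder)": dict(a=1.5, b=-1.0),
         "medical complication (severe)": dict(a=1.2, b=2.0)}

for name, pars in items.items():
    print(f"{name:<30}", np.round(two_pl(theta, **pars), 2))
# The severe symptom becomes probable only at high values of theta, so its
# presence implies a high probability that the milder symptom is also present.
```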
The utility of IRT models in alcoholism research suggests that such
models might be useful in studying diverse psychopathological concepts.
This is an empirical question that can be addressed by fitting these models
to other data and interpreting the results. Such an endeavor might result in
the realization that IRT models are not optimal for all psychopathological
domains. This does not mean, however, that we should abandon dimensional
concepts or statistical modeling as a means of instantiating theory in data.
Rather, what might be needed are more subtle models, perhaps containing
both categorical and dimensional elements (Muthén, 2006). Consider, for
example, the possibility of uncovering a categorical distinction between
having no psychopathological symptoms and having at least some symptoms
but a dimensional distinction within those persons who manifest at least one
symptom. One interpretation of these modeling results is that, in nature,
there exists a cusp—a categorical distinction between zero and at least one
symptom. Drawing from this conceptualization, diagnostic criteria need to be
optimized for two purposes: (1) distinguishing between people above or
below the cusp and (2) distinguishing severity among cases or persons
beyond the cusp.
Another interpretation, however, is that the symptom list is incomplete;
the cusp is therefore an illusion created by not assessing symptoms that
occur below the cusp. Because DSM conceptualizations of psychopathology
have historically derived from clinical conceptualizations of psychopathology,
milder symptoms may be omitted simply because they have not traditionally
been observed in clinical settings. Nevertheless, such symptoms may be of
importance to society; ''subclinical'' psychopathology has very real public-
health consequences (Horwath, Johnson, Klerman, & Weissman, 1994).
Given this interpretation, the next step in the scientific process is to identify
and assess symptoms less severe than those traditionally recognized in clin-
ical settings and to see if they lie along the same dimension as more severe
symptoms. The overall point is that conceptions of psychopathology are now
amenable to a program of empirical research involving close links between
methodological and substantive developments. Sorting out conceptual possi-
bilities and working them through in data via the application of formal
statistical models will tell us a great deal about the underlying nature of
psychopathology.
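The incomplete-symptom-list interpretation can be made concrete with a small
simulation, sketched below; the thresholds and sample size are arbitrary
rather than empirically derived:

```python
# An illustrative simulation of the point made above: if milder symptoms are
# not assessed, a smooth underlying continuum can masquerade as a "cusp"
# (a pile-up of people with zero recorded symptoms). All values are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
liability = rng.normal(size=n)                    # continuous latent severity

full_thresholds = np.linspace(-1.5, 2.5, 9)       # item set including mild symptoms
clinical_thresholds = np.linspace(0.5, 2.5, 5)    # only clinically salient symptoms

count_full = (liability[:, None] > full_thresholds).sum(axis=1)
count_clin = (liability[:, None] > clinical_thresholds).sum(axis=1)

print("percent with zero symptoms, full item set     :",
      round(100 * float(np.mean(count_full == 0)), 1))
print("percent with zero symptoms, clinical-only items:",
      round(100 * float(np.mean(count_clin == 0)), 1))
# The clinical-only item set produces a large spike at zero (an apparent cusp)
# even though the generating liability is a single continuous dimension.
```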

Augmenting DSM-5 With Dimensions

As can be seen from the foregoing sections, there is now a nontrivial corpus
of research on psychopathology from a dimensional perspective. This seems
particularly remarkable given the exclusively categorical nature of mental
disorders as defined in the modern DSMs. In recognition of this burgeoning
dimensional literature, the American Psychiatric Institute for Research and
Education (APIRE) organized a meeting in July 2006 to discuss a research
agenda for contemplating the inclusion of dimensions throughout the
upcoming DSM-5 (Helzer et al., 2008). Although the primary sources
should be consulted to understand the numerous ideas discussed at the
meeting, a general consensus was that the DSM-5 could benefit from the
explicit inclusion of dimensional elements in many areas of psychopathology.

Enhancing Future Inquiry: Separable Research and Official Nosology Streams

Although there is enthusiasm for dimensional concepts in the classification
of psychopathology, many areas of psychopathology have not been extensively
studied from a dimensional perspective due to the traditional categorical
focus of the DSM. Moreover, categorical concepts have their place in any
applied nosology, particularly for practical clinical purposes, such as provid-
ing descriptive labels to facilitate third-party payment. Hence, conversion of
the DSM to an entirely dimensional system in the course of one revision is
likely infeasible and may not be entirely desirable. With these considerations
in mind, a dual-track strategy might be pursued. First, research on dimen-
sional approaches can continue to flourish, separate from the DSM per se.
Second, the DSM can be enhanced with dimensional concepts. In areas with
a rich history of dimensional research (e.g., personality disorders) (Widiger,
Simonsen, Krueger, Livesley, & Verheul, 2005), this enhancement process will
likely be more straightforward. For other areas, the process of dimensional
augmentation might take longer, necessitating research that may not be
directly based on categorical DSM concepts. Logically speaking, these separ-
able streams—a research stream on dimensions of psychopathology and the
DSM nosology per se—will intertwine. The important point in distinguishing
the streams, however, is that the DSM need not be seen as a barrier to
dimensionally oriented research, even in areas where practical consideration
or a lack of relevant literature results in a more traditional categorical-
polythetic classification scheme.

Etiologic Factors: Toward a Structural Perspective to Dovetail With the Phenotypic Structure of Psychopathology

We have argued that psychopathology research can benefit from incorporat-
ing dimensional constructs more extensively. Dimensionality extends to
diverse aspects of a comprehensive nosology. Symptoms within diagnoses
can be understood as indicators of underlying dimensions, and the arrange-
ment of diagnostic concepts can be explored empirically by thinking of diag-
noses as lying within dimensionally organized spectrums. In taking this
approach, some shortcomings of a purely categorical nosology can be over-
come. Statistical power is enhanced through the use of dimensional con-
structs. Within-category heterogeneity can be dealt with by isolating the
correlates of dimensions in models that take into account the structural
organization of psychopathology. For example, the unique correlates of a
specific narrow syndrome (e.g., unipolar depression) can be identified by
taking into account, or holding constant, the variance shared between that
syndrome and neighboring syndromes within specific spectrums (e.g., anxi-
ety disorders, understood as closely related dimensions within a broader
spectrum of internalizing psychopathology). Comorbidity can be dealt with
by modeling the natural tendencies for disorders to co-occur within broader
spectrums of variation.
So far, we have discussed only the internal, nosological aspects of this kind
of dimensional-hierarchical perspective on psychopathology. We leave the
reader with some thoughts about the flipside of this perspective. That is,
could a dimensional-hierarchical perspective also benefit the way we think
about the causes of psychopathology (or at least its antecedents and corre-
lates)? Often, putative causal factors have been studied and framed individu-
ally and thought of as dichotomous. For example, Kendler (2005) described
the history of the idea that there are ‘‘specific genes for specific psychopathol-
ogies,’’ with its implication that causal genes will operate in a straightforward
Mendelian manner (e.g., there is one relevant gene, and it has two forms,
mutated/disease-causing and nonmutated, and the etiologic effect of the
mutated form is insensitive to environmental inputs). Although there are
some human neuropsychiatric diseases where the etiology can be understood
in this way (e.g., Huntington disease), Kendler (2005) concluded that genetic
effects on most psychopathological conditions are not likely to be this
straightforward. Rather, genetic effects on psychopathology are likely smaller,
many genes are likely relevant, and these genes are likely sensitive to
environmental inputs.
As with the complexity of psychopathological phenotypes, this etiologic
complexity can also be usefully parsed using dimensional-structural
approaches. Indeed, the dimensional structure of etiologic factors may resem-
ble the structure of psychopathology itself, a finding that breaks down the
conceptual barrier between ‘‘cause’’ (or etiology) and ‘‘effect’’ (psychopatho-
logical phenotypes). To pick one example, twin research on the externalizing
spectrum shows that the genetic effects on individual DSM disorders invol-
ving antisocial behavior and substance dependence are largely (but not exclu-
sively) in common, and this common genetic risk can be well-modeled as a
dimension (Krueger et al., 2002; Kendler, Prescott, Myers, & Neale, 2003;
Young, Stallings, Corley, Krauter, & Hewitt, 2000). This genetic risk dimen-
sion represents the effects of numerous individual genes that increase the
probability of psychopathology in concert; as such, it provides a compelling
target for identifying specific genetic polymorphisms that increase the risk
for psychopathology. This strategy appears to have greater traction for identi-
fying relevant polymorphisms when compared with a strategy aimed at
detecting putatively separate and dichotomous genetic effects on putatively
separate and dichotomous externalizing disorders (see Dick, 2007). The gen-
eral point is that dimensional thinking can extend usefully beyond nosology
to also encompass thinking about etiology. We look forward to seeing if a
dimensional perspective can get us closer to understanding not only what
psychopathology is but also where it comes from.

References

Aggen, S. H., Neale, M. C., & Kendler, K. S. (2005). DSM criteria for major depression:
Evaluating symptom patterns using latent-trait item response models. Psychological
Medicine, 35, 475–487.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7,
249–253.
Cole, D. A. (2004). Taxometrics in psychopathology research: An introduction to some
of the procedures and related methodological issues. Journal of Abnormal Psychology,
113, 3–9.
Dick, D. M. (2007). Identification of genes influencing a spectrum of externalizing
psychopathology. Current Directions in Psychological Science, 16(6), 331–335.
First, M. B. (2005). Clinical utility: A prerequisite for the adoption of a dimensional
approach in DSM. Journal of Abnormal Psychology, 114, 560–564.
Heath, A. C., Bucholz, K. K., Slutske, W. S., Madden, P. A. F., Dinwiddie, S. H.,
Dunne, M. P., et al. (1994). The assessment of alcoholism in surveys of the general
community: What are we measuring? Some insights from the Australian twin panel
interview survey. International Review of Psychiatry, 6, 295–307.
Helzer, J. E., Kraemer, H. C., & Krueger, R. F. (2006). The feasibility and need for
dimensional psychiatric diagnoses. Psychological Medicine, 36, 1671–1680.
Helzer, J. E., Kraemer, H. C., Krueger, R. F., Wittchen, H-U., Sirovatka, P. J., & Regier,
D. A. (Eds.). (2008). Dimensional approaches in diagnostic classification: Refining the
research agenda for DSM-5. Arlington, VA: American Psychiatric Association.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical
Association, 81, 945–960.
Horwath, E., Johnson, J., Klerman, G. L., & Weissman, M. M. (1994). What are the
public health implications of subclinical depressive symptoms? Psychiatric Quarterly,
65, 323–337.
Kendler, K. S. (2005). "A Gene for . . .": The nature of gene action in psychiatric
disorders. American Journal of Psychiatry, 162, 1243–1252.
Kendler, K. S., Prescott, C., Myers, J., & Neale, M. C. (2003). The structure of genetic
and environmental risk factors for common psychiatric and substance use disorders
in men and women. Archives of General Psychiatry, 60, 929–937.
Kraemer, H. C., Noda, A., & O’Hara, R. (2004). Categorical versus dimensional
approaches to diagnosis: Methodological challenges. Journal of Psychiatric Research,
38, 17–25.
Krueger, R. F. (2005). Continuity of axes I and II: Toward a unified model of person-
ality, personality disorders, and clinical disorders. Journal of Personality Disorders, 19,
233–261.
Krueger, R. F., Hicks, B. M., Patrick, C. J., Carlson, S. R., Iacono, W. G., & McGue, M.
(2002). Etiologic connections among substance dependence, antisocial behavior, and
personality: Modeling the externalizing spectrum. Journal of Abnormal Psychology,
111, 411–424.
Krueger, R. F., & Markon, K. E. (2006a). Reinterpreting comorbidity: A model-based
approach to understanding and classifying psychopathology. Annual Review of
Clinical Psychology, 2, 111–133.
Krueger, R. F., & Markon, K. E. (2006b). Understanding psychopathology: Melding
behavior genetics, personality, and quantitative psychology to develop an empirically-
based model. Current Directions in Psychological Science, 15, 113–117.
Krueger, R. F., Markon, K. E., Patrick, C. J., & Iacono, W. G. (2005). Externalizing
psychopathology in adulthood: A dimensional-spectrum conceptualization and its
implications for DSM-5. Journal of Abnormal Psychology, 114, 537–550.
Krueger, R. F., Nichol, P. E., Hicks, B. M., Markon, K. E., Patrick, C. J., Iacono, W. G.,
et al. (2004). Using latent trait modeling to conceptualize an alcohol problems
continuum. Psychological Assessment, 16, 107–119.
Kupfer, D. J. (2005). Dimensional models for research and diagnosis: A current
dilemma. Journal of Abnormal Psychology, 114, 557–559.
Langenbucher, J. W., Labouvie, E., Martin, C. S., Sanjuan, P. M., Bavly, L., Kirisci, L.,
et al. (2004). An application of item response theory analysis to alcohol, cannabis,
and cocaine criteria in DSM-IV. Journal of Abnormal Psychology, 113, 72–80.
Lubke, G. H., & Neale, M. C. (2006). Distinguishing between latent classes and con-
tinuous factors: Resolution by maximum likelihood? Multivariate Behavioral
Research, 41, 499–532.
MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice
of dichotomization of quantitative variables. Psychological Methods, 7, 19–40.
Markon, K. E., & Krueger, R. F. (2005). Categorical and continuous models of liability
to externalizing disorders: A direct comparison in NESARC. Archives of General
Psychiatry, 62, 1352–1359.
Markon, K. E., & Krueger, R. F. (2006). Information-theoretic latent distribution mod-
eling: Distinguishing discrete and continuous latent variable models. Psychological
Methods, 11, 228–243.
Martin, C. S., Chung, T., Kirisci, L., & Langenbucher, J. W. (2006). Item response
theory analysis of diagnostic criteria for alcohol and cannabis use disorders in
adolescents: Implications for DSM-5. Journal of Abnormal Psychology, 115, 807–814.
Meehl, P. E. (1992). Factors and taxa, traits and types, differences in degree and
differences in kind. Journal of Personality, 60, 117–174.
Muthén, B. (2006). Should substance use disorders be considered as categorical or
dimensional? Addiction, 101(Suppl. 1), 6–16.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge
University Press.
Rutter, M. (2006). Genes and behavior: Nature–nurture interplay explained. Malden, MA:
Blackwell.
Saha, T. D., Chou, S. P., & Grant, B. F. (2006). Toward an alcohol use disorder
continuum using item response theory: Results from the National Epidemiologic
Survey on Alcohol and Related Conditions. Psychological Medicine, 36, 931–941.
Stanovich, K. E. (2007). How to think straight about psychology (8th ed.). Boston:
Pearson.
Widiger, T. A., Simonsen, E., Krueger, R., Livesley, J. W., & Verheul, R. (2005).
Personality disorder research agenda for the DSM-5. Journal of Personality
Disorders, 19, 315–338.
Wolf, A. W., Schubert, D. S., Patterson, M. B., Grande, T. P., Brocco, K. J., &
Pendleton, L. (1988). Associations among major psychiatric diagnoses. Journal of
Consulting and Clinical Psychology, 56, 292–294.
Young, S. E., Stallings, M. C., Corley, R. P., Krauter, K. S., & Hewitt, J. K. (2000).
Genetic and environmental influences on behavioral disinhibition. American Journal
of Medical Genetics, 96, 684–695.
Index

Note: page numbers followed by ‘‘f ’’ and ‘‘t’’ denote figures and tables, respectively.

Abnormality, 322 alcohol consumption and, 219, 220f,
Adaptive treatment strategies 221t, 222f, 234–35, 235f, 236f
defined, 179 Algebra, 60–61
development of. See Sequential multiple Alleles, 254
assignment trial Allelic heterogeneity, 255. See also
research questions to refine, 181–82, Heterogeneity
182t, 184t Alzheimer’s disease
Adly, C., 330 nonsteroidal anti-inflammatory
Aerodigestive cancer, alcohol dependence drugs for, 281
and, 231f American Cancer Society, 212
Agnostic causal model, 103–4, 117. See also American Psychiatric Association, 297
Causal thinking; Directed acyclic American Psychiatric Institute for Research
graph and Education (APIRE), 347
associated with directed acyclic graphs, American Psychopathological
123–24 Association, 338
Agoraphobia, 325 Amphetamine, 330
best-fitting meta-analysis model for, Anderson, G. L., 79, 83
312t, 313t Angold, A., 13, 279
factor loadings of, 313t Angrist, J., 60
Alcohol dependence Animal studies
aerodigestive cancer and, 231f meiotic randomization in, 239–40
and ALDH2 genotype influences Mendelian randomization in, 237.
diseases, 219, 220f, 221t, 222f, 234–35, See also Mendelian randomization
235f, 236f Anticipatory anxiety, 326
best-fitting meta-analysis model for, Antipsychotic drugs, 330
312t, 313t Antisocial behavior, 344
and coronary heart disease, 216, 218 genetic effects on, 349
early drinking and, 70–72, 71f Antisocial personality disorder
and esophageal cancer, 219, 223f best-fitting meta-analysis model for,
factor loadings of, 313t 312t, 313t
prevalence of, 70 factor loadings of, 313t
PTSD and, 310 Ascherio, A., 207
ALDH2 genotype influences diseases Asparouhov, T., 164

Associational versus causal concept. Bounds, on counterfactual probability,
See Causal versus associational 153–54
concept Braff, D. L., 324
Associative selection bias, 215. See also Bias Brain imaging, 326–27
Attenuation by errors. See Regression Breslau, N., 297, 301–3, 305, 306, 310, 311
dilution bias Breslow, N. E., 280
Autism-spectrum disorders (ASDs) British Women’s Heart and Health Study
deletions on the X-chromosome and, (BWHHS), 209–10
266–67 Brown, G. W., 281
functional variants in NLGN4 and, Brown, H., 177
267, 268
genetics of, 265–70
homozygous mutation in CNTNAP2, Campbell, D., 289
267, 268 Campbell, D. T., 27, 37, 41, 43, 44
and rare variant:common disease Canalization, 236
hypothesis, 268–70 Candidate gene association studies, 261,
Average causal effect(s), 29, 30. See also 262. See also Genetic variation
Causal effect Categorical psychopathology. See also
from experiments, 5–6, 6f Psychopathology
nonadherence impact on, 6 comparison with dimensional
Avin, C., 108, 125, 140–43 psychopathology, 343–44
research modeling symptoms within,
345–47
Baba, T., 220, 221 Cauley, J. A., 83
Back-door criterion, 54–56, 55f. See also Causal analysis, 4
Directed acyclic graph average causal effects from
Baron, R. M., 16, 18, 21 experiments, 5–6, 6f
Bayesian information criterion (BIC), 164 confounding, 9–12
Berkson’s bias, 212. See also Bias clinical trials, 6–7
Best, S. R., 304 innovative designs and analyses for
Best-fitting meta-analysis model improving causal inferences, 8–9
of DSM-IV disorders, 313f nonexperimental observational
path diagram for, 312f studies, 6f, 8
Bias temporal patterns of, 12–15, 14f
associative selection, 215 Causal Bayesian networks, 52n5
Berkson’s, 212 Causal effect, 51–54
regression dilution, 213 average, 5–6, 6f, 29, 30
selection, 25 context dependency, 36–37
Bilineal inheritance, 259 defined, 4–5, 28
Bindewald, B., 207 estimation, confounding and, 54–57
Bipolar disorder mediation of, 15–21, 16f, 17f, 19f
best-fitting meta-analysis model for, moderation of, 21
312t, 313t narrowly defined outcomes, 36
factor loadings of, 313t natural confounding effects
Bladder cancer, smoking and, 231–32, 232t on, 35–36
of treatment, 28–29 Clinical Antipsychotic Trials of
true, 30, 35, 37 Intervention Effectiveness
Causal explanation, 27, 41–43 (CATIE), 183
construct validity, 41–42 Clinical psychopharmacology
external validity, 42–43 normalizing effects on pathophysiology,
Causal identification, 27, 37–41, 38f, 39t 328–31
Causal inference Clinical trials, 6–7
analyses for improving, 8–9 Cognitive ability, 9
fundamental problem of, 5, 15 Cognitive behavior therapy (CBT), 15–21
Causal manipulation, 27, 43–44 Cohen, J., 187, 193
Causal relations, mathematics of, 47–62 Colditz, G. A., 207
associational versus causal concept, Cole, D. A., 20
48–51 Combat stress reaction (CSR), 306
graphical models, 54–57 Common variant:common disease model,
nonparametric structural equations, 256–57, 262. See also Genetic variation
51–53 Comorbidity, 325, 342, 348. See also
symbiosis between counterfactual and Multimorbidity
graphical methods, 57–61 among modern DSM constructs
Causal thinking dimensionally, research
in developmental disorders, 279–92 modeling, 344–45
for objective psychiatric diagnostic Conditional ignorability, 59
criteria, 321–35 Conditional probability, 310
in psychiatry, 66–78 Conduct disorder and drug dependence,
Causal versus associational concept 72–73
coping with change, 48 Confounding, 6f, 8, 25, 286
formulating, 48–49 analytic approaches to, 9–12
ramification of, 49–50 bias, 49
untested assumptions and new and causal effect estimation, 54–57
notation, 50–51 control of, 56–57
Causation of genotype–environmentally modifiable
deductive-nomological approach to, risk factor–disease associations,
73–74 233–35
interventionist model, 73–78 natural, 35–36
mechanistic model approach to, 74 within Mendelian randomization,
reverse, 210, 211 investigating reintroduced, 234–35,
Chassan, J. B., 333 235f, 236f
Chen, L., 222, 235, 236 Construct validity, 41–42. See also Validity
Chesson, A., 330 Context dependency, 36–37
Cheverud, J. M., 215 Controlled direct effects (CDEs), 119–20.
Childhood disintegrative disorder, 265 See also Direct effect
Chlebowski, R. T., 83 Cook, T. D., 8, 27, 37, 41, 43, 44, 289
Chlorpromazine, 328 Coping with change, 48
Cholesterol–coronary heart disease Copy number variant (CNV), 254–55.
relationship, 221–24 See also Genetic variation
Christiansen, J., 333 detection, cytogenetics and, 264–65
Coronary heart disease (CHD) quasi-experiments for, 286–89
alcohol consumption and, 216, 218 timing of risk exposure, 280–82
cholesterol and, 221–22 Developmental stability, 235–38
C-reactive protein and, 224 Diabetes, 329
hormone-replacement therapy for, 79, Diagnostic and Statistical Manual of Mental
206–13, 207–9f, 210–12t Disorders (DSM), 71, 265, 297, 323,
Correns, C., 217 340, 347
Costello, E. J., 8–9, 12, 13, 279 Didelez, V., 123–24
Co-twin–control method, 68–73, 69f, 71f Dimensions, use of, in psychopathology
Counterfactual, 25, 29, 51–54, 103–4 research, 338–49
closest-world, 54 augmentation of, 347
identifiability of causal contrasts in, 104 comorbidity among categories, 342
independence of, 104–5, 111–15, 146, etiological factors, 348–49
148, 152, 153, 155 future inquiry, 347–48
structural definition of, 53–54 heterogeneity within categories, 342
Covariate selection, 54–56, 55f reduced statistical power, 340–41
C-reactive protein (CRP) research comparing dimensional and
and coronary heart disease, 224 categorical accounts of
Crossing over, 258 psychopathology, 343–44
Crow, T. J., 323 research modeling comorbidity among
Curb, J. D., 83 modern DSM constructs
Cytogenetics dimensionally, 344–45
CNV detection and, 264–65 research modeling symptoms within
psychopathological categories
dimensionally, 345–47
Data and Safety Monitoring Board, 82 Direct effect
Davey Smith, G., 9, 206, 210, 222, controlled, 119–20
223, 235, 236 identification of, 103
Davis, G. C., 301 principal stratum, 120–22
Dawid, A. P., 105, 107, 118, 119, 123–24, pure (natural). See Pure direct effect
131, 135, 147 Directed acyclic graph (DAG), 10, 108–10,
Day, N. E., 280 108f. See also Graphs
Deductive-nomological approach to causal model associated with, 123–24
causation, 73–74 pure direct effect, identification
Depression, 42, 159–65, 329–30 of, 122–23
major, 72, 74–75, 298, 305–7, 309–11, DNA. See also Genetic variation
312t, 313t, 314 coding, 253
Development, defined, 279–80 noncoding, 253
Developmental disorders, causal thinking Dodd, K. W., 208
in, 279–92 Dose–response relationship, 282
approaches to data analysis for, 291 Drug dependence, 344
normal development and, 283–85 adaptive treatment strategy for, 179–82,
normal experiments for, 289–90 180f, 182t
prevention trials as natural experiments, best-fitting meta-analysis model for,
290–91 312t, 313t
conduct disorder and, 72–73 Fitness
factor loadings of, 313t impact of genetic variation on, 255
genetic effects on, 349 Flint, J., 324
PTSD and, 310, 311 Fluorescence in situ hybridization, 264
Dysthymia Fong, G. T., 19
factor loadings of, 313t Food and Drug Administration (FDA),
best-fitting meta-analysis model for, 312t, 80–81, 328
313t Frangakis, C. E., 4, 7, 121
Freedland, K. E., 15
Freedman, R., 324
Effect of treatment on the treated (ETT), Functional magnetic resonance imaging
118–19, 144–49, 147f, 148f (fMRI), 326, 327
Eisenberg, L., 279
Endogenous variable, 52
Endophenotypes Gastric acidity, 329
conceptual problems with, 324–26 Gene by environment interactions, 237
replacing syndromes with, 324 Mendelian randomization and, 230–32.
Enuresis, 333 See also Mendelian randomization
Ervin, B., 207 Geneletti, S., 107, 119, 123–24, 135
Esophageal cancer Generalized anxiety disorder
alcohol consumption and, 219, 223f best-fitting meta-analysis model for, 312t,
Exchangeability, 30, 32, 33, 35, 40, 106 313t
Exogenous variable, 52 factor loadings of, 313t
Expectation-maximization algorithm, 164 Genes
Exposure propensity, 216–20, 220–23f and fine mapping, 258, 267
External validity, 42–43. See also Validity multifunction of, 233–34
Genetic approaches to causal
identification, 329
Færgeman, O., 222 syndromal heterogeneity, 323–24
Familial hypercholesterolemia Genetic association, 260–64
and coronary heart disease, 221–24 candidate, 261
Fetal origins hypothesis, 284 genomewide, 261–62
Finest Fully Randomized Causally Genetic buffering, 237
Interpretable Structured Tree Graph Genetic epidemiology
(FFRCISTG), 105, 110–11. See also Mendelian randomization and, 241–43.
Graphical causal models See also Mendelian randomization
bounds on pure direct effect under, Genetic redundancy, 236
153–54 Genetic variation, 217–18, 252, 253–56
counterfactual independence common variant:common disease
conditions for, 113 model, 256–57
data-generating process leading copy number variation, 254–55
to, 149–53, 150f, 152f, 153f between gDNA and phenotypes, 67
interventions restricted to a impact on fitness, 255
subset, 155–56 rare variant:common disease
path specific effects, 140–41 hypothesis, 257
short tandem repeat, 254 medication group analysis, 168–77,
single-nucleotide polymorphisms. See 170f, 171f
Single-nucleotide polymorphisms placebo group analysis, 165–67,
Genocopy, 214 166–68f, 171–77, 172f,
Genomewide association studies (GWASs), 174f, 176f
261–62 postrandomization time points analysis,
Genomewide linkage scan, 257 169–71, 175–77
Genotype–phenotype relationship, 259–60 Gu, J., 232
Germline DNA (gDNA) Guze, S. B., 300
and phenotypes, causal relationship
between variation in, 66–68
g-Functional, 115–17 Hafeman, D. M., 18, 42, 108
density, 115–16 Halpern, J., 58
Gilbertson, M. W., 307, 308 Hamilton Depression Rating Scale,
Gill, R. D., 117 159, 161
Giovannucci, E., 207 Harbord, R., 222, 235, 236
Glymour, M. M., 32 Harris, T. O., 281
Goldschmidt, R. B., 214 Hart, C., 210
Goodwin, D. W., 300 Health Professionals Follow-Up
Gottesman, I. I., 324 Study, 206
Gould, T, D., 324 Heart Protection Study, 208
Gram, L. F., 333 Heckerman, D., 118
Graphical causal models, 108–19 Heller syndrome. See Childhood
agnostic causal models, 117 disintegrative disorder
and algebra, combining, 60–61 Helzer, J. E., 304
directed acyclic, 108–10, 108f. See also Hendrix, S. L., 83
Directed acyclic graph Hernán, M., 7, 11, 12, 26–28, 291
FFRCISTG causal models, Heterogeneity, 342
110–11, 113 allelic, 255
g-functional, 115–17 locus, 255, 259
interventions restricted to a subset of remission, 333–34
variables, 118 syndromal, 323–24
minimal counterfactual models, 112, Hill, A. B., 77
113–14 Hobbs, H., 263, 264
manipulable contrasts and parameters, Holland, P. W., 4, 15, 29–30, 50, 51,
118–19 60, 339
non-parametric structural equation Homologous recombination, 258
models, 114–15 Honkanen, R., 218
Great Smoky Mountains study, 9, 12, 282 Hormone-replacement therapy (HRT)
Greenland, S., 26–28, 37, 38, 40, 55. 60, 62, for coronary heart disease, 206–13,
104, 109, 120, 131, 136, 291 207–9f, 210–12t
Growth mixture modeling (GMM), 159–77, Hoven, C. W., 287
162f Hsia, J., 83
all time points analysis, 169, 173 Hughes, J., 207
estimation and model choice, 164–65 Huntington disease, 323
Illness, 321–22. See also Syndromes Katan, M. B., 218
components of, 321 Kaufman, J. S., 154
Imai, K., 108 Kaufman, S., 154
Imbens, G., 60 Keele, L., 108
Imipramine, 333 Kendler, K. S., 66, 307, 309, 312, 348, 349
Independence, 49 Kenny, D. A., 16–21, 42
conditional, 130, 132–33, 156 Kessler, R. C., 302
of counterfactuals, 111–15, 146, 148, 152, Klein, D. F., 324
153, 155 Kojima, S., 220, 221
Inheritance, 67 Kraemer, H. C., 20, 341
Instrumental variables approach, Krueger, R. F., 312, 338, 345
Mendelian randomization as, 229–30,
230f, 231f. See also Mendelian
randomization Laan, M. van der, 108
‘‘Insufficient but necessary components Latent class model, 344
of unnecessary but sufficient’’ Latent trait model, 344, 345
(INUS), 38, 54 Law of independent assortment, 217
Integrated counterfactual approach (ICA) Lenz, W., 214
causal explanation, 41–43 Leuchter, A. F., 159
causal identification, 37–41, 38f Levins, R., 26, 43
causal manipulation, effects of, 43–44 Lewis, D., 54
distinguished from potential outcomes Lewis, S., 222, 223, 235, 236
model, 39t Li, R., 239
Intelligence and posttraumatic stress Liability-spectrum model, 345
disorder, 307–8. See also Posttraumatic Lifton, R., 263, 264
stress disorder Linear equation, 52
Intensive design, 333 Linkage disequilibrium, 233
combining objective measures Linkage studies, 257–60
with, 334 nonparametric, 259
Intent to treat (ITT), 7, 21, 83 parametric, 258–59, 267
Intermediate phenotypes, 221–24. See also Lipp, H. P., 239
Phenotype Lipsey, T. L., 304
Intervention model (IM), 54, 66, Lober, M., 333
73–78 Locus heterogeneity, 255, 259. See also
relationship between mechanistic causal Heterogeneity
models and, 76–77 Lumping, 332–33
Inverse probability weighting, 11
Item response theory (IRT), 345–46
Iwai, N., 220, 221 Mackie, J. L., 37–38, 40, 41, 54
MacLehose, R. F., 154
Major depression, 72, 74–75, 298, 305–7,
Jackson, R. D., 83 309–11, 314. See also Depression
Jamain, S., 267 best-fitting meta-analysis model for,
Jorgensen, O. S., 333 312t, 313t
Judd, C. M., 16, 19 factor loadings of, 313t
PTSD and, 310, 311 genetic epidemiology and, 241–43
Manic–depressive disorder, 325 as instrumental variables approach,
Manipulable contrasts/parameters relative 229–30, 230f, 231f
to a graph G, 118–19, 144–49, intermediate phenotypes, 221–24
147f, 148f maternal genotype as indicator of
Manson, J. E., 83 intrauterine environment, 225–27,
Markon, K. E., 312, 345 226f, 237–38
Maternal genotype, as indicator of problems and limitations of, 232–40
intrauterine environment, 225–27, study findings, implications of, 227–28
226f, 237–38. See also Genotype Mendel’s law, 217
Mathematics of causal relations, 47–62 Menopausal hormone therapy, 79–97
associational versus causal concept, background, 80–81
48–51 estrogen plus progestin therapy for,
graphical models, 54–57 79–80, 82, 83–84, 83t
nonparametric structural estrogen therapy for, 83–84, 83t
equations, 51–53 hypothesized effects of, 82–84, 83t
symbiosis between counterfactual and observational studies of, 80–81, 84–91,
graphical methods, 57–61 86t, 87–90f
Maxwell, S. E., 20 randomized clinical trial for, 84–91, 86t,
McFarlane, A. C., 301 87–90f, 93–97, 94f, 95f
Mechanistic model approach to randomized clinical trial with
causation, 74 observational studies for, 91–93
Mediation analysis, 4, 15–21, 16f, 17f, 19f, WHI trial design, 81–82
41, 60, 291 Millen, A. E., 208
Meehl, P. E., 343 Minimal Counterfactual Model (MCM),
Meiotic randomization, in animal 105, 112
studies, 239 counterfactual independence conditions
Melancholia, 333 for, 113–14
Mendel, G., 217 effect of treatment on the treated,
Mendelian randomization, 9, 213. See also 145–49, 147f, 148f
Genetics path specific effects, 140
in animal studies, 237, 239–40 Minor allele frequency, 254
canalization and developmental Missing data, 7, 8, 164, 166
stability, 235–40 model, 51, 57
comparison with randomized controlled problem, 29
trials, 228–29, 228f Moderation of causal effects, 21
confounding, 233–35 Moffitt, T. E., 282
exposure propensity, 216–20 Mohr, D., 20
failure to establish reliable Monoamine oxidase inhibitors, 330
genotype–disease associations, 233 Morgan, T. H., 242
failure to establish reliable Mplus program, 164
genotype–intermediate phenotype Multimorbidity, 342. See also Comorbidity
associations, 233 Multiple regression analysis, 331
gene by environment interactions Munafo, M. R., 324
and, 230–32 Murphy, S. A., 179, 183, 185, 199, 201
Mutation, 254 syndromes with endophenotypes,
Muthén, B., 159, 164, 177, 344 replacing, 324
Myers, J., 312 Observational epidemiology, limits of,
206–13, 207–9f, 210–12t
Obsessive–compulsive personality disorder
Nagel, E., 279 (OCPD), 342
National Institute of Mental Health Offord, D. R., 282
(NIMH), 332 O’Mahony, S. M., 5
National Institutes of Health (NIH), 334 Ozer, E. J., 304
Natural confounding, 35–36. See also
Confounding
Natural direct effect. See Pure direct effect Panic disorder, 325–26
Natural experiments, 289–90 best-fitting meta-analysis model for,
prevention trials as, 290–91 312t, 313t
Natural response, 301 factor loadings of, 313t
Neale, M. C., 312 Parametric growth mixture models, 13, 15
NEO Personality Inventory, 309 Parametric linkage, 258
Neuropsychiatric disorders, 252–71 Paroxetine, 330
Neuroticism, 9 Pasteur, L., 322
genetic control of, 308 Path diagram, 52
5-HTTLPR and, 308–9 Pathophysiology, 326–31
and posttraumatic stress disorder, 308–9 import of remission, 331
Nonadherence, 6 normalizing effects of clinical
Noncompliance, 61 psychopharmacology on, 328–31
Nonexperimental observational objective measures with intensive
studies, 6f, 8 design, combining, 334
Non-parametric structural equation model therapeutics as guide to, 327–31
(NPSEM), 105–6, 114–15 Path-specific effects, 137–43, 138f
refutationist critique of, 106 Pearl, J., 4, 10, 47, 49, 51, 53, 55–62, 76,
Nonsteroidal anti-inflammatory drugs 105–8, 114, 120, 122, 125, 128–35,
(NSAIDs), 281 137, 138, 140–45, 339
Norris, F. H., 302 Peritraumatic responses, 304
Nurses’ Health Study, 206 Petersen, M., 108
Pharmacological dissection, 331–34
lumping, 332–33
Objective psychiatric diagnostic criteria, splitting, 332–33
causal thinking for, 321–35 Phenocopy, 214
brain imaging, 326–27 Phenotype
conceptual problems with and germline DNA, causal relationship
endophenotypes, 324–26 between variation in, 66–68
ideal diagnostic entity, search for, 322–27 intermediate, 221–24
pathophysiology, 326–31 relationship between genotype and,
pharmacological dissection, vicissitudes 259–60
of, 331–34 Picciano, M. F., 207
syndromal heterogeneity, 323–24 Placebo response, 160
Pleiotropy, 233–34 Psychiatry, causal thinking in, 66–78
Polymorphism. See Single-nucleotide Psychopathology
polymorphism categorical. See Categorical
Posttraumatic stress disorder (PTSD), psychopathology
297–15. See also Trauma dichotomous, 341
comorbidity of, 309–14, 310t, 312f, dimensional. See Psychopathology
313f, 314t research with dimensions
conditional probability of, 301–3, integrating causal analysis into, 3–22
302t, 303t polythetic-categorical, 342
defined, 298 and puberty, relationship between, 285
factor loadings of, 313t status of, 341
inner logic of, 298–00 subclinical, 347
intelligence and, 307–8 Puberty and psychopathology, relationship
neuroticism and, 308–9 between, 285
prevalence of exposure, 301, 302t Pure direct effect (PDE), 104, 105, 120.
prior trauma as risk factor, 304–7, See also Direct effect
305t, 306t determinism, role of, 131–32
research on risk factors, 303–4 extended causal model, role of, 130–31
risk factors, problem of, 300–303 identification of, 122–27, 124f
risk of exposure to, 311t interventional interpretation of,
Potential outcomes model, 4, 25 135–37, 136f
causal effect as comparison between manipulation, effects of, 128–37
hypothetical exposure condition, with measured common cause of Z
29–30 and Y, 124–27, 124f, 134
distinguished from integrated need for conditioning on events of
counterfactual approach, 39t probability zero, 132–34
exposure conditions, 28 substantive motivation for, 128–35, 129f
formulating assumptions, 57–59
graphs and algebra, combining, 60–61
history and principles of, 27–32 Quasi-experimental design, 8–9, 286–89
language of, 57–61 dose–response measures of exposure to
limitations of, 32–37 an event, 287–88
performing inferences, 59–60 sample compared postevent to a
response types, 28 population norm, 286–87
Prentice, R. L., 79, 86, 87, 89, 93, 94, 97 Quitkin, F. M., 172
Prescott, C. A., 70, 307, 309, 312
Prevention trials, as natural experiments,
290–91 Radimer, K., 207
Principal stratum direct effects (PSDEs), Randomized controlled trials (RCTs), 206–8
120–22. See also Direct effect comparison with Mendelian
Probabilistically equivalent, 5 randomization, 228–29, 228f
Probability calculus, 49–50 Rapoport, J. L., 330, 333
Propensity scores, 10–11, 61 Rare variant:common disease
Psychiatric disorders, causes of hypothesis, 257
timing of risk exposure and, 280–82 autism-spectrum disorders and, 268–70
  new technology and, 270–71
Regression dilution bias, 213. See also Bias
Relapses, 325
Remission, 325–26
  heterogeneity, refining, 333–34
  import of, 331
Resnick, H. S., 302
Responder class, 160
Response to treatment, 160
Rett syndrome, 265
Reverse causation, 210, 211. See also Causation
Richardson, T. S., 103, 107, 108, 122, 119, 135, 146, 147, 154
Rimm, E. B., 207
Risk exposure, timing of, 280–82
  age at first exposure, 281
  in developmental psychopathology, 281
  in dose–response relationship, 282
  to poverty, 282
  to a protective factor, 281, 282
Risk-taking, 9
Ritenbaugh, C., 83
Robins, J. M., 4, 11, 12, 26, 28, 40, 44, 54, 55, 58, 60, 61, 103–5, 107, 108, 115, 117–22, 131, 135, 146, 147, 149, 154, 155, 200, 291
Robins, L. N., ix
Robustness analysis, 194–97, 194–97t
Rosenzweig, P., 330
Rothman, K. J., 38, 41, 54
Rotnitzky, A., 122
Rubin, D. B., 4, 5, 7, 10, 25, 28–31, 44, 60, 121
Rubin’s causal model. See Potential outcomes model
Rutter, M., 282

Sameroff, A., 282
Schizophrenia, 38, 330–31
Schork, N. J., 324
Sebat, J., 269
Seifer, R., 282
Selection bias, 25. See also Bias
Selective serotonin reuptake inhibitor (SSRI), 330, 332
Sequenced Treatment Alternatives to Relieve Depression (STAR*D), 183
Sequential multiple assignment trial (SMART), 179, 183–204, 184f
  sample size calculations, 186–92, 187t, 188t, 190t, 202–4
  sample size formulae, 199–202
  simulation design, for sample size formulae evaluation, 192–93
  robustness for new sample size formulae, 194–97, 194–97t
  test statistics, 185–86, 186t
Serotonin transporter promoter polymorphism (5-HTTLPR)
  and neuroticism, 308–9
Shachter, R. D., 118
Shadish, W. R., 8, 27, 41, 44, 289
Short tandem repeat (STR), 254
Shpitser, I., 108, 125, 140–43
Simple phobia
  best-fitting meta-analysis model for, 312t, 313t
  factor loadings of, 313t
Single-nucleotide polymorphisms (SNPs), 253, 254
Sinisi, S., 108
Slade, T., 312
Smoller, J. W., 323, 324
Sobel, M., 60
Social phobia
  best-fitting meta-analysis model for, 312t, 313t
  factor loadings of, 313t
Spencer, S. J., 19
Spirtes, P., 108, 122
Splitting, 332–33
Stable unit treatment value (SUTVA), 5
Stable unit treatment value assumption (SUTVA), 5, 30–32, 39t, 41, 43, 44
  interference between units, 34–35
  stable treatment effect, 33–34
Stampfer, M. J., 207
Stanovich, K. E., 338
State, M. W., 252
Stefanick, M. L., 83
Stein, M. B., 302
Straumanis, J., 330
Structural equation modeling, 68
Subar, A. F., 208
Subclinical psychopathology, 347. See also Psychopathology
Substance-use disorder
  PTSD and, 310, 311
Sufficient cause, 38
Sufficient set, 55
Swanson, C., 207
Sydenham, T., 322, 325
Syndromal heterogeneity, 323–24. See also Heterogeneity
  by pharmacological dissection, reducing, 331–34
Syndromes, 322. See also Illness
  diagnosis of, 322–23
  with endophenotypes, replacing, 324
  phenotypically homogeneous, 328

Takagi, S., 220–22, 231, 236
Taxometric methods, 343–44
Temporal patterns, of causal processes, 12–15, 14f
Terwilliger, J. D., 241
Thirion, B., 326
Toh, S., 7, 11
Translational research, 327
Transmission tests, 260
Trauma. See also Posttraumatic stress disorder
  risk of exposure to, 311t
Tsuang, M. T., 323, 324
Twins as natural experiment. See Co-twin–control method

Validity
  construct, 41–42
  external, 42–43
VanderWeele, T. J., 107, 108, 119, 135, 146, 147
Vansteelandt, S., 122
Villet, R., 214
Virchow, R., 322
Virginia Adult Twin Study of Psychiatric and Substance Use Disorders, 71

Waddington, C. H., 248
Wagner, P. D., 251
Wassertheil-Smoller, S., 83
Watson, D., 312, 313
Wechsler Intelligence Scale for Children–Revised, 307
Weiss, D. S., 304
Weiss, R. D., 179
Weiss, W. M., 241
Willett, W. C., 207
Williams, R. S., 251
Women’s Health Initiative (WHI), 79, 80, 83
  menopausal hormone therapy trial design, 81–82
  Steering Committee, 83
  trial findings, 82–84, 83
  Writing Group for the Women’s Health Initiative Investigators, 83
Wright, S., 51–52, 62

Yamamoto, T., 108
Yamauchi, R., 220, 221
Yasuno, S., 220, 221
Yehuda, R., 301

Zanna, M. P., 19
Zuckerkandl, E., 214