IY35CH15-Chakraborty ARI 30 January 2017 13:42

Review in Advance first posted online on February 6, 2017. (Changes may still occur before final publication online and in print.)

A Perspective on the Role of Computational Models in Immunology
Access provided by University of California - San Diego on 04/26/17. For personal use only.

Arup K. Chakraborty
Annu. Rev. Immunol. 2017.35. Downloaded from www.annualreviews.org

Institute for Medical Engineering and Science, Departments of Chemical Engineering, Physics,
Chemistry, and Biological Engineering, Massachusetts Institute of Technology, Cambridge,
Massachusetts 02139; email: arupc@mit.edu

Annu. Rev. Immunol. 2017. 35:403–39
The Annual Review of Immunology is online at immunol.annualreviews.org
This article’s doi: 10.1146/annurev-immunol-041015-055325
Copyright © 2017 by Annual Reviews. All rights reserved

Keywords
computational immunology, signaling, T cell repertoire, immune monitoring, vaccine design

Abstract
This is an exciting time for immunology because the future promises to be
replete with exciting new discoveries that can be translated to improve health
and treat disease in novel ways. Immunologists are attempting to answer in-
creasingly complex questions concerning phenomena that range from the
genetic, molecular, and cellular scales to that of organs, whole animals or
humans, and populations of humans and pathogens. An important goal is
to understand how the many different components involved interact with
each other within and across these scales for immune responses to emerge,
and how aberrant regulation of these processes causes disease. To aid this
quest, large amounts of data can be collected using high throughput in-
strumentation. The nonlinear, cooperative, and stochastic character of the
interactions between components of the immune system as well as the over-
whelming amounts of data can make it difficult to intuit patterns in the data
or a mechanistic understanding of the phenomena being studied. Compu-
tational models are increasingly important in confronting and overcoming
these challenges. I first describe an iterative paradigm of research that in-
tegrates laboratory experiments, clinical data, computational inference, and
mechanistic computational models. I then illustrate this paradigm with a few
examples from the recent literature that make vivid the power of bringing to-
gether diverse types of computational models with experimental and clinical
studies to fruitfully interrogate the immune system.


INTRODUCTION
Our desire to combat infectious diseases, the untold amount of suffering caused by autoimmune
diseases, and the allure of discovering the mechanisms that underlie how the remarkable immune
system of higher organisms works have led to extensive laboratory experiments and clinical research.
Some spectacular discoveries have resulted. In spite of these major advances, a mechanistic
understanding of how systemic immune or autoimmune responses emerge has remained elusive.
Practical consequences of this missing basic science include the inability to rationally design vac-
cines against diverse pathogens that are wreaking havoc and the inability to effectively treat several
autoimmune diseases. An important goal has been, and continues to be, to remedy this situation by
developing a mechanistic understanding of how immune responses develop, and then harnessing
that knowledge for rational design of therapies and vaccines.
A barrier to the quest for mechanistic principles in immunology is that the pertinent processes
involve dynamic events with many participating components that must act cooperatively for any
given phenomenon to emerge. Moreover, these collective processes span a spectrum of scales.
The activation of single cells upon receptor stimulation requires that the many molecules that
are part of the intracellular biochemical network act in highly cooperative ways. Many types of
cells must cooperate in tissues in order to mount an immune response on the scale of the entire
organism, and populations of pathogens evolve to beat our immune system. This set of cooperative
processes, arranged in a hierarchy of scales, with feedback between the scales, can make it difficult
to intuit underlying mechanisms from experimental observations. Further confounding intuition
is that many of the pertinent processes are inherently stochastic.
Immunologists can now also collect huge amounts of genetic, cellular, and biochemical data
and can interrogate the immune systems of many humans on an unprecedented scale. But, as
Sidney Brenner (1) has noted, “we are drowning in a sea of data and thirsting for knowledge.”
Can theoretical and computational models complement clinical studies and model experiments to
help translate data (big or small sets) to mechanistic knowledge that can be harnessed to achieve
practical goals? A paradigm of research is emerging that suggests that the answer to this question
is yes.
All fields of science are driven by experimental data. If the set of observations is small, as has
usually been true until recently for immunology, patterns of how some measured variables are
related to others (or correlated) can be inferred by careful perusal of the data. The correlations
revealed by the data can suggest new mechanistic hypotheses. If large amounts of data on many
variables are collected in a high throughput fashion without the bias of any particular hypothesis,
it is difficult to infer the empirical patterns of correlations between measured variables without
the aid of a computer. Specifically, statistical methods and machine learning algorithms developed
by computer scientists need to be employed to reveal patterns in large data sets. Just as with
smaller data sets, these patterns can sometimes directly suggest mechanistic hypotheses that can
then be tested by carefully designed experiments, and iterative loops between experimental results
and hypothesis generation can enable discoveries. But, whether the data set is small or large, the
complexity of the cooperative, nonlinear, and stochastic phenomena pertinent to immunology can
make it difficult to parse the multifaceted implications of a hypothesis and determine if it provides
a plausible explanation for the observations. Thus, even after empirical patterns in the data are
clear, it can be difficult to formulate meaningful mechanistic hypotheses that can be subjected to
experimental tests.
Biophysical models can be formulated that augment known mechanistic information with new
hypotheses that may explain the observations. Computational or theoretical studies of the model
can keep track of every possible event that can occur consistent with a hypothesis and reveal

[Figure 1 diagram: four iterative loops, numbered 1 to 4, linking the boxes "Gather new experimental data", "Infer patterns by examining data", "Infer patterns using computational algorithms (e.g., machine learning)", "Formulate new hypothesis", and "Mechanistic computational models".]

Figure 1
A paradigm of research that integrates diverse computational approaches with laboratory experiments and
clinical data to develop a mechanistic understanding of immunology that can be harnessed for rational design
of vaccines and therapies. Iterative loop 1 is a traditional paradigm that does not involve computational
models and has been very successful. Iterative loop 2 includes mechanistic computational models to aid the
formulation of hypotheses that may explain puzzling observations regarding complex phenomena when a
small number of variables are measured. Iterative loop 3 involves the use of computational inference
methods to discern patterns in high-throughput data. Iterative loop 4 includes mechanistic computational
models as in iterative loop 2, but patterns in the data are determined using computational inference.

whether a particular hypothesis is plausible. Hypotheses that appear feasible at first glance can
be incorrect because of the complexity of the underlying phenomena. Computational biophysics–
based models not only can screen out these hypotheses prior to fruitless experimental tests, but
also can shed light on the reason a hypothesis is unlikely to be right and thus guide the choice
of other feasible hypotheses. Such computational studies are not exercises in fitting parameters
to quantitate known mechanistic models. Rather they offer ways to obtain mechanistic insights
and guide the choice of meaningful hypotheses underlying puzzling observations and the design
of realistic experiments that can test the hypotheses. Iterative loops that involve data collection,
intuitive or computer-aided inference, mechanistic biophysical modeling, and new experiments
can enable discoveries.
Figure 1 illustrates this paradigm of research, and there are two important points to note:
(a) This paradigm requires close collaboration between experimental/clinical and computational
scientists. (b) Computational inference is often labeled (in my view incorrectly) as data-driven
modeling and sometimes considered to be an alternative to biophysical mechanistic models. This is
a false debate [as has previously been noted (2)]. As Figure 1 shows, these different computational
approaches are elements of the tapestry of a unified research paradigm where computational
modeling complements experimental and clinical research to prevent us from drowning in a sea
of data and thirsting for knowledge in immunology.
The goal of this review is not to comprehensively list recent studies that blend computational
modeling and experiments in interesting ways. Rather, I discuss only a few examples that illus-
trate each of the four iterative loops in Figure 1 and that make clear how bringing together
computational models and experimental/clinical work can be fruitful for a range of problems in
immunology.


LYMPHOCYTE SIGNALING
The activation of T and B lymphocytes when their surface receptors engage cognate ligands is a
key event in the initiation of an immune response (3). Lymphocyte activation exhibits several remarkable
features. For example, even a single agonist peptide–major histocompatibility complex (pMHC)
molecule in a sea of thousands of endogenous pMHC molecules can trigger activation (4–11)
even though the difference between their binding strengths to the T cell receptor (TCR) is not large (12–15).
Membrane–proximal T cell signaling is also digital in character, with each cell in a population
expressing either a high or a low level of active downstream signaling products [e.g., bimodal
phospho-ERK expression (16, 17)]. This remarkably sensitive, selective, and digital response is
exhibited on a fast time scale (18). An important question is how the individual signaling com-
ponents interact with each other to result in a signaling network topology that enables these
properties. This issue has been explored extensively by bringing together computational and ex-
perimental studies, especially over the last 15 years. Numerous reviews and perspectives have
summarized the primary literature in the last few years (3, 16, 19–25), and so to avoid repetition
my discussion of this topic is brief.


Computational research on signaling often begins with a puzzling experimental observation.
A hypothesis that may be able to describe the observation can be formulated by adding features to
the known interactions between the pertinent signaling proteins. The new signaling network thus
formulated represents a set of rules that describe which proteins modify and interact with other
proteins and at what rate. The rules are then translated into mathematical language, and then to
a set of instructions for a computer so that it can carry out calculations with the mathematical
description. The computer then executes these instructions in a prescribed condition (e.g., upon
receptor-ligand engagement). The results describe how the hypothesis affects the complex non-
linear interactions in the signaling network and whether it is a plausible explanation for the puzzle
under consideration. Mutations correspond to modifications of the rules, and results of calcula-
tions with the modified rules describe the consequences of mutations, which can then guide the
choice of experiments that can best test a plausible hypothesis—the calculations may show that if
the hypothesis is correct then a certain mutation will abrogate the observed phenomenon whereas
another will not.
Often we are interested in how downstream signaling products are activated as a function of
time after a stimulus (e.g., receptor-ligand engagement). Thus, we need to calculate the number of
modified/activated proteins produced in a series of short intervals of time following the stimulus.
Given the rules, the available numbers of each type of protein, the rates at which the relevant bio-
chemical reactions occur, and parameters associated with protein transport (if important), this can
be done. The specific mathematical language into which the rules are translated depends upon
the circumstances. If the proteins under consideration are expressed in relatively large numbers,
then the number of proteins available in the volume in which the signaling reactions take place (e.g.,
the membrane or region of the cytoplasm) can be adequately described by its concentration. The
concentration is a measure of the average number of protein molecules divided by the volume. If the
protein expression levels are large, the fluctuations in protein numbers relative to the average value
are small compared to the average itself and the concentration is a good descriptor of the amount of
a protein that is available for biochemical reactions. If expression levels are small, the fluctuations
are not small compared to the average and one must explicitly account for these fluctuations.
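
The point about fluctuations can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the copy number of a protein in a cell is Poisson distributed, in which case the standard deviation relative to the mean is one over the square root of the mean. The function name and the copy numbers are hypothetical and are not taken from any of the cited studies.

```python
import math

def relative_fluctuation(mean_copies):
    """Standard deviation divided by the mean, assuming a Poisson-distributed copy number."""
    return math.sqrt(mean_copies) / mean_copies

# Hypothetical mean copy numbers per cell, chosen only for illustration.
for mean_copies in (10, 1_000, 100_000):
    value = relative_fluctuation(mean_copies)
    print(f"mean = {mean_copies:>7}: relative fluctuation = {value:.3f}")
```

Under this assumption, fluctuations are roughly a third of the mean at ten copies per cell but well under one percent at one hundred thousand copies, which is the regime in which a concentration is an adequate descriptor.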
If protein concentrations are adequate descriptors for the system under consideration, and
the spatial distribution of signaling proteins is not of interest or the transport rates are much
faster than the rates at which the pertinent biochemical reactions occur, then the rules that define
the biochemical signaling network are translated into the mathematical language of ordinary
differential equations. The output of the calculation is the concentration of every activated and
unmodified protein as a function of time, as might be measured by Western blot assays. If protein
transport and the spatial distribution of signaling components are important, but concentration is still
a good descriptor of the amount of each signaling protein, then the appropriate mathematical
language is that of partial differential equations. The output of the calculation is the concentration
of each activated or unmodified signaling protein as a function of time and location in the
cell, as might be observed in imaging experiments.
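
To illustrate the ordinary differential equation case, here is a minimal sketch for a single hypothetical substrate that is phosphorylated at rate k_phos and dephosphorylated at rate k_dephos in a well-mixed volume. The one-reaction network, the parameter values, and the use of simple forward-Euler integration are all assumptions made for illustration; real signaling models contain many coupled equations and typically use more careful integrators. Setting k_phos to zero mimics a kinase-dead mutant, in the spirit of treating mutations as modified rules.

```python
def simulate_phosphorylation(k_phos, k_dephos, total, t_end=50.0, dt=0.01):
    """Forward-Euler integration of d[Sp]/dt = k_phos*(total - [Sp]) - k_dephos*[Sp]."""
    sp = 0.0                                   # phosphorylated concentration at t = 0
    trajectory = [(0.0, sp)]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        rate = k_phos * (total - sp) - k_dephos * sp
        sp += rate * dt                        # advance the concentration by one time step
        trajectory.append((step * dt, sp))
    return trajectory

# Hypothetical parameters (arbitrary units); not measured values.
wild_type = simulate_phosphorylation(k_phos=0.5, k_dephos=0.1, total=1.0)
kinase_dead = simulate_phosphorylation(k_phos=0.0, k_dephos=0.1, total=1.0)

print("wild type, final phosphorylated fraction:", round(wild_type[-1][1], 4))
print("kinase dead, final phosphorylated fraction:", round(kinase_dead[-1][1], 4))
```

The trajectory is a list of (time, concentration) pairs, which is the kind of output one would compare with a time course from a population-level assay.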
If the fluctuations in protein numbers around the average value are not negligible (i.e., concen-
tration is not an adequate descriptor), one must calculate the temporal evolution of the probability
with which a certain number of proteins of a particular type is observed. This may be the situation
when protein expression levels are low within a single cell and/or when cell-to-cell variation in
expression levels is large. Stochastic calculations are often carried out using the mathematical lan-
guage of Master equations. These equations are often solved by using computational techniques
called Monte Carlo methods (named after the European city famous for its casinos). The output of
the calculations is the probability of observing a particular number of proteins at each time point
at different locations. In some circumstances, protein transport can be ignored, in which case the
results will just be the probability of observing a particular number of proteins at each time point.
These results can be displayed as histograms that should correspond to experimental measure-
ments carried out on single cells in a population (e.g., a FACS experiment). Figure 2 shows the
type of circumstance in which a particular type of mathematical language is appropriate, and its
relative computational burden. It is important that experimental immunologists be aware of the

[Figure 2 diagram. Axes: degree of complexity (horizontal) and computational time (vertical). Methods, from lowest to highest computational time:
Ordinary differential equations: large copy-numbers of proteins; well mixed.
Partial differential equations: large copy-numbers of proteins; spatial reorganization is important.
Spatially homogeneous stochastic calculations (Gillespie and Monte Carlo algorithms): small copy-numbers of proteins; well mixed.
Spatially inhomogeneous stochastic calculations (Gillespie and Monte Carlo algorithms): small copy-numbers of proteins; spatial reorganization is important.]

Figure 2
Computational methods used to study immune cell signaling. The circumstances for which each type of method is appropriate and the
computational cost and complexity associated with each method are illustrated. Adapted from Chakraborty & Das (20).

basic approximations inherent to each of these types of mathematical languages, as the reliability
of the calculations for the problem under consideration depends upon the assumptions made.
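
As an illustration of the well-mixed stochastic case, the sketch below applies the Gillespie direct method to a hypothetical two-state substrate (unmodified or phosphorylated) present at only 20 copies per cell. The reaction scheme, rate constants, and cell count are assumptions chosen for illustration. Each simulated cell yields one copy number at the final time, and the collection of cells gives a histogram of the kind one would compare with single-cell measurements.

```python
import random

def gillespie_two_state(n_total, k_phos, k_dephos, t_end, rng):
    """Gillespie direct method for S <-> Sp in one well-mixed cell; returns Sp count at t_end."""
    t, n_sp = 0.0, 0
    while True:
        a_phos = k_phos * (n_total - n_sp)     # propensity of S -> Sp
        a_dephos = k_dephos * n_sp             # propensity of Sp -> S
        a_total = a_phos + a_dephos
        if a_total == 0.0:
            return n_sp                        # no reaction can fire
        t += rng.expovariate(a_total)          # exponentially distributed waiting time
        if t > t_end:
            return n_sp
        if rng.random() * a_total < a_phos:    # choose which reaction fires
            n_sp += 1
        else:
            n_sp -= 1

rng = random.Random(0)                         # fixed seed so the run is reproducible
cells = [gillespie_two_state(20, 0.5, 0.1, 50.0, rng) for _ in range(1000)]

histogram = {}
for n_sp in cells:
    histogram[n_sp] = histogram.get(n_sp, 0) + 1
mean_copies = sum(cells) / len(cells)

for n_sp in sorted(histogram):
    print(f"{n_sp:2d} phosphorylated copies: {histogram[n_sp]} cells")
print("mean over cells:", round(mean_copies, 2))
```

The mean over cells should lie close to the deterministic steady state for the same rate constants, while the spread of the histogram is information that a concentration-based calculation does not provide.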
A number of standard computer codes now exist that can carry out the calculations described
above (e.g., 26–28). These software tools translate the biochemical rules that describe the signaling
pathway into a mathematical language and computer code automatically using approaches that
are often rooted in graph theory (25). They are sometimes called rule-based methods, because
one needs to specify only the biochemical rules and the pertinent parameters. This is a nontrivial
advantage because a small set of rules defining the signaling network can lead to an enormous
number of possible biochemical reactions, and translating each one of them into mathematical
language and computer code by hand is very tedious. Examples of software that can be used by
experimental immunologists and modeling experts include the Stochastic Simulation Compiler
(SSC) (26), Simmune (27), and BioNetGen (28). The last of these is the most commonly used
software package, and it has several attractive features that represent an enormous effort by Faeder,
Hlavacek, and coworkers (25).
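
The combinatorial growth that motivates rule-based methods can be seen in a toy enumeration. The sketch below assumes a hypothetical protein with n independent phosphorylation sites and a single rule stating that any site may be modified or unmodified regardless of the state of the others. It is not how SSC, Simmune, or BioNetGen work internally; it only counts the species and reactions that such a rule implies.

```python
from itertools import product

def expand_rules(n_sites):
    """Enumerate every phosphorylation state and every single-site transition between states."""
    species = list(product((0, 1), repeat=n_sites))   # 0 = unmodified, 1 = phosphorylated
    reactions = []
    for state in species:
        for site in range(n_sites):
            flipped = list(state)
            flipped[site] = 1 - state[site]           # modify or unmodify one site
            reactions.append((state, tuple(flipped)))
    return species, reactions

# Hypothetical site counts, chosen only to show the scaling.
for n_sites in (2, 5, 10):
    species, reactions = expand_rules(n_sites)
    print(f"{n_sites} sites: {len(species)} species, {len(reactions)} reactions")
```

The number of species grows as 2^n and the number of reactions as n times 2^n, so even a modest number of sites makes writing each reaction by hand impractical.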
The outcome of a calculation depends upon the values of the parameters, such as rate constants
and protein concentrations. Therefore, it is important to quantitatively measure as many pertinent
parameters as possible, and efforts toward this end are under way (2). It is important to note,
however, that biological systems often exhibit robustness to variations in several parameters. In
other words, the qualitative behavior is not significantly different upon changing some of the
parameters over modest ranges (20 and citations therein). One may speculate that the topology
(or wiring) of the biochemical networks has evolved to exhibit robustness given the ubiquity
of stochastic fluctuations and changing environments. It is also true, however, that a biological
system can be extremely sensitive to small changes in some parameters. Calculations should be
carried out varying the unknown parameters over large ranges to determine the robustness of the
hypothesis to such changes. A model that is extremely sensitive to many parameter values should
give the investigator pause. The few parameters whose values are found to determine qualitative
behavior in a sensitive fashion need to be measured.
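
A parameter scan of the kind described above can be sketched as follows. The example assumes a hypothetical two-state phosphorylation cycle whose steady-state phosphorylated fraction is k_phos / (k_phos + k_dephos), and it asks over what range of the two unknown rate constants a qualitative outcome (here, more than half of the substrate phosphorylated) holds. The model, the threshold, and the four-decade range are illustrative assumptions.

```python
def steady_state_fraction(k_phos, k_dephos):
    """Steady-state phosphorylated fraction of a simple two-state cycle."""
    return k_phos / (k_phos + k_dephos)

def log_grid(low_exp, high_exp, points_per_decade=2):
    """Values spaced evenly on a log10 scale from 10**low_exp to 10**high_exp."""
    n_intervals = (high_exp - low_exp) * points_per_decade
    return [10 ** (low_exp + i / points_per_decade) for i in range(n_intervals + 1)]

def scan(k_phos_values, k_dephos_values, threshold=0.5):
    """Record, for each parameter pair, whether the qualitative outcome holds."""
    outcomes = {}
    for k_phos in k_phos_values:
        for k_dephos in k_dephos_values:
            outcomes[(k_phos, k_dephos)] = steady_state_fraction(k_phos, k_dephos) > threshold
    return outcomes

grid = log_grid(-2, 2)                      # four decades, hypothetical range
outcomes = scan(grid, grid)
n_true = sum(outcomes.values())
print(f"outcome holds for {n_true} of {len(outcomes)} parameter pairs")
```

In this toy case the outcome depends only on the ratio of the two rates, so that ratio is the quantity one would want measured; in a realistic network the same scan identifies which of many parameters control the qualitative behavior.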
Most of the work that has been done by pairing computation and experiments to discover
new aspects of lymphocyte signaling has followed iterative loop 2 in Figure 1. Over the last two
decades, a number of advances have resulted from complementary computational and experimen-
tal work on lymphocyte signaling. These studies have been reviewed elsewhere (2, 3, 19–25), and
so here I mention only a few examples. The features of the T cell signaling network that enable
the extraordinary sensitivity and selectivity to agonist ligands have been illuminated (3). In par-
ticular, an understanding of how the basic model of kinetic proofreading must be embellished by
feedback loops and coagonists has greatly benefited from complementary iterative experimental
and theoretical research (7–11, 17, 29, 30). Similarly, insights into how and which feedback loops
can regulate digital signaling in T cells (17), and which might not (31), have also been obtained.
The origin of antagonism has been extensively explored (32–35) but remains puzzling. A very
large number of studies have also focused on deciphering whether the on-rate, the half-life, or
the equilibrium affinity of TCR-pMHC bonds determines functional outcome (15). One possi-
bility that seems to rationalize the data is that the principal determining factor is half-life, but
a sufficiently high on-rate could compensate for a lower half-life because of rebinding prior to
disassembly of downstream signaling complexes; this results in an effective half-life that can be
calculated when the on and off rates are given (15, 36). Current experimental efforts that aim to
determine receptor-ligand binding interactions at the intercellular junction (e.g., 37–40), after a
period of uncertainty, have converged on the fact that ligands with longer half-lives activate T
cells more robustly. But, an interesting development is the suggestion that the force applied to
the bond is important (32, 41–46), and that good agonists increase their half-life of interaction
with TCRs upon the application of force. Future research will clarify these ideas. The role of
spatial patterns, such as the immune synapse, in regulating signaling, especially in extinguishing
membrane-proximal signaling, has been elucidated (47, 48). Signaling in B cells and NK cells has
also been studied (49). Research along these lines is expected to grow further with the availability
of standard software tools.
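
For readers unfamiliar with the basic kinetic proofreading model mentioned above, a minimal sketch follows. It assumes the textbook form of the model, in which a bound receptor must complete a fixed number of sequential modification steps, each at rate k_p, before the ligand dissociates at rate k_off; the probability of completing all steps is then (k_p / (k_p + k_off)) raised to the number of steps. The rate values are hypothetical, and the sketch deliberately omits the feedback loops and coagonist effects that the studies cited above show to be necessary.

```python
import math

def completion_probability(k_off, k_p, n_steps):
    """Probability that a bound complex completes n_steps modifications before dissociating."""
    return (k_p / (k_p + k_off)) ** n_steps

def half_life(k_off):
    """Half-life of the receptor-ligand bond for a first-order off-rate."""
    return math.log(2) / k_off

# Hypothetical off-rates (per second) for a strong and a weak ligand; k_p is also assumed.
k_p = 1.0
strong_k_off, weak_k_off = 0.1, 1.0

for n_steps in (1, 3, 5):
    p_strong = completion_probability(strong_k_off, k_p, n_steps)
    p_weak = completion_probability(weak_k_off, k_p, n_steps)
    print(f"{n_steps} steps: strong {p_strong:.3f}, weak {p_weak:.3f}, ratio {p_strong / p_weak:.1f}")
```

A tenfold difference in half-life gives less than a twofold difference in signaling probability with one step, but the ratio grows geometrically with the number of steps, which is the sense in which proofreading amplifies modest differences in binding.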
As various high-throughput methods for assaying cell signaling molecules as a function of
time are now becoming available, research along the lines of iterative loop 4 in Figure 1 is
growing. Reference 2 provides a concise description of many of the experimental tools that can
obtain high-dimensional data pertinent to unraveling the network of interactions in a signaling
module. One such technique is phospho-proteomics, which can map the phosphorylation state of
signaling molecules as a function of time. The patterns revealed by such data, and the underlying
mechanisms, need to be determined (i.e., the network of interactions) by a combination of inference
methods, mechanistic models, and experiments. One example is provided by a study where a
quantitative mass spectrometry–based technology, stable isotope labeling by amino acids in cell
culture (SILAC), was used to analyze the temporal changes in phosphorylation patterns of a
number of molecules involved in the early events in TCR signaling for three cellular systems:
wild-type Jurkat T cells, ZAP-70-deficient Jurkat T cells, and Jurkat T cells containing a ZAP-70
molecule that lacked catalytic activity (50). Computational modeling suggested that the differences
in phosphorylation patterns could be explained if ZAP-70 negatively regulates the Src kinase Lck
in a way that is dependent upon its catalytic activity. This, in turn, leads to the observed differences
between the three systems in the phosphorylation patterns of the immunoreceptor tyrosine-based
activation motifs (ITAMs) of the CD3 and ζ chain of the TCR in the basal state and after TCR
stimulation. This prediction has now motivated experiments to search for the precise mechanism
(and molecules) via which this negative feedback loop is mediated.
I note another theoretical study because it illustrates how theoretical predictions can sometimes
await the right experimental tests for a long time. Grb2 is an adapter molecule that recruits SOS
to the membrane upon receptor stimulation. Samelson and coworkers showed that two Grb2
molecules could bind to SOS via SH3 domains and thus bridge between two LAT molecules to
form oligomers (51); this feature appeared to be important for signaling in T cells and mast cells
(51, 52). Goldstein and coworkers developed a theoretical model that showed that when the valency
of interaction between LAT and Grb2 equaled three, a gel-like phase could form (53). Recent
experiments by Vale and coworkers, published seven years after the theoretical prediction, reveal
in a vivid way that such gels do form upon receptor stimulation, and this may be a mechanism to
create a high concentration of LAT, ZAP-70, and other molecules involved in forming the LAT
signalosome, which, in turn, may help propagate signaling more efficiently (54). This example, and
extensive studies concerning the immunological synapse (47, 48), may be just the tip of the iceberg
representing the diverse ways in which spatial organization of proteins can regulate signaling.

SINGLE-CELL RNA SEQUENCING STUDIES OF DENDRITIC CELLS AND TH17 CELL BIOLOGY
Single-cell mRNA sequencing (RNA-Seq) uses deep-sequencing technologies to quantify the tran-
scriptome across a collection of cells (55–60). RNA-Seq has an advantage over earlier techniques
like gene expression microarrays in that it can measure gene expression within individual cells
and identify cell-to-cell variation within a larger cell population that might appear homogeneous
otherwise. Using RNA-Seq, thousands of RNA transcript species can be measured in hundreds or
thousands of individual cells, generating data sets with very high dimensionality. Computational
statistical inference methods, classified as unsupervised learning techniques, are necessary to distill
this large body of data into interpretable form in order to drive biological discovery.
The term unsupervised learning is not derogatory but instead indicates that a mechanistic
model is not specified in advance (61, 62). No explicit mathematical structure is assumed, and no
parameters are fit from the data. Instead, the goal can be cast as one of dimensionality reduction.
Among the vast number of combinations of experimental conditions and measured responses, are
there a small number that capture the most important variation observed in the data? Can the
data be aggregated or viewed in a way that essential patterns emerge? In the context of genomic
analysis, what genes are operating in a coordinated manner to accomplish a specific biological
function? Does a cellular population fall into a few subsets with distinct characteristics?
There is biological intuition to suggest that dimensionality reduction is possible for high-
dimensionality gene expression data obtained by RNA-Seq. It is well known that genes operate in
a coordinated manner in complex networks, and cell lines typically exhibit a small number of dis-
tinct phenotypes. Principal components analysis (PCA) is the simplest prototypical unsupervised
learning technique that can carry out dimensionality reduction. It carries out linear operations
(see the section titled Immune System Monitoring for a weakness of this approach) on the matrix of
observed correlations between measured variables. Each principal component (PC) thus obtained
represents a particular combination of measured variables and is orthogonal to other PCs. The
PCs are arranged in decreasing order of the fraction of the observed variance in the data that each
describes. Several groups have performed RNA-Seq experiments and made extensive use of PCA, but here we
focus on the recent work of Kuchroo, Park, Regev, Shalek and colleagues as illustrative examples.
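
To make the idea of a principal component concrete, here is a minimal sketch that finds only the first PC of a small synthetic data set. It assumes a covariance matrix of mean-centered data (rather than a correlation matrix) and uses power iteration in plain Python; the 20 "cells", 3 "genes", and expression levels are invented for illustration and do not come from the studies discussed here. In practice one would use an established numerical library.

```python
import math
import random

def center_columns(data):
    """Subtract the mean of each measured variable (column) from every observation (row)."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance_matrix(centered):
    """Sample covariance between every pair of columns of mean-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(cov, rng, n_iter=200):
    """Power iteration: largest eigenvalue of cov and its unit-length eigenvector."""
    p = len(cov)
    vec = [rng.random() + 0.1 for _ in range(p)]          # arbitrary positive start vector
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p))
    return eigenvalue, vec

rng = random.Random(1)
cells = []
for k in range(20):
    level = 5.0 if k < 10 else 1.0                        # two hypothetical subpopulations
    cells.append([level + rng.gauss(0, 0.3),              # gene 0 tracks the subpopulation
                  level + rng.gauss(0, 0.3),              # gene 1 tracks the subpopulation
                  rng.gauss(0, 0.3)])                     # gene 2 is noise only

centered = center_columns(cells)
cov = covariance_matrix(centered)
eigenvalue, pc1 = leading_component(cov, rng)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

print("PC1 weights:", [round(w, 2) for w in pc1])
print("fraction of variance explained by PC1:", round(explained, 2))
```

In this synthetic example the sign of each cell's PC1 score separates the two subpopulations, which is analogous in spirit to the way a leading PC can discriminate cell states in real expression data.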
Shalek et al. (63) investigated the immune response of murine bone marrow–derived DCs
(BMDCs) after exposure to lipopolysaccharide (LPS). Since LPS is a component of gram-negative
bacteria, this is an attractive model system for pathogenic bacterial infection. Shalek et al. stim-
ulated BMDCs with LPS, collected single cells after four hours, and quantified gene expression
through mRNA levels in 18 individual cells and in three replicate populations of 10,000 cells. Gene
expression was highly correlated between population samples [Pearson correlation coefficient
r > 0.98 for log(TPM) values, where TPM denotes mRNA transcripts per million], but correla-
tion between individual cells was much lower (r ≈ 0.5). By sequencing single cells, Shalek et al.
showed that 185 of the 522 most highly expressed genes had bimodal expression patterns, with high
mRNA levels in many cells but levels more than an order of magnitude lower (or undetectable)
in other cells. Many of these genes are involved in antiviral or inflammatory immune responses.
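
The correlation statistic quoted above can be computed as in the following sketch. The TPM values are invented for illustration, and the pseudocount added before taking the logarithm (so that genes with zero counts do not produce an undefined value) is an assumption of this sketch, not a detail taken from the cited study.

```python
import math

def log_tpm(values, pseudocount=1.0):
    """Natural log of TPM values; the pseudocount is an assumed guard against log(0)."""
    return [math.log(v + pseudocount) for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical TPM values for five genes in two samples.
sample_a = [120.0, 30.0, 0.0, 800.0, 15.0]
sample_b = [110.0, 35.0, 1.0, 760.0, 12.0]

r = pearson_r(log_tpm(sample_a), log_tpm(sample_b))
print("Pearson r on log(TPM + 1):", round(r, 3))
```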
A bimodal expression pattern suggests the presence of discrete on and off states. If N genes are
expressed bimodally, there are in principle $2^N$ unique states if each gene operates independently,
or in this case $2^{185} \sim 10^{56}$ states. Clearly an inference technique is necessary to reduce the space
of possibilities. Shalek et al. applied PCA to the matrix of single-cell RNA-Seq data and found
that the first principal component (PC1, which accounted for 15% of the total sample variance)
discriminated two distinct subpopulations, one in the early stages of pathogen-dependent matu-
ration, with the other fully mature (Figure 3). The first group expressed high levels of antiviral
and inflammatory cytokines, while the second predominantly expressed cell surface proteins and
chemokines that accompany BMDC maturation.
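
The size of the state space quoted above for 185 independently bimodal genes is easy to check with exact integer arithmetic; the variable names below are illustrative.

```python
import math

n_bimodal_genes = 185
n_states = 2 ** n_bimodal_genes            # exact integer: two states (on/off) per gene

print("number of on/off states:", n_states)
print("log10 of that number:", round(math.log10(n_states), 2))
print("number of decimal digits:", len(str(n_states)))
```

The base-10 logarithm is between 55 and 56, consistent with the order-of-magnitude estimate in the text.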
Looking deeper, the second principal component (PC2, explaining 8% of the variance) iden-
tified a cluster of 137 genes acting in concert during the immune response. The cluster included
two master regulators of a key antiviral circuit, Irf7 and Stat2. As an example of the feedback
between inference and experiment (iterative loop 3 in Figure 1), Shalek et al. then postulated
that bimodal gene expression within the cluster resulted from differences in the levels and ac-
tivities of Irf7 and Stat2. Testing this hypothesis motivated further RNA-Seq measurements on
Irf7 knockout (Irf7−/−) mice, which showed greatly reduced expression of most antiviral genes

[Figure 3 plot: PC2 (y-axis) versus PC1 (x-axis) for individual cells, labeled as maturing or mature.]
Figure 3
Identifying cell states and circuits using single-cell RNA-Seq. A principal components analysis (PCA) of the
expression of 632 variable genes across 18 lipopolysaccharide-stimulated, bone marrow–derived dendritic
cells reveals two distinct maturity states, separated by principal component 1 (PC1). Meanwhile, more
continuous variation along PC2 reflects heterogeneity in the activation of an antiviral signaling pathway
among those same cells. Results are from Shalek et al. (63).

but no effect on Stat2 expression or variability. This result implies that Stat2 operates in parallel
with or upstream from Irf7, a conclusion that could not be drawn from the initial data set without
subsequent experiments driven by the initial data analysis using inference methods. These studies
are an example of iterative loop 3 in Figure 1.
In a follow-up study (64), the authors expanded their experimental protocol to include three
different pathogenic stimuli and sampling points at five different times after stimulation. The first
three principal components of the gene expression matrix delineated four distinct modules related
to the source of immune stimulation and the location along the maturation trajectory.
Unsupervised learning techniques have also yielded insight into the
pathogenicity of Th17 cells. The Th17 lineage is a subset of effector T helper cells characterized
by their production of IL-17 (65). Th17 cells serve a protective role by fighting bacterial and fungal
infections at mucosal interfaces, particularly Candida albicans (66, 67). Importantly, however, Th17
cells are also implicated in several autoimmune disorders, including rheumatoid arthritis, psoriasis,
inflammatory bowel disease, and multiple sclerosis (65).
An important question in Th17 cell biology concerns the factors that control their differentia-
tion from naive CD4+ T cells and their subsequent pathogenicity. An early study (68) differentiated
Th17 cells in vitro under six combinations of inducing cytokines and measured mRNA profiles by
whole-genome microarray (quantifying expression at the population level, without the single-cell
resolution of RNA-Seq). PCA using 233 expressed genes or a 23-gene subset identified three
groups corresponding to combinations of the stimulating cytokines. Two of the groups were
highly pathogenic in vivo (after transfer to mice) and one was benign, with expression of the IL-23
receptor being a common characteristic of pathogenicity. In both analyses, the first two PCs
explained more than 80% of the variance in the data.
More recent work (69, 70) explored genetic circuits that regulate Th17 cell pathogenicity.
Gaublomme et al. (69) induced experimental autoimmune encephalomyelitis (EAE), a murine model for
multiple sclerosis, in vivo and then harvested Th17 cells from draining lymph nodes (LNs) and the
central nervous system (CNS). The authors obtained single-cell RNA-Seq profiles for ∼500
cells and quantified the expression of ∼4,000 genes. There was substantial cell-to-cell variation
in gene expression, with typical correlation of 0.7. As in earlier work on BMDCs (63), several
key regulatory genes were expressed in a bimodal distribution, indicating a heterogeneous cell
population and a spectrum of pathogenicity.
The data set describing the expression of ∼4,000 genes in ∼500 cells contains ∼2 million
entries. Computational approaches are necessary to tease out the most important biological in-
formation from that sea of data. Gaublomme et al. (69) conducted PCA and found that the first
principal component distinguishes effector from memory phenotypes. The second component
identifies the source (LN or CNS) and correlates with the point along a maturation trajectory
following a progression from a naive-like state with low activity in the LN to Th1-like effector
and memory states in the CNS.


Gaublomme et al. (69) also differentiated naive CD4+ T cells in vitro under pathogenic (IL-1β +
IL-6 + IL-23) and nonpathogenic (TGF-β1 + IL-6) conditions. They collected RNA-Seq
data for 420 cells and quantified the expression of ∼7,000 genes. A separate PCA on the in
vitro data showed that key signatures of pathogenicity correlated strongly with the first PC. The
immunologically important characteristic of Th17 cells, their pathogenicity, is clearly identified
by the particular combination of measured variables that describes the most variance, the first PC.
As noted previously, many genes participating in the immune response (inflammatory and
regulatory cytokines and their receptors) exhibit a bimodal expression pattern, with high mRNA
levels in at least 20% of cells but much lower or undetectable levels in other cells. To proceed
further and identify the genetic networks in operation, the authors examined the intercellular cor-
relations between these bimodally expressed genes. Two distinct modules become apparent when
visualizing the correlation matrix as a heat map (PCs by eye). In this case, one module is enriched
in genes for proinflammatory Th17 cytokines, some of which are associated with autoimmune
disorders like inflammatory bowel disease, rheumatoid arthritis, and multiple sclerosis. The other
module contains regulatory genes such as Il10, Il24, and Il9 and should be responsible for reducing
pathogenicity. Having identified these two modules, the authors were then able to classify their
constituent genes, many of which had unknown function previously, as pathogenic or regulatory.
Four newly classified genes were selected for further experimental analysis. Briefly, Gpr65 is a
member of the proinflammatory module. Using populations of Gpr65−/− Th17 cells, the authors
demonstrated that Gpr65 promotes Th17 cell pathogenicity and is essential for EAE. In extensive
companion work (70), Cd5l was identified as a novel regulatory gene, and results obtained
using Cd5l−/− mice demonstrated that Cd5l expression reduces pathogenicity through complex
interactions with metabolic pathways in Th17 cells.
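The idea of reading modules off a correlation matrix can be sketched as follows. This is an illustration on synthetic on/off data, not the analysis pipeline of Gaublomme et al.; the hidden cell state, the noise level, and the use of the leading eigenvector to split the genes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 300

# Hypothetical hidden cell state: 1 = proinflammatory program on, 0 = regulatory program on.
state = rng.integers(0, 2, size=n_cells)

def noisy_copy(x, flip_prob=0.1):
    """Return x with each on/off entry flipped with probability flip_prob."""
    flips = rng.random(x.size) < flip_prob
    return np.where(flips, 1 - x, x)

# Genes 0-4 follow the state; genes 5-9 follow its complement.
genes = np.array([noisy_copy(state) for _ in range(5)] +
                 [noisy_copy(1 - state) for _ in range(5)])

corr = np.corrcoef(genes)  # gene-by-gene correlation across cells

# Signs of the leading eigenvector of the correlation matrix split the two modules.
eigvals, eigvecs = np.linalg.eigh(corr)
leading = eigvecs[:, -1]
module = (leading > 0).astype(int)
print(module)
```

Because the sign of an eigenvector is arbitrary, which module is labeled 0 or 1 carries no meaning; only the grouping does.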
The studies discussed above show the power of single-cell RNA-Seq in highlighting the impor-
tance of bimodal expression. It is difficult to identify previously unknown participants in the genetic
network with data collected at the population level, because the bimodal expression is obscured.
These studies also provide an illustrative example of statistical inference motivating further experiments and
biological discovery.

AFFINITY MATURATION
The development of potent antibodies against infectious agents is central to how the immune
system combats infection and is the basis for many potent vaccines. Upon natural infection or
vaccination, antibodies are produced by affinity maturation, a Darwinian evolutionary process.
Ever since the seminal papers by Eisen and coworkers describing affinity maturation (71, 72), the
topic has been the subject of intense study. The most recent advances are summarized by Victora
& Nussenzweig (73) and De Silva & Klein (74), and a recent paper by Victora and coworkers
vividly highlights the stochastic aspects of affinity maturation (75).
The study of affinity maturation using mechanistic models and detailed computation also has a
rich history. The seminal early studies are attributable to Perelson and coworkers (76–78). High-
affinity antibodies with several mutations are produced by germinal center reactions. A key aspect
of germinal center reactions was first suggested by computational studies carried out by Oprea &
Perelson (78). They formulated a mechanistic model of affinity maturation based on information
known (from experiments) at that time. They added a new element, however: A fraction of B
cells that received survival signals from T helper cells and were positively selected after a round
of mutation and selection were recycled back to the dark zone for further rounds of mutation
and selection. Oprea & Perelson (78) predicted that a significant fraction of these cells would
need to be recycled in every round in order for the results of their calculations to recapitulate the
experimentally observed increases in affinity and acquired mutations. This remained a prediction
for well over a decade until experiments imaging germinal center reactions in live mice became
possible, resulting in a decisive positive test. Perelson’s seminal work and the experiments in the
Nussenzweig lab (73, 79–81) exemplify iterative loop 2 in Figure 1. Since Perelson’s studies, many
computational scientists have studied diverse aspects of affinity maturation using coarse-grained
and very detailed computational models, and much has been learned from these studies (82–86).
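The qualitative role of recycling can be illustrated with a deliberately simple toy simulation. It is not the model of Oprea & Perelson; the mutation distribution, the rule that the better-binding half survives, the single division of recycled cells, and all parameter values are assumptions introduced only to show why repeated rounds are needed to accumulate mutations and affinity.

```python
import numpy as np

def germinal_center(recycle_fraction, n_rounds=8, n_cells=2000, seed=0):
    """Toy model: mutate, keep the top half by affinity, recycle a fraction of survivors."""
    rng = np.random.default_rng(seed)
    affinity = np.zeros(n_cells)
    mutations = np.zeros(n_cells, dtype=int)
    out_aff, out_mut = [], []
    for _ in range(n_rounds):
        if affinity.size == 0:
            break
        # Dark zone: one mutation per cell; most mutations are slightly harmful.
        affinity = affinity + rng.normal(-0.2, 1.0, size=affinity.size)
        mutations = mutations + 1
        # Light zone: only the better-binding half is positively selected.
        keep = affinity >= np.median(affinity)
        affinity, mutations = affinity[keep], mutations[keep]
        # Selected cells either re-enter the dark zone or exit as output.
        recycled = rng.random(affinity.size) < recycle_fraction
        out_aff.append(affinity[~recycled])
        out_mut.append(mutations[~recycled])
        # Recycled cells divide once, which keeps the population from vanishing.
        affinity = np.repeat(affinity[recycled], 2)
        mutations = np.repeat(mutations[recycled], 2)
    return np.concatenate(out_aff), np.concatenate(out_mut)

for r in (0.0, 0.8):
    aff, mut = germinal_center(r)
    print(f"recycle={r}: mean affinity {aff.mean():.2f}, mean mutations {mut.mean():.2f}")
```

Without recycling, every output cell in this toy has passed through exactly one round of mutation and selection, so neither mutations nor affinity can accumulate.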
More recently, a practical challenge has brought a less-studied fundamental aspect of affinity
maturation to the forefront of research. It has proven difficult to successfully vaccinate against
highly mutable pathogens, such as HIV, or those that present themselves in different guises, like
malaria (87). There is no universal vaccine against influenza either. Recently, immunologists have
isolated antibodies from HIV-infected individuals that, at least in vitro, can potently neutralize
diverse viral strains (88). These broadly neutralizing antibodies (bnAbs) are usually produced in low
titers and several years after infection. However, they provide proof that the human immune system
can evolve such antibodies and raise the possibility of designing potent vaccination strategies that
could induce bnAbs effectively in diverse humans. bnAbs against influenza have also been identified
and studied (89).
To illustrate the challenge associated with inducing bnAbs efficiently by vaccination, consider
the structure of the viral spike of HIV (Figure 4). It contains some conserved residues, such as
the CD4 binding site, but they are often surrounded by highly variable residues and protected by
a shield of glycans (90, 91). The epitopes that bnAbs target usually contain conserved residues.
It seems evident that the induction of bnAbs will require immunization with multiple
variant antigens that share a set of conserved residues but differ from each other in the variable
residues that surround them. This is because immunization with a single strain of the antigen
would likely result in strain-specific antibodies. The need for immunization with
variant antigens is also suggested by studies that trace how bnAbs may have evolved in individual
patients by analyzing historical samples and sequencing viral strains and antibodies as a function
of time (92–95). Viral diversification seems to precede the development of bnAbs.
The need to immunize with multiple variant antigens raises many questions, including, What
should be the variant antigens? How many? In what concentrations and temporal order should they
be administered? The answers to these questions are drawn from a large space of possibilities, and
it may be difficult to find the conditions that can efficiently induce affinity maturation to evolve
bnAbs by empirically sampling this space via experiments with nonhuman primates. However,
enormous resources and talent are being devoted to this endeavor, and so empirical efforts may
find a protocol that induces bnAbs upon vaccination. It is difficult to see how one might optimize
this protocol even in this event or develop generalized strategies for other pathogens. Mechanistic

[Figure 4 schematic: HIV spike (gp120 core and gp41) on the viral membrane, with shielding residues and glycans. Labeled bnAb target sites: V2/glycans (PG9, PG16, CH01–04, PGT140s); glycans (2G12); V3/V4/glycans (PGT120s, PGT130s); V3/CD4i (3BC176); CD4bs (b12, VRC01, NIH 45–46); MPER (2F5, 4E10, 10E8).]

Figure 4
A schematic depiction of the spike on the surface of HIV particles (courtesy of Dennis Burton). It contains
some conserved residues, such as the CD4 binding site ( yellow), but these are often surrounded by highly
variable residues (red ) and protected by a shield of glycans (blue).

knowledge of affinity maturation can guide answers to the questions noted above. Unfortunately,
how affinity maturation occurs in the presence of multiple variant antigens is not well understood
because, until recently, affinity maturation has been studied in the context of single-model antigens.
Shedding light on the fundamental problem of how affinity maturation ensues upon immunization
with multiple variant antigens is a frontier for complementary experimental and computational
research.
Recently, four pertinent computational studies have been published (96–99). At least two of
these were motivated by the problem of inducing bnAbs against HIV. Wang et al. (97) developed
a stochastic rule–based model. Rules regarding the steps involved in affinity maturation were
derived from experimental studies of mice immunized with single-model antigens and provided as
a set of instructions to the computer. The computer then executed these instructions in a setting
where there were multiple variant antigens, and complex phenomena emerged because of the
coupling between the rules. Processes like somatic hypermutation of B cell receptors and antigen
internalization upon interaction with antigens displayed on follicular DCs (FDCs) were treated as
stochastic events. The probability that a B cell binds to and internalizes antigen is higher
if its receptor has a higher affinity for the antigen displayed on FDCs. Similarly, a B cell competes
for T cell help with other B cells that have internalized antigen in a stochastic fashion. Wang et al.
(97) did not explicitly treat the process of migration of germinal center B cells from the dark zone
of the germinal center to the light zone, and vice versa.
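A single selection round of this kind might be sketched as below. The functional forms are assumptions, not those of Wang et al.: the saturating capture probability, the noisy ranking used to represent competition for T cell help, and the parameter names `scale` and `help_fraction` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def internalization_probability(affinity, scale=1.0):
    """Hypothetical saturating rule: higher affinity gives a higher chance of antigen capture."""
    return 1.0 - np.exp(-np.maximum(affinity, 0.0) / scale)

def select_round(affinities, help_fraction=0.5):
    """One stochastic selection round; returns a boolean mask of positively selected B cells."""
    # Step 1: each B cell captures antigen from FDCs with an affinity-dependent probability.
    captured = rng.random(affinities.size) < internalization_probability(affinities)
    # Step 2: cells that captured antigen compete for limited T cell help.
    # The competition is a noisy ranking, so a weaker binder can occasionally win.
    noise = rng.normal(0.0, 0.3, affinities.size)
    score = np.where(captured, affinities + noise, -np.inf)
    n_helped = int(help_fraction * captured.sum())
    selected = np.zeros(affinities.size, dtype=bool)
    if n_helped > 0:
        selected[np.argsort(score)[-n_helped:]] = True
    return selected

affinities = rng.uniform(0.0, 3.0, size=1000)
selected = select_round(affinities)
print(f"mean affinity before: {affinities.mean():.2f}, after: {affinities[selected].mean():.2f}")
```

Repeating such rounds with mutation in between, and with several variant antigens, is what allows the coupled rules to produce emergent behavior.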

The model for the epitope-paratope affinities employed by Wang et al. (97) was inspired by
antibodies that may target epitopes containing the conserved residues of the CD4 binding site.
The CD4 binding site of the HIV viral spike is surrounded by variable residues and glycans
(Figure 4). The footprint of a B cell receptor that can target such epitopes may include both
conserved and variable residues in the same epitope. bnAbs specific for the CD4 binding site often
have narrow footprints that principally target the conserved residues. Such antibodies may have
evolved a relatively large number of mutations (compared to germline B cells) to shrink their
footprint on the antigen to focus interactions with the conserved residues. Wang et al. studied a
situation where they assumed that germline B cells that target an epitope containing conserved
residues have already been activated, perhaps by using model antigens (100–103). Thus, they
considered the situation where the epitope on a complex antigen (like a mimic of the trimeric
HIV spike) targeted by germinal center B cells can include both conserved and variable residues.
Their model for epitope-paratope affinities was coarse grained in that atomistic details were not
included, but it did caricature some features of such an antigen.
Wang et al. (97) studied what happens when one immunizes with variant antigens that share a set
of conserved residues but whose variable residues are distinct because of nonoverlapping mutations.
The antigenic distance between the three variant antigens they studied was relatively large—11
mutations. The most important mechanistic idea emerging from their computer simulations is
that, under certain conditions, the multiple-variant immunogens can present conflicting selection
forces that can frustrate the Darwinian evolutionary process of affinity maturation.
This idea is made vivid by considering immunization with a cocktail of the three variant
antigens. It is not clear whether the spatial distribution of variant antigens on the FDCs will be
sufficiently homogeneous to allow all types of antigens to be present in each B cell–FDC synapse
or whether the heterogeneity of the distribution results in each B cell–FDC interaction being
restricted to only a fraction of the variant antigens. Computer simulations showed that which of
these scenarios is true makes a difference to the outcome of affinity maturation, thus highlighting
the need for experimental interrogation of this issue.
To clarify the concept of frustration, consider first the situation where each B cell–FDC in-
teraction involves only one of the different types of variant antigens (picked randomly). This may
correspond to the situation where antigen concentration is relatively low. The affinity of the BCRs
for antigens is relatively low during the early stages of affinity maturation. A B cell that is pos-
itively selected in one round of selection is thus unlikely to have developed strong interactions
with the conserved residues of the epitope. After the next round of mutation, if it now encounters
a different variant antigen, it is therefore unlikely to exhibit a sufficiently strong interaction to be
positively selected and will likely die by apoptosis. Thus, the different variant antigens can act like
conflicting selection forces that can frustrate affinity maturation. This is the underlying reason
why Wang et al. (97) reported that the probability of obtaining cross-reactive antibodies upon
immunization with a cocktail of variant antigens separated by large mutational distances is not
high and the antibody titers are low (Figure 5). Of course, this frustration can be alleviated if one
encounters every type of variant antigen in the synapse during each B cell–FDC interaction, or
if during multiple attempts to internalize antigen before apoptosis a B cell encounters the same
type of antigen in successive rounds. But then, strain-specific antibodies are likely to result.
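The conflict between variants can be made concrete with a small worked example. The representation is an assumption made for illustration (antigens as strings of +1/-1 residues and affinity as a weighted site-by-site match) and is not the affinity model of Wang et al.; the site choices and weights are hypothetical.

```python
import numpy as np

N_CONSERVED, N_VARIABLE = 4, 9

def make_variant(mutated_sites):
    """Antigen as a +1/-1 string: conserved block, then a variable block with some sites flipped."""
    variable = np.ones(N_VARIABLE)
    variable[list(mutated_sites)] = -1.0
    return np.concatenate([np.ones(N_CONSERVED), variable])

# Three variants whose mutations do not overlap (hypothetical site choices).
variants = [make_variant(sites) for sites in ([0, 1, 2], [3, 4, 5], [6, 7, 8])]

def affinity(bcr, antigen):
    """Toy binding strength: weighted site-by-site match between BCR and antigen."""
    return float(bcr @ antigen)

# A BCR tuned to the variable residues of variant 0 only.
strain_specific = np.concatenate([np.zeros(N_CONSERVED), variants[0][N_CONSERVED:]])
# A BCR whose footprint is focused on the conserved residues.
focused = np.concatenate([2.0 * np.ones(N_CONSERVED), np.zeros(N_VARIABLE)])

for name, bcr in (("strain-specific", strain_specific), ("focused", focused)):
    print(name, [affinity(bcr, ag) for ag in variants])
```

In this toy, the receptor tuned to one variant binds the other two poorly, whereas the receptor focused on conserved residues binds all three equally. A cell selected for its match to one variant's variable residues would therefore tend to fail selection when it next meets a different variant.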
For the variant antigens considered by Wang et al., sequential immunization with the same
variant antigens that led to frustration upon immunization with a cocktail was predicted to enhance
the probability of evolving cross-reactive antibodies (Figure 5). Model experiments in mice tested
this prediction and showed it to be correct (97). Mice immunized sequentially with variant antigens
that share conserved residues produced more cross-reactive antibodies than mice immunized
with a cocktail of the same antigens. Various assays also indicated that the antibodies produced by

[Figure 5 plots, panels a and b: probability (y-axis) versus breadth (x-axis).]

Figure 5
(a) Simulation results of Wang et al. (97) for the probability of obtaining antibodies of different fractional breadths of coverage upon
immunization with a cocktail of three variant antigens separated by relatively large mutational distances. Complete coverage of a large
panel of viral strains corresponds to a fractional breadth of 1. In this case, broadly neutralizing antibodies are predicted to arise very
rarely. (b) Simulation results for the probability of obtaining antibodies of different fractional breadths of coverage upon sequential
immunization with the same three variant antigens as in panel a. In this case, broadly neutralizing antibodies are predicted to emerge
with higher probability.

sequential immunization were more focused on targeting the conserved CD4 binding site residues.
This is an example of mechanistic modeling resulting in a new hypothesis that can be tested in
model experiments (iterative loop 2 in Figure 1). The importance of sequential immunization for
generating bnAbs against HIV has recently been further emphasized by Nussenzweig, Schief, and
coworkers (104).
Prior to considering subsequent computational studies motivated by the challenge of inducing
bnAbs against HIV, I briefly discuss an interesting study on malaria (96). The membrane antigen
of malaria is composed of a few conserved epitopes and a cluster of variable epitopes. Chaudhury
et al. (96) studied immunization with multiple strains of malaria motivated by the observation
that immunization with a few strains of malaria led to a cross-reactive antibody response (105),
but the increase in cross-reactivity could not be explained by enhanced targeting of the conserved
epitopes. They developed a model for affinity maturation where the individual events were treated
as chemical reactions (e.g., B cell binding to antigen displayed on FDCs), and the resulting math-
ematical equations were solved accounting for stochastic effects using the same approach that is
used frequently to study signaling (see the section titled Lymphocyte Signaling). Many of the
parameters used in the calculations were derived from experiments. Chaudhury et al. considered
each immunizing antigen to be composed of two distinct epitopes: one purely conserved and the
other variable. The variable epitope was different for different antigens. They found that antibod-
ies specific for the conserved epitope, and therefore cross-reactive, did develop. This is because
the B cells specific for the conserved epitopes can always find the purely conserved epitope during
the selection step, and therefore their continued evolution is not frustrated. Importantly, this is
because the conserved and variable epitopes are distinct, which may be realistic for malaria, but as
noted above it is not likely to be the case for B cells that evolve into bnAbs against HIV. Another
interesting result emerging from the work done by Chaudhury et al. is that antibodies that are more
cross-reactive to the different variable epitopes are also generated upon immunizing with multiple
antigens. This explains the experimental observation wherein the enhanced cross-reactivity could
not be rationalized based on increased targeting of the conserved epitopes. This is an example of
how computational studies with mechanistic models can help elucidate the origins of experimental
observations. It would be interesting to study whether the enhanced cross-reactivity to variable
residues observed by Chaudhury et al. would also occur if the number of variable epitopes on
each antigen were not one, but a more realistic number.
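Treating individual events as chemical reactions and accounting for stochastic effects is commonly done with the Gillespie algorithm. The text does not name the method used by Chaudhury et al., so the choice of algorithm here is an assumption, as are the single reversible binding reaction and all rate constants and copy numbers.

```python
import numpy as np

def gillespie_binding(n_b=50, n_ag=30, k_on=0.01, k_off=0.1, t_end=50.0, seed=4):
    """Stochastic simulation of B + Ag <-> C with the Gillespie algorithm."""
    rng = np.random.default_rng(seed)
    t, b, ag, c = 0.0, n_b, n_ag, 0
    times, complexes = [t], [c]
    while True:
        a_bind = k_on * b * ag      # propensity of binding
        a_unbind = k_off * c        # propensity of unbinding
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.exponential(1.0 / a_total)
        if t > t_end:
            break
        # Choose which reaction fires in proportion to its propensity.
        if rng.random() < a_bind / a_total:
            b, ag, c = b - 1, ag - 1, c + 1
        else:
            b, ag, c = b + 1, ag + 1, c - 1
        times.append(t)
        complexes.append(c)
    return np.array(times), np.array(complexes)

times, complexes = gillespie_binding()
print(f"{len(times) - 1} reaction events; complexes at end: {complexes[-1]}")
```

A full model of affinity maturation would include many more reaction types (mutation, division, death, T cell help), each with its own propensity, but the sampling loop has the same structure.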
Luo & Perelson (98) used a similar model for the epitopes on each antigen as Chaudhury et al.,
but in the context of HIV—i.e., the epitope is either completely conserved or completely variable.
B cells target either the conserved epitope or the variable epitopes. They studied immunization
with cocktails of such antigens that were separated by variable degrees of dissimilarity between
the variable epitopes. The model for epitope-paratope interactions differed from that
of Chaudhury et al., but otherwise the situation studied by Luo & Perelson is very similar. Luo
& Perelson considered broadly cross-reactive antibodies to be those that targeted the conserved
epitope. They reported that, upon immunization with a cocktail of variant antigens, the probability
of obtaining cross-reactive antibodies increases if the mutational distances separating the variable
epitopes and the number of variable epitopes are larger. This result is most likely due to the
fact that Luo & Perelson (98) assume that there is one perfectly conserved epitope shared by all
antigens and a purely variable epitope that is distinct for each antigen, and B cells target either
the conserved epitope or a variable epitope. The structure of the HIV spike (90, 91) indicates
that the conserved residues, such as those that make up the CD4 binding site, are surrounded by
highly variable residues and protective glycans, which can be part of an epitope that also contains
the conserved residues [as is the case in the study by Wang et al. (97)]. Childs et al. (99) have also
studied a problem pertinent to the development of cross-reactive antibodies using the NK model,
which was first used by Deem & Lee (106) to study affinity maturation. Childs et al. (99) predict
that antibodies with lower breadth result when there are multiple antigenic sites.
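For readers unfamiliar with the NK model, a generic version can be sketched in a few lines. This is the textbook construction with a binary alphabet and adjacent-site neighborhoods, both of which are assumptions made for brevity; it does not reproduce the specific variants used by Deem & Lee or Childs et al.

```python
import itertools
import numpy as np

def make_nk_landscape(n=8, k=2, seed=5):
    """Build an NK fitness function: each site's contribution depends on itself and k neighbors."""
    rng = np.random.default_rng(seed)
    # One random lookup table per site, indexed by the states of the site and its k neighbors.
    tables = rng.random((n, 2 ** (k + 1)))

    def fitness(genome):
        total = 0.0
        for i in range(n):
            # Neighbors are the next k sites, with wraparound (one common convention).
            bits = [genome[(i + j) % n] for j in range(k + 1)]
            index = int("".join(str(b) for b in bits), 2)
            total += tables[i, index]
        return total / n

    return fitness

fitness = make_nk_landscape()
genomes = list(itertools.product([0, 1], repeat=8))
best = max(genomes, key=fitness)
print(f"best genome {best} with fitness {fitness(best):.3f}")
```

Increasing `k` couples more sites to each other and makes the landscape more rugged, which is the property that makes the model useful for studying evolutionary search.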
While recent efforts have revealed some interesting concepts and suggested that sequential
immunization with variant antigens separated from each other by relatively large mutational
distances may be more successful at inducing bnAbs against HIV than a cocktail of the same
antigens, many key questions remain. These questions include (a) What should be the variant
antigens that are used as immunogens in a real vaccine against HIV or influenza? (b) What is
an optimal vaccination protocol? Of the many permutations, only a few scenarios have been
studied. One could imagine that the optimal vaccination protocol might change if the mutational
distance between the variant antigens and their concentrations were manipulated. (c) What are
the principles that describe how the degree of frustration is influenced by the pertinent variables,
and how should we manipulate conditions so that evolutionary trajectories that result in bnAbs
become highly probable? In particular, how do we set the conditions such that the degree of
frustration is not so high as to frustrate affinity maturation or so low as to result in strain-specific
antibodies? To answer these questions, a detailed multi-scale model of affinity maturation induced
by multiple variant antigens needs to be developed. Elements of this work have already been done
for single antigens in the past (107), but this now needs to be combined with work along the lines of
References 96 and 97 as well as a full atomistic description for computing affinities between BCRs
and antigens similar to the SOSIP construct that is considered to mimic the HIV spike (108, 109).
Such a research program could result in the discovery of new principles and in specific predictions
of potentially successful vaccination strategies and immunogens that could then be advanced to
tests in nonhuman primate models. For this goal to be achieved, close collaboration between
experimental and computational immunologists, structural biologists, and protein engineers will
be necessary.

CHARACTERIZING THE T CELL REPERTOIRE


The diversity of T and B cell repertoires is central to our ability to combat pathogens and cancer.
Major advances have been made in our understanding of how lymphocyte repertoires are generated
(110). How thymic development shapes the mature T cell repertoire has also been extensively
studied (111–114). The diversity of the repertoire required to successfully combat pathogens, and
the intriguing ability of T cells to recognize specific peptides with exquisite specificity
while also responding to more than one peptide, have also been investigated (115). Today,
we can also sequence TCRs in plasma samples derived from humans and mice and this can, along
with the use of inference procedures, be extraordinarily helpful in characterizing the repertoire.
The purpose here is not to comprehensively review recent advances in our understanding of thymic
development or characterization of the T cell repertoire; rather, as in the other sections, I aim
to discuss illustrations of how complementary computational and experimental research has shed
light on these issues. The first example illustrates a hybrid of iterative loops 2 and 3 in Figure 1,
and the second exemplifies iterative loop 3 but opens up the possibility of work that represents
iterative loop 4.
We begin with a study by Huseby et al. of how the peptide specificity of mature T cells is altered
by limiting negative selection (116, 117). They compared three types of mice, but here we focus on
only two: (a) wild-type C57BL/6 mice in which the combination of IAb and numerous mouse self-
peptides leads to normal negative selection of thymocytes and (b) mice genetically manipulated to
express IAb bearing just a single peptide (IAb-SP mice) (118). In this case, negative selection is
severely limited because there is only one IAb + peptide combination present in the thymus. The
cross-reactivity of T cells derived from these mice against a panel of pMHC molecules was then
characterized. If a T cell recognized a particular peptide, each of its amino acids was mutated to the
other 19 possibilities, one at a time, and whether or not the T cell still recognized the pMHC was
determined. Thus, the cross-reactivity of the repertoire to point mutations was determined. The
results showed that T cells derived from the IAb-SP mice were more cross-reactive to products
of point mutations of the peptides that they recognized compared to the repertoire of C57BL/6
mice. This observation suggested that defective negative selection makes the T cell repertoire
statistically more cross-reactive to peptide antigens.
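The point-mutation scan described above amounts to enumerating 19 substitutions at each peptide position. A minimal sketch follows; the peptide sequence and the recognition rule are hypothetical stand-ins for an experimental readout.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes

def single_point_mutants(peptide):
    """Return every peptide that differs from the input at exactly one position."""
    mutants = []
    for pos, original in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != original:
                mutants.append(peptide[:pos] + aa + peptide[pos + 1:])
    return mutants

def cross_reactivity(peptide, recognizes):
    """Fraction of single-point mutants still recognized, given a recognition test."""
    mutants = single_point_mutants(peptide)
    return sum(recognizes(m) for m in mutants) / len(mutants)

# Hypothetical T cell that only requires phenylalanine (F) at index 4 of an arbitrary peptide.
peptide = "ACDEFGHIK"
print(len(single_point_mutants(peptide)))
print(cross_reactivity(peptide, lambda p: p[4] == "F"))
```

A T cell that tolerates substitutions at most positions scores close to 1 on this measure, whereas a highly peptide-specific T cell scores close to 0.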
Kosmrlj and coworkers (119) set out to develop a mechanistic understanding of how negative
selection might tune the peptide specificity of the T cell repertoire by carrying out both physics-
based theory and computer simulations. They employed a coarse-grained model of TCR-pMHC
interactions. Each TCR comprised one part that represented CDR residues that interacted with
the MHC residues and another part that interacted with the peptide residues. The latter interaction
was treated with a string model, similar to that employed by Chao et al. (120), wherein individual
amino acids on the peptide contact residues of the TCR could interact with the amino acids
of the peptide. The interaction free energies between amino acids were represented by a potential
function often used in biophysical studies. While this model for TCR-pMHC interactions
is clearly very crude, Kosmrlj et al. demonstrated that their qualitative results were independent
of the particular choice of the form of TCR-pMHC interaction free energies. In their model,
given a number of displayed pMHC molecules, thymocytes were positively selected if they were
able to interact with at least one pMHC with a binding free energy that exceeded a threshold
value similar to the positive selection limit identified by Palmer and coworkers (12) using the OT1
system. Thymocytes would be negatively selected if they interacted with any of the encountered
self-pMHC molecules with a free energy exceeding a somewhat higher threshold [again as per
the OT1 system (12)]. The resulting mature T cells were screened against a panel of randomly
generated peptides [proxy for the panel of peptides used by Huseby et al. (117, 121) in their
assays], and the cross-reactivity of each T cell was obtained just as in the experiments. Kosmrlj
et al. simulated the effects of varying the number of types of peptides presented in the thymus
across a range of values, rather than only the two cases (a single peptide and the normal number) studied by Huseby et al. A
computational model, once formulated, can more readily be used to study phenomena across a
larger range of the manipulated variable compared to experiments. Such studies can often yield
mechanistic insights (see below) that can lead to new experiments.
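The selection rules just described can be sketched in a few lines of Python. This is a toy illustration, not the model of Kosmrlj et al.: the additive "stickiness" scale in `pair_strengths`, the five contact positions, and every threshold value are assumptions made here; the TCR-MHC contribution is omitted; and larger numbers denote stronger binding.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_CONTACTS = 5  # assumed number of peptide contact residues (hypothetical)


def pair_strengths(stickiness):
    """Toy pairwise table: strength of TCR residue `a` against peptide residue `b`.

    Assumption for illustration only: strengths are additive in a per-residue
    "stickiness"; this is not the potential function used in the original work.
    """
    return {a: {b: stickiness[a] + stickiness[b] for b in AMINO_ACIDS}
            for a in AMINO_ACIDS}


def binding_strength(tcr, peptide, table):
    """String model: residue i of the TCR contacts residue i of the peptide."""
    return sum(table[t][p] for t, p in zip(tcr, peptide))


def survives(tcr, self_peptides, table, e_pos, e_neg):
    """Mature only if the strongest self-interaction lies between the thresholds."""
    strongest = max(binding_strength(tcr, s, table) for s in self_peptides)
    return e_pos <= strongest < e_neg


def cross_reactivity(tcr, peptide, table, e_rec):
    """Count single point mutants of `peptide` that the TCR still recognizes."""
    recognized = 0
    for i, original in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa == original:
                continue
            mutant = peptide[:i] + aa + peptide[i + 1:]
            if binding_strength(tcr, mutant, table) >= e_rec:
                recognized += 1
    return recognized


def random_sequence(rng):
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(N_CONTACTS))


if __name__ == "__main__":
    rng = random.Random(0)
    stickiness = {a: rng.random() for a in AMINO_ACIDS}  # made-up scale
    table = pair_strengths(stickiness)
    thymocytes = [random_sequence(rng) for _ in range(2000)]
    for m in (1, 10, 100):  # number of types of self-peptides in the thymus
        self_peptides = [random_sequence(rng) for _ in range(m)]
        mature = [t for t in thymocytes
                  if survives(t, self_peptides, table, e_pos=5.0, e_neg=6.0)]
        if mature:
            mean_stick = sum(stickiness[a] for t in mature for a in t) / (
                N_CONTACTS * len(mature))
            print(m, len(mature), round(mean_stick, 3))
        else:
            print(m, 0, "no survivors")
```

The demo prints, for each number of self-peptide types, how many toy thymocytes mature and the mean stickiness of their contact residues. `cross_reactivity` mirrors the experimental scan in which each peptide residue is mutated to the other 19 amino acids, one at a time.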
The principal findings reported by Kosmrlj et al. (119) can be summarized as follows.
(a) Thymocytes that were selected against fewer types of self-peptides in the thymus matured
into T cells that were more cross-reactive to point mutations of the peptides that they recognize,
and were also more likely to be self-reactive. (b) As the number of types of peptides displayed
in the thymus increases, the peptide contact residues of the mature T cell repertoire exhibit an
attenuation in the frequency of occurrence of amino acids that tend to bind most other types of
amino acids more strongly. Because of the coarse-grained nature of the way that Kosmrlj et al.
treated interaction free energies between TCR and pMHC molecules, they could not do more
than speculate on what specific biochemical properties might exemplify strongly interacting amino
acids.
The theory and computations carried out by Kosmrlj et al. suggested a mechanistic reason
for their results (119, 122, 123). If thymocytes are selected against a single type of peptide in the
thymus, those with TCRs that interact with this single peptide (and associated MHC molecule)
with an interaction free energy between the positive and negative selection thresholds can success-
fully mature. Several types of TCR sequences can achieve this. For example, TCRs with peptide
contact residues composed of one or two amino acids that tend to interact strongly with other
amino acids have a reasonable probability of successfully maturing, as they can bind to the sin-
gle type of pMHC with the requisite interaction free energy; whether or not they interact with
any other self-pMHC with a free energy that exceeds the negative selection threshold is irrele-
vant. In the periphery, T cells with such TCRs would likely derive their interaction free energy
with antigenic peptides they recognize via only a few important contacts involving these amino
acids.
As the number of selecting peptides in the thymus increases, thymocytes encounter a larger
number of self-peptides during development. Let M denote the number of types of self-peptides
that a thymocyte encounters in the thymus. Of the M interaction free energies between its TCR
and these self-peptides, the strongest one must lie between the positive and negative selection
thresholds for successful positive selection and avoidance of negative selection. This condition
ensures that it will be positively selected by at least one self-pMHC and negatively selected by
none. As the number (M) of selecting self-pMHC increases, it becomes increasingly difficult for
thymocytes bearing TCRs with peptide contact residues composed of strongly interacting amino
acids to survive negative selection. This is because thymocytes with such TCRs are likely to
encounter at least one self-peptide (of many) with which the interaction free energy exceeds the
negative selection threshold. Thus, increasing the number of types of selecting pMHC molecules
in the thymus results in a mature T cell repertoire with TCRs that contain peptide contact
residues that exhibit lower frequencies of occurrence of amino acids that typically interact with
other amino acids strongly (compared to the preselection repertoire). Mature T cells with TCRs
whose peptide contact residues are likely to interact moderately with other amino acids are more
likely to recognize antigenic peptides in the periphery via a number of moderate-strength bonds,
with each contact making a significant contribution to the overall binding free energy. Thus,
the recognition of pMHC molecules by such T cells is more likely to be abrogated by most
single point mutations to the peptide. Therefore, the T cell repertoire in IAb mice was less cross-
reactive than that in IAb-SP mice because in the latter case the interaction free energy is likely to be
dominated by a few important contacts, and only mutations to the peptide at these residues would
abrogate recognition. Indeed, Huseby et al. (117) reported that, for the few cases they studied in
detail using calorimetry, TCRs in mature T cells that developed in IAb-SP mice exhibited only a
few strong contacts with the peptides they recognized, whereas those that developed in C57BL/6
mice exhibited a number of moderate-strength contacts.
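The selection argument above can be condensed into a short calculation. The following is a sketch under an assumption that the text does not state explicitly, namely that the M interaction strengths of a given TCR with the self-peptides are independent draws from one distribution. The symbols are introduced here for illustration: E(t, s_j) is the interaction strength of TCR t with self-peptide s_j (larger means stronger), E_p and E_n are the positive and negative selection thresholds, and F_t is the cumulative distribution of E(t, s) over self-peptides.

```latex
% Selection condition: the strongest of the M interactions lies between the thresholds
E_p \;\le\; \max_{1 \le j \le M} E(t, s_j) \;<\; E_n

% Under the independence assumption, P(\max < x) = F_t(x)^M, so
P_{\mathrm{survive}}(t) \;=\; F_t(E_n)^{M} \;-\; F_t(E_p)^{M}
```

For a TCR rich in strongly interacting residues, F_t(E_n) is appreciably below 1, so the first term decays quickly as M grows and the TCR is lost to negative selection. For a very weakly interacting TCR, F_t(E_p) is close to 1, so the difference is small and the TCR fails positive selection. Moderately interacting TCRs are the ones that persist at large M.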
Theoretical analyses and calculations by Kosmrlj et al. suggested that a key to explaining the
observations by Huseby et al. was the lower frequency (compared to the preselection repertoire)
of strongly interacting amino acids in the peptide contact residues of the TCRs when negative
selection was efficient. Experiments motivated computation and theory that led to this hypothesis,
which now needed to be tested to complete iterative loop 2 in Figure 1. The hypothesis could only
be tested by high throughput sequencing of T cell repertoires since the mechanistic explanation
relied on statistical arguments pertinent to the entire repertoire, not every T cell in the repertoire.
Such a study would exemplify a research paradigm that is a hybrid of iterative loops 2 and 3 in
Figure 1.
Stadinski et al. (124) sequenced the TCR β chains of T cell repertoires in diverse mouse
models. By studying 43 structures of mouse and human TCRs in complex with pMHC molecules
they established that two TCR residues, P6 and P7, always contacted the pMHC. Thus, they
focused on characterizing the frequency of amino acids at this doublet of residues in different T
cell repertoires. Their principal findings pertinent to the theoretical predictions noted above are
as follows. The self-reactivity of preselection TCRs of diverse Vβ families is strongly correlated
with the hydrophobicity of the amino acids at the P6-P7 doublet, with more hydrophobic residues
corresponding to greater self-reactivity. Thus, they connected the strongly interacting amino
acids in the theoretical predictions to hydrophobicity. This correspondence is sensible because
hydrophobic residues are likely to enhance TCR-pMHC interactions given that the formation of
such interfaces reduces their exposure to water. Using MHC-deficient (β2m−/− H2− Ab−/−) mice,
they isolated and sequenced preselection thymocytes and compared them to postselection T cells from
MHC-positive mice. Careful statistical analyses showed an enrichment of moderately hydrophobic, but
not strongly hydrophobic, residues at the P6-P7 doublet for postselection TCRs (Figure 6a,b).
Sequencing the repertoire of genetically identical mice that have negative selection defects because
they are deficient in the proapoptotic factor, BIM, showed a significant enrichment for the most
hydrophobic residues at the P6-P7 doublet (Figure 6c,d ). These results are in harmony with the
theoretical prediction that negative selection will tend to eliminate thymocytes with TCRs that
contain strongly interacting residues in their peptide contact residues, and the combination of
positive and negative selection will result in mature T cells with moderately interacting amino
acids.
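The enrichment values plotted in Figure 6 are log2 ratios of doublet frequencies between two repertoires. A minimal sketch of that calculation follows; the function name, the pseudocount, and the example doublets are assumptions made here for illustration and are not taken from Stadinski et al.

```python
import math
from collections import Counter


def log2_enrichment(post_doublets, pre_doublets, pseudocount=0.5):
    """log2 ratio of each doublet's frequency after vs. before selection.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for doublets seen in only one of the two repertoires.
    """
    post, pre = Counter(post_doublets), Counter(pre_doublets)
    doublets = sorted(set(post) | set(pre))
    k = len(doublets)
    result = {}
    for d in doublets:
        f_post = (post[d] + pseudocount) / (len(post_doublets) + pseudocount * k)
        f_pre = (pre[d] + pseudocount) / (len(pre_doublets) + pseudocount * k)
        result[d] = math.log2(f_post / f_pre)
    return result


# Made-up P6-P7 doublets, one per sequenced TCR (not data from the study).
preselection = ["LF", "LF", "GS", "GS", "AT", "AT"]
postselection = ["LF", "GS", "AT", "AT", "AT", "AT"]
for doublet, value in log2_enrichment(postselection, preselection).items():
    print(doublet, round(value, 2))
```

Positive values indicate doublets that selection enriches, and negative values indicate doublets that it depletes.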
Stadinski et al. (124) also sequenced the T cell repertoires of NOD mice, which are prone to
diabetes, and found that their CD4+ T cells were enriched in strongly hydrophobic residues at the
P6-P7 doublet, but no such enrichment was observed in the CD8 compartment. This is consistent
with an MHC class II molecule being implicated, and indeed the IAg MHC molecule has been
associated with autoimmunity in NOD mice (125). Stadinski et al. then made congenic mice with
B6 mice bearing the IAg MHC and NOD mice bearing the H-2b MHC instead of IAg. The CD4
T cell repertoire of the latter mice showed no enrichment of hydrophobic residues at the P6-P7
doublet, whereas the repertoire of the congenic B6 mice did show enrichment. This suggested that
T cells bearing TCR restricted by the IAg molecule are more likely to bear strongly interacting
residues in their peptide contact residues thus making them more prone to autoimmunity. The
IAg molecule has been shown to bind peptides in an unstable fashion (125), thus suggesting that,
consistent with the theoretical prediction, selection against a smaller diversity of self-peptides
in the thymus may not effectively suppress the frequency of strongly interacting residues at the
peptide contact residues of the mature T cell repertoire.
The theoretical prediction that the same effect would make such a T cell repertoire statistically
more cross-reactive to mutants of cognate pMHC ligands was, however, not tested. Indirect
[Figure 6 graphic omitted. Panels a-d plot P6-P7 doublet enrichment (log2) against self-reactivity (high to low): (a) CD4+ T cells/preselection; (b) CD8+ T cells/preselection; (c) Bim−/−/Bim+/+ CD8+ T cells; (d) Bim−/−/Bim+/+ CD4+ T cells. Legend: Promotes, Neutral, Limits.]

Figure 6
(a) Enrichment of moderately hydrophobic, but not strongly or weakly hydrophobic, residues in the P6-P7
doublet of the TCRs’ peptide-MHC contact residues in the CD4+ T cell repertoire compared to the
preselection repertoire. More strongly hydrophobic residues result in a higher degree of self-reactivity.
(b) Enrichment of moderately hydrophobic, but not strongly or weakly hydrophobic, residues in the P6-P7
doublet of the TCRs’ peptide-MHC contact residues in the CD8+ T cell repertoire compared to the
preselection repertoire. (c,d ) Enrichment of strongly hydrophobic residues in the P6-P7 doublet of the
TCRs’ peptide-MHC contact residues in the CD8+ T cell repertoire and CD4+ T cell repertoire in mice
deficient in the proapoptotic factor BIM (i.e., Bim−/− mice). These results are adapted from Stadinski et al.
(124) and are consistent with theoretical predictions.

evidence for this is provided by a combination of theoretical analyses and patient data (126).
Algorithms that predict peptide binding to MHC class I molecules (e.g., 127, 128) suggested that
the HLA B57 molecule binds to fewer human peptides. Extending their past work, Kosmrlj et al.
reported results that suggested that individuals with the HLA B57 MHC were more likely to be
present in cohorts of elite controllers of HIV not only because of the dominant effect of HLA
B57 presenting peptides from mutationally vulnerable regions of the virus, but also because their
T cell repertoires are more likely to be cross-reactive to mutations that the virus may evolve in
these peptides. Enhanced protection against hepatitis C virus, another highly mutable virus that
results in a chronic infection, has also now been reported for B57+ individuals (129). The study
by Kosmrlj et al. is also consistent with the fact that individuals with the HLA B57 allele are more
prone to T cell–mediated autoimmune diseases.
T cell repertoires in individual humans can now be sequenced in great depth and characterized
via computational inference. For illustrative purposes, here we discuss only the work of Walczak,
Callan, and coworkers (130, 131). Walczak and coworkers first focused on sequences of the entire
CDR3 region of β chains of nonfunctional TCRs derived from CD4+ T cells in nine individuals.
These TCRs are nonfunctional because of a reading frame shift or an accidental stop codon. These
TCRs were, therefore, not subject to thymic selection. As T cells do not undergo affinity matura-
tion in response to infection, exposure to antigens will change clone sizes and result in differences
between the memory and naive compartments but not the overall distribution of TCR sequences.
In their analysis, Murugan et al. (130) considered one sequence per clone. For the nonfunctional TCR
sequences, they developed a statistical inference procedure that allowed them to characterize the
preselection repertoire diversity and obtain quantitative estimates for the probabilities associated
with the VDJ recombination rules. The inference procedure was carried out for the T cell reper-
toire of each individual separately, and not much variation of the statistical rules was observed across
the nine donors. Thus, they obtained an inferred model for the probability of generating a preselec-
tion TCR. Then, Elhanati et al. (131) compared the sequences of the functional T cell repertoires in
each individual with the model for preselection repertoires and developed a maximum-likelihood-
based inference procedure to characterize the statistics of how selection influences TCR sequences.
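One way to write this comparison down is given below, as a sketch rather than the authors' exact formulation. The symbols are introduced here for illustration: σ is a TCR sequence, P_pre(σ) is the inferred probability of generating it before selection, P_post(σ) is its probability in the functional repertoire, and Q(σ) is a selection factor.

```latex
% Selection reweights the generation probability
P_{\mathrm{post}}(\sigma) \;=\; Q(\sigma)\, P_{\mathrm{pre}}(\sigma),
\qquad \sum_{\sigma} Q(\sigma)\, P_{\mathrm{pre}}(\sigma) \;=\; 1

% Maximum-likelihood estimate from N observed functional sequences \sigma_1, \dots, \sigma_N
\hat{Q} \;=\; \arg\max_{Q} \sum_{k=1}^{N} \log\!\big[\, Q(\sigma_k)\, P_{\mathrm{pre}}(\sigma_k) \,\big]
```

In this notation, Q > 1 marks sequence features that selection enhances and Q < 1 marks features that it suppresses. In practice Q must be restricted to a tractable parametric form, for example one that depends on the amino acid present at each CDR3 position.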
Interesting results are reported by Elhanati et al. (131), including those pertinent to the pos-
sible origin of public TCRs and how the VDJ recombination procedure and the thymic selection
machinery coevolved such that CDR3 regions that were more likely to be generated were also
more likely to be selected. The main result that I focus on here is their observation that selection
results in significant suppression or enhancement of particular amino acids at various positions in
the CDR3 region, especially away from the boundaries of the entire CDR3 region. When they
tried to correlate the inferred selection-mediated suppression or enhancement in the frequencies
of amino acids in the entire CDR3 region with biochemical properties, they found only a strong
negative correlation with amino acid volume. The lack of correlation with hydrophobicity may
at first glance appear to contradict the studies with mouse models by Stadinski et al. (124). But the
correlation with hydrophobicity should be observed only in the peptide contact residues of the
TCRs according to the mechanistic models proposed by Kosmrlj et al., which is why the comparison
with data (119, 124) focused on these residues. An excellent future study would be to
determine the structures of some of the human TCRs studied by Walczak, Callan and coworkers
interacting with cognate pMHC. One could then correlate the inferred suppression or enhance-
ment in the frequencies of amino acids at the peptide contact residues to biochemical properties.
This would exemplify a study along the lines of iterative loop 4 in Figure 1, as high-throughput
sequencing data would have been subjected to computation to infer patterns, mechanistic models
would have provided a hypothesis positively tested in mouse models, and this would now prompt
experiments with human cells.

IMMUNE SYSTEM MONITORING


Immune responses result from collective processes involving many interacting components. In
recent years, there has been much interest in characterizing the immune system as a whole, rather
than individual components. One way to do so is to profile many immune markers in human
cohorts (e.g., 132–137). The immune systems of different individuals are likely to differ because,
in addition to heritable traits, the adaptive immune system is altered by exposure to pathogens,
age, diet, and myriad other factors during an individual’s life. Indeed, cell compositions and
other immune markers have been observed to be different across individuals (132–134). Yet, most
humans can combat the same pathogens effectively and stave off many cancerous cells efficiently.
This may imply that the different components of the immune system interact in somewhat different
ways in each individual in order to achieve similar systemic responses. However, it is also true
that different humans respond differently to the same stimulus (e.g., a vaccine) (138), suggesting
that their immune systems are functionally different. How do we characterize the similarities
and differences between the immune systems of humans across a population, and longitudinally
in time in a single individual, and understand the mechanistic origins of how different humans
mount similar and different responses? Answering these difficult questions will enable us to learn
how different parts of the immune system interact, and how to harness this knowledge to combat
or prevent infectious and autoimmune diseases, cancers, etc. It will also teach us when and how
personalized intervention is necessary, rather than a one-size-fits-all remedy.
A number of technologies have now become available to measure different immune system
parameters in individual units in a population. One example is measurements of diverse cell
surface markers in individual T cells in a population of cells derived from a human or mouse.
Another example may be measurements of multiple cell compositions and cytokine expression
levels in individuals in a cohort of humans. The following are just some examples of techniques
that enable such measurements: flow cytometry (139), which can measure up to 30 markers (but
usually measures 15 or fewer); mass cytometry (140) using the CyTOF (Helios™, Fluidigm)
instrument, which can measure up to 45 parameters; single-cell gene expression (141);
high-dimensional protein serum analyses (142); and RNA-seq. An important question is how to analyze
and visualize the high-dimensional data produced by these instruments and techniques—i.e.,
how do we infer patterns revealed by such data? Flow cytometry data have traditionally been
visualized and analyzed by making biaxial graphs. For high-dimensional data, making such graphs
and analyzing them can be tedious. For example, if one measures 35 variables, there are 595
possible unique biaxial projections. Moreover, such biaxial projections can be misleading since
two-dimensional projections of the data might hide populations of cells that are distinct only in
higher dimensions.
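The count of 595 quoted above is the number of unordered pairs among 35 variables, 35 × 34 / 2. A short check, using placeholder marker names that are hypothetical:

```python
from itertools import combinations
from math import comb


def n_biaxial_plots(n_markers):
    """Number of unordered marker pairs, i.e., distinct two-axis scatter plots."""
    return comb(n_markers, 2)


markers = [f"marker_{i}" for i in range(1, 36)]  # 35 placeholder marker names
pairs = list(combinations(markers, 2))
print(n_biaxial_plots(35), len(pairs))  # both counts are 35 * 34 / 2 = 595
```

Because the count grows quadratically with the number of markers, inspecting every biaxial plot by hand quickly becomes impractical.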
Several approaches are available to group similar units into clusters. For example, cells could
be grouped into clusters that exhibit similar protein expression. The spanning-tree progression
analysis of density-normalized events (SPADE) algorithm (143, 144) has been applied to mass
cytometry data to define such clusters of cells and visualize the hierarchy of phenotypes on a tree
structure. Such clustering approaches result in loss of single-cell resolution and require that the
number of clusters (usually unknown for novel data) be specified a priori.
Another approach is to determine lower-dimensional projections of the data that reflect the
most important collective coordinates that best describe variation of the data across individual
units in a population. Each collective coordinate is a specific combination of the measured vari-
ables. A number of different inference methods have been used to obtain and visualize such lower-
dimensional representations of the data without loss of single-cell resolution (145). The simplest is
principal component analysis (PCA), which is routinely used (146). Each principal component is a
collective coordinate comprising a linear combination of the measured variables and is orthogonal
to the other such coordinates. The top few principal components often describe most of the vari-
ation in the data, and so projecting multidimensional measurements into the lower-dimensional
space spanned by the top few (two or three) principal components can facilitate interpretation and
visualization. For example, Newell et al. (147) used PCA to analyze 25-parameter mass cytometry
measurements of a population of human CD8+ T cells, and the top three principal components revealed a much
richer distribution of phenotypes than was known.
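To make the idea of a principal component as a linear collective coordinate concrete, here is a minimal pure-Python sketch that extracts only the top component by power iteration. The data are made up, the function names are chosen here, and real analyses would normally rely on an established numerical library rather than this illustration.

```python
import math


def center(rows):
    """Subtract the per-variable mean from every measurement."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    d = len(cov)
    v = [1.0 + 0.01 * i for i in range(d)]  # arbitrary, slightly asymmetric start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue


def project(centered, v):
    """Coordinate of every unit along the collective coordinate `v`."""
    return [sum(x * y for x, y in zip(r, v)) for r in centered]


# Made-up measurements: 4 cells x 3 markers (the third marker never varies).
cells = [[0.0, 0.0, 5.0], [1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 6.0, 5.0]]
centered = center(cells)
cov = covariance(centered)
pc1, variance = top_component(cov)
total = sum(cov[i][i] for i in range(len(cov)))
print([round(x, 3) for x in pc1], round(variance / total, 3))
print([round(x, 3) for x in project(centered, pc1)])
```

In this toy data set the first two markers vary together, so a single component captures all of the variance and the third marker receives zero weight.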
PCA finds the optimal linear combinations of the measured variables, and this is a potentially
serious limitation because the underlying manifold that best describes variation in the data may not
be linear. Two methods (148, 149), based on the same previously developed inference algorithm
that does not use a linear approximation [t-distributed stochastic neighbor embedding, or t-SNE
(150)] have recently been developed and applied to mass cytometry data. The t-SNE algorithm
obtains a two-dimensional projection of multivariate data in such a way that units (e.g., cells) that are
close neighbors in the high-dimensional space tend to remain close in the lower-dimensional space. Amir
et al. (149) showed that leukemic and normal bone marrow cells could be phenotypically related
using t-SNE. Shekhar et al. (148) combined t-SNE with density-based partitioning to develop a
tool called ACCENSE and demonstrated its utility by uncovering a novel CD8+ T cell pheno-
type that gets hidden among naive, central, and effector memory cells when projected onto the
usual CD62L-CD44 two-dimensional projection. The t-SNE-based approaches are being used
extensively in diverse applications. Various other machine learning methods are also employed
to analyze high-dimensional data. A number of interesting observations have been described by
those studying human cohorts with a diversity of approaches, and here I highlight a small sampling
of them.
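Returning briefly to the t-SNE embedding described above, its standard objective can be stated compactly. The notation is introduced here: x_i are the measured high-dimensional points, y_i are their two-dimensional images, N is the number of units, and each bandwidth σ_i is set from a user-chosen perplexity.

```latex
% Similarities in the measured (high-dimensional) space
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \ne i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Similarities in the embedding, using a heavy-tailed Student-t kernel
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \ne l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% The embedding minimizes the Kullback-Leibler divergence between the two
C = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because p_ij is appreciable only for near neighbors, the cost mainly penalizes placing neighboring units far apart, which is why local structure is retained more faithfully than large distances.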
There is a large amount of interindividual variation in immunological parameters. These can
be influenced by both heritable genetic factors and environmental forces. Brodin et al. (132)
conducted a twin study in which they measured various immune parameters, spanning cell subtype
proportions, serum protein levels, and cytokine stimulation responses. Their results suggest that
environmental factors play an important role in shaping our immune systems (132). That is not to
say that heritable influences are not important. An example of such an effect has been identified
for immune cell populations; data from a cohort of dizygotic twins younger than those studied by
Brodin et al. (132) suggest that CD4/CD8 ratio may be genetically linked to CD5/CD6 genes in
younger twins (151). An additional twin study revealed that various markers of innate immunity
are also strongly influenced by genetics (152).
Immune system parameters appear to be longitudinally stable over weeks to months, aside
from large variations that accompany events such as acute infection. But even after such events,
the immune markers, when viewed statistically, seem to return to baseline. Yet, the age of an
individual seems to play a role in shaping the immune system (132, 133, 153). Taken together,
these data suggest that changes occur slowly over time and are not statistically discernable in a
time frame of weeks or months.
The importance of environmental effects, including diet, has been highlighted by Carr et al.
(133), who studied cohabiting couples and compared them with randomly chosen pairs of individ-
uals. Previous pathogenic exposure (e.g., human cytomegalovirus, or HCMV), age, microbiome
composition, and hormone balance have also been shown to influence the measured immune
system markers (154–157).
The variations in measured immunological parameters affect immune performance, specifically
shown for strength of vaccination responses (134, 158). The identification of both genetic and
environmentally shaped baseline immune markers as predictors of response to vaccination suggests
that desired responses could be induced in diverse individuals by manipulating immune parameters
(perhaps by rational design of adjuvants). Although inference methods applied to high-dimensional
data have led to several important observations, in order to manipulate the immune system so as
to maintain health and prevent and cure disease the next step must be to obtain a mechanistic
understanding of the origins of these observations. Thus, a paradigm of research similar to iterative
loop 4 in Figure 1 is expected to emerge soon in this area of immunological research.

CHARACTERIZING THE IMMUNOLOGICAL VULNERABILITIES OF
HIGHLY MUTABLE PATHOGENS FOR RATIONAL VACCINE DESIGN
Vaccination has saved more lives and money than any other medical procedure. Successful vac-
cination programs have eradicated smallpox, which had plagued humanity since antiquity, and
nearly eradicated polio. Vaccines for many childhood diseases have contributed to the reduction
of infant mortality. The traditional empirical paradigm of vaccine development, which was pi-
oneered by Jenner and Pasteur in modern times, has not been successful against a number of
pathogens that are wreaking havoc (e.g., HIV, hepatitis C virus, tuberculosis, and malaria). A
universal vaccine against diverse influenza strains is also not available. Many pathogens that have
been difficult to vaccinate against share the following features: (a) They are pathogenic and viable
in different guises, thus making them hard to target with specific immune memory responses.
(b) They often degrade, or hide from, the immune system. A prominent example is HIV, which is
a highly mutable virus with a high replication rate that primarily infects CD4 T cells.
Protecting against such pathogens requires rational design of vaccines based on fundamen-
tal scientific principles (159), rather than empiricism. I considered one aspect of this challenge
when discussing work pertinent to the induction of broadly neutralizing antibodies. Here, I prin-
cipally focus on how the immunologically vulnerable targets in a highly mutable pathogen can be
identified and how this knowledge may be harnessed for practical ends. In particular, I illustrate
how complementary computational, experimental, and clinical studies are shedding light on these
questions. I note that considerable efforts have been devoted to studying coarse-grained models of
virus dynamics in individuals and important advances have thus been made, as has been discussed
in other reviews (160–163). I do not describe these studies.
The work that I do describe is an illustration of iterative loop 4 in Figure 1 and focuses on
HIV as an exemplar of a highly mutable pathogen. Because of its high mutability and replication
rate, HIV can mutate to evade vaccine-induced antibodies and T cell responses that specifically
target certain regions of the viral spike and other parts of the virus’s proteome, respectively.
For this reason, there has been interest in focusing a vaccine-induced T cell response to target
peptide epitopes containing residues that appear relatively conserved in sequences of the same
HIV protein derived from virus samples extracted from diverse patients. Such a response could
be effective because mutations that evade it would likely result in a mutant strain that is not
very fit in propagating infection (otherwise these residues would not be relatively conserved).
Thus, the virus would be trapped between being eliminated by the immune response and evolving
mutations that would hurt its function and ability to propagate infection. However, the virus
can evade this trap by evolving compensatory mutations that can partially restore the fitness cost
incurred by the immune-evading mutation (164–166). Determining the mutational vulnerabilities
of HIV therefore involves defining the compensatory pathways that the virus can sample to evade
host immunity as well as the combinations of mutations that are deleterious for the virus. Regions
of the proteome that are rife with compensatory mutations should not be targets of an effective
vaccine-induced immune response, whereas those that exhibit negative synergistic effects should.
Thus, what is required is knowledge of the intrinsic replicative fitness of viral strains as a function
of sequence with explicit accounting for the effects of epistatic coupling between mutations. By
intrinsic I mean a replicative fitness that is not confounded by the effects of the immune pressure
in individual hosts, so that it can be of use in considering vaccination strategies for a population
composed of people with diverse genotypes. Information regarding the fitness of viral strains
as a function of sequence traces out a fitness landscape (illustrated in Figure 7). The fitness
landscape reflects the effects of combinations of mutations on individual protein structures, the
ability to create multiprotein assemblies critical for viral function, etc. This information is often
neither easily available nor trivial to intuit just from examining structures of individual proteins.
However, this information is encoded in the thousands of sequences of HIV proteins derived from
virus samples extracted from diverse patients, although it may be confounded by human immune
responses. Several groups have embarked on developing computational approaches to try to
translate these data into knowledge of the HIV fitness landscape (167–170). For brevity, I focus
on one approach that has been used extensively in contexts other than virology and immunology
[Figure 7 graphic omitted. Surface plot of Fitness = f(Sequence); vertical axis: Fitness; recoverable horizontal axis label: Residue 2.]

Figure 7
The fitness landscape of a cartoon virus with one protein composed of two residues (to enable illustration on
a printed page). The grid points on the two axes that define a plane represent mutant amino acids that could
arise in each of the two residues. Thus, any point in this plane is a mutant viral strain. Information about
the ability of a viral strain to replicate and propagate infection is shown in the vertical dimension. Information
about the fitness of viral strains as a function of their sequences traces out hills and valleys, thus defining the
fitness landscape. The fitness landscape of a real virus exists in a much-higher-dimensional space.

(such as neuroscience), and that seems to be predictive of in vitro and clinical data. I note that
the challenges that have to be confronted to translate sequences of influenza strains into a fitness
landscape are different, and I do not describe here important work that has been done recently in
this context (171–173).
Any sequence can be represented by a vector, z, as it is a list of amino acids at each residue.
The first step is to obtain the prevalence landscape of the virus. The prevalence landscape is
the probability, P(z), of observing the sequence z in circulation. The sequence data, available as
multiple-sequence alignments, contain this information. The sequence data also contain informa-
tion on the probability of observing single mutations at every residue, double mutations at every
pair of residues, triple mutations at every triplet of residues, and so on. Any mathematical model
for P(z) that can recapitulate these mutational probabilities is legitimate. Ferguson et al. (169)
and Barton et al. (174) sought to determine the least biased model for P(z) that recapitulates the
observed probabilities of single point mutations at each residue and double mutations at every pair
of residues. Exploiting the connections between information theory and statistical physics, the least
biased probability distribution is interpreted to be the one that exhibits the maximum entropy
(175). Straightforward analysis reveals that the maximum entropy probability distribution for P(z)
that fits the observed single- and double-mutation probabilities is given by

\[
P(z) = \frac{e^{-H(z)}}{Z}, \qquad H(z) = -\sum_{i} h_i z_i - \sum_{i<j} J_{ij} z_i z_j. \tag{1}
\]

The parameters, h_i and J_ij, in Equation 1 are obtained by ensuring that the single- and double-
mutation probabilities predicted by P(z) match those observed in the sequence data. The algo-
rithms used to fit these parameters have been described (169, 174).
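
The structure of Equation 1 can be illustrated with a minimal sketch. It assumes a binary encoding (z_i = 1 if residue i differs from the consensus amino acid, 0 otherwise) and a protein short enough that all 2^L sequences can be enumerated, so that the parameters can be fitted by plain gradient ascent on the likelihood. This is only a toy and is not the inference algorithms of References 169 and 174; the alignment, function names, step count, and step size are hypothetical.

```python
import itertools
import math

def observed_frequencies(msa):
    """Single- and double-mutation frequencies from a binary alignment."""
    n, L = len(msa), len(msa[0])
    f1 = [sum(s[i] for s in msa) / n for i in range(L)]
    f2 = {(i, j): sum(s[i] * s[j] for s in msa) / n
          for i in range(L) for j in range(i + 1, L)}
    return f1, f2

def energy(z, h, J):
    """H(z) = -sum_i h_i z_i - sum_{i<j} J_ij z_i z_j (Equation 1)."""
    field = sum(h[i] * z[i] for i in range(len(z)))
    coupling = sum(J[i, j] * z[i] * z[j] for (i, j) in J)
    return -field - coupling

def model_frequencies(h, J):
    """Exact model frequencies by enumerating all 2^L sequences (toy sizes only)."""
    L = len(h)
    states = list(itertools.product((0, 1), repeat=L))
    weights = [math.exp(-energy(z, h, J)) for z in states]
    Z = sum(weights)  # normalization constant in Equation 1
    m1 = [sum(w * z[i] for w, z in zip(weights, states)) / Z for i in range(L)]
    m2 = {(i, j): sum(w * z[i] * z[j] for w, z in zip(weights, states)) / Z
          for (i, j) in J}
    return m1, m2

def fit_maxent(msa, steps=3000, rate=0.5):
    """Gradient ascent on the likelihood for a fixed number of steps.

    Each update moves h_i and J_ij in the direction that reduces the gap
    between observed and model mutation frequencies.
    """
    f1, f2 = observed_frequencies(msa)
    h = [0.0] * len(f1)
    J = {pair: 0.0 for pair in f2}
    for _ in range(steps):
        m1, m2 = model_frequencies(h, J)
        h = [h[i] + rate * (f1[i] - m1[i]) for i in range(len(h))]
        J = {pair: J[pair] + rate * (f2[pair] - m2[pair]) for pair in J}
    return h, J

# Hypothetical toy alignment: 10 sequences, 3 residues.
toy_msa = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 0, 0),
           (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
h_fit, J_fit = fit_maxent(toy_msa)
```

Exhaustive enumeration is feasible only for a handful of residues; real proteins require the approximate methods described in the cited work.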

While it can be argued that the more prevalent strains should also be more fit (i.e., able to
propagate infection), this can be demonstrated rigorously only for simple situations (176). The
viral protein sequences used to infer the prevalence landscape were derived from patients whose
immune responses may have forced the virus to mutate to an intrinsically less fit strain in order to
evade these responses. Thus, an intrinsically less fit strain may be effectively more fit in a particular
patient, and hence more prevalent. Shekhar et al. (148) explored the connections between the
prevalence and fitness landscapes for HIV, given its biology and evolutionary history, employing a
combination of computer simulations and relatively sophisticated statistical mechanical analyses.
They concluded that the rank order of the prevalence of strains was expected to be statistically
similar to the rank order of fitness, provided that the mutational distance between two compared
strains was not very large.
The principal biological reasons underlying this conclusion are (a) HIV is a chronic infection,
and so its proteome is targeted extensively by patient T cell responses. Because of the great diversity
in human MHC genes, no region of the proteome is targeted by a large fraction of infected
individuals. (b) Clinical observations suggest that, if the mutation the virus makes in an individual
in order to evade immune responses incurs a fitness cost, and if this individual then infects another
whose immune system does not target the mutated epitope, then over time the mutation can revert
(177). (c) Unlike influenza, for example, the circulating population of HIV strains has not been
consistently subjected to effective classes of natural or vaccine-induced immune responses. Thus,
the HIV population has not evolved in narrowly directed ways to evade a few specific classes of
potent memory immune responses. One reflection of this is the difference between the characters
of the phylogenetic trees for influenza and HIV strains (178, 179). I note that rare reports suggest
that in some localized geographic regions certain HLA-restricted immune responses
may have affected the evolutionary history of HIV (180).
Taken together, the reasons noted above are predicted to result in statistically similar fitness
and prevalence landscapes. The condition that the compared strains should not be separated by
large mutational distances arises from the fact that, even if two strains are equally fit, one is unlikely
to evolve from another in an infected person if the mutational distance separating them is large.
This is because the mutational probability per base pair per replication cycle is smaller than unity.
A recent report has suggested that this effect is somewhat ameliorated by recombination of viral
strains (181), which may facilitate viral evolution (182) and further supports the conclusions in
References 148, 169, 170, and 174.
The veracity of the arguments and analyses above can only be established by testing predictions
from the fitness model against in vitro laboratory and clinical data. The model can make predictions
for the intrinsic fitness of viral strains. These strains can then be made by mutagenesis, and their
ability to infect human target cells and propagate infection in vitro can be measured and compared
to a reference strain. Figure 8a shows one example of such a comparison for 43 HIV strains bearing
mutations in the Gag polyprotein (170). Although not perfect, the agreement between model
predictions and experiments is good and highly statistically significant. Encouraging results have
also been obtained for protease (168, 183, 184), Nef, and Env mutants (J.P. Barton, unpublished data).
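
A comparison of this kind reduces to correlating a predicted energy difference with a measured relative fitness. The sketch below computes a Spearman rank correlation. The choice of a rank correlation, the function names, and the five data points are assumptions made for illustration; they are not values or methods taken from the studies cited above.

```python
def ranks(values):
    """Rank positions (0 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical numbers, not data from any experiment.
delta_E = [0.5, 2.0, 4.5, 7.0, 12.0]               # E(mutant) - E(reference)
relative_fitness = [0.95, 0.80, 0.85, 0.40, 0.10]  # measured relative to a reference strain
rho = spearman(delta_E, relative_fitness)
```

A negative coefficient is the expected sign if higher values of E correspond to lower fitness.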
Two kinds of comparisons have been made with clinical data. Elite controllers are rare patients
who control the virus to low levels without therapy (185). They disproportionately target regions
of the viral proteome that the inferred fitness landscape predicts to be vulnerable to single or
multiple mutations (169, 186). The pattern of mutations observed in virus sequences derived from
elite controllers is also consistent with predictions of the fitness landscape. For example, multiple
mutations are not observed in targeted regions that are predicted to be especially vulnerable to
multiple mutations because of negative synergistic interactions between mutations (186). Observed
escape mutations in diverse epitopes in Gag targeted by elite controllers and other patients are

[Figure 8 graphic. Panel a: Relative fitness versus E(mutant) – E(reference). Panel b: residue diagrams for two patients targeting the same epitope. Panel c: Escape time (days) versus Time to escape in simulation (evolutionary generations).]
Figure 8
(a) Comparison of the measured in vitro replicative fitness of mutant HIV strains bearing mutations in the
Gag polyprotein, relative to a reference strain, with the theoretically predicted fitness. The metric
of fitness for the model predictions is a quantity E; higher values of E correspond to lower predicted fitness.
The statistical correlation coefficient is −0.78 (P = 8 × 10^−9). (b) Two patients targeting the same epitope
evolve escape mutations at very different rates because of differences in the sequence backgrounds of the
viruses that infected them. The circles represent the HIV protein in which the targeted epitope is embedded. Blue lines
represent predicted synergistic compensatory interactions between the putative escape mutation and the
preexisting mutations. Orange lines represent predicted synergistic antagonistic interactions. The thickness
of each line corresponds to the predicted strength of the synergistic interactions. (c) Comparison of the
measured escape times in patients with the times predicted by the computer simulations (in evolutionary
generations). The statistical correlation coefficient is 0.81 (P = 3 × 10^−13). These results are adapted from
Barton et al. (175).

predicted to be the mutations that incur the smallest fitness costs (169). These results are consistent
with a mechanism wherein elite controllers suppress viral replication by targeting regions with
high fitness costs upon mutation, thus cornering the virus between being killed by the immune
response and evolving low-fitness mutants. Bonhoeffer and collaborators have also shown that
higher predicted viral fitness was significantly associated with higher viral load in a large clinical
cohort (168).
McMichael and coworkers sequenced the swarm of viruses as a function of time in drug-naive
individuals (187). They also determined the immunodominant T cell responses in these patients
at the first detectable time points. In these patients, the virus evolved mutations that evaded
these responses. Escape was considered to have occurred when more than 50% of the swarm
of viruses had evolved the escape mutation. Barton et al. studied virus evolution subject to T
cell responses in these patients by combining computer simulations of virus evolution with their
inferred fitness landscape of HIV (175). Specifically, they tried to answer the following questions: (a)
Can the calculations predict the specific residues in the targeted epitopes where escape mutations
were observed? (b) Can the relative times at which escape occurred in the individual patients be
predicted?
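
To make the 50% escape criterion concrete, the following is a deliberately simplified, deterministic two-strain caricature. It is not the stochastic simulation method of Barton et al.; the function name, the parameters (including the `immune_kill` factor that reduces wild-type replication), and all numerical values are hypothetical. It shows only that an escape mutant with lower intrinsic fitness takes more generations to exceed half of the swarm.

```python
def generations_to_escape(mutant_fitness, wildtype_fitness, immune_kill,
                          initial_fraction=1e-4, threshold=0.5,
                          max_generations=10_000):
    """Generations until the escape mutant exceeds `threshold` of the swarm.

    Deterministic two-strain selection: the wild type replicates with
    effective fitness wildtype_fitness * (1 - immune_kill), whereas the
    escape mutant is assumed not to be recognized by the T cell response.
    """
    x = initial_fraction                      # current mutant fraction
    w_mut = mutant_fitness
    w_wt = wildtype_fitness * (1.0 - immune_kill)
    for generation in range(1, max_generations + 1):
        # Standard selection update for the mutant fraction.
        x = x * w_mut / (x * w_mut + (1.0 - x) * w_wt)
        if x > threshold:
            return generation
    return None  # no escape within the simulated window

# Hypothetical values: the same immune pressure, two different mutant fitness costs.
fast = generations_to_escape(mutant_fitness=0.9, wildtype_fitness=1.0, immune_kill=0.5)
slow = generations_to_escape(mutant_fitness=0.6, wildtype_fitness=1.0, immune_kill=0.5)
```

In this caricature the fitness of the mutant is a fixed input; in the approach described in the text it would instead depend on the sequence background through the inferred fitness landscape.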
The inputs to the calculations were the measured epitopes targeted by the T cell responses,
their relative immunodominance, and the initial swarm of sequenced viral strains in the patient.
The technical details of the simulation methods, and their limitations, are described in References 174
and 188. The typical metric used to predict the location of escape mutations is the entropy associated
with the frequency with which mutations are observed at different residues in a multiple sequence
alignment of viral proteins (39). Residues in a targeted epitope that have larger entropy are more
mutable, suggesting that they incur a lower fitness cost for the virus, and are more likely to be the
locations of escape mutations. Barton et al. (175) showed that predictions of the clinically observed
locations of escape mutations based on entropy were significantly inferior to those from their
evolutionary dynamics calculations with the inferred HIV fitness landscape.
The authors note two reasons for this. First, the fitness landscape includes the effects
of the sequence background in which the escape mutation has to evolve. Preexisting mutations
can interact synergistically with the ultimate escape mutation in compensatory or antagonistic
manners, and thus influence which mutation incurs the lower fitness cost in a way that the entropy
of a residue does not account for. The second reason is that dynamic simulation of the evolutionary
pathways allows effects of competition between the swarm of viruses present and the existence of
multiple escape routes to be naturally included.
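
The entropy metric described above can be sketched as follows, assuming the alignment is given as equal-length strings with one character per residue. The function name and the toy sequences are hypothetical, and natural logarithms are an arbitrary choice of units.

```python
import math
from collections import Counter

def site_entropies(msa):
    """Shannon entropy (in nats) of the amino acid distribution in each column."""
    n = len(msa)
    entropies = []
    for column in zip(*msa):          # iterate over alignment columns
        counts = Counter(column)
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    return entropies

# Hypothetical aligned sequences of a five-residue epitope, one string per virus.
toy_alignment = ["ACDEF", "ACDEF", "GCDEF", "ACDHF", "GCDEF", "ACDEF"]
entropies = site_entropies(toy_alignment)
# Residue predicted to be the most mutable by the entropy metric alone.
most_mutable = max(range(len(entropies)), key=entropies.__getitem__)
```

Each column is scored independently of every other column, which is why this metric cannot account for the sequence background.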
Barton et al. (175) provide some vivid illustrations of the importance of sequence background
in determining the fitness of HIV strains by considering pairs of patients who target the same
epitopes, but in whom escape occurs after significantly different durations of time. They find that
this is due to differences in the founder viruses that infected these patients. Figure 8b depicts a
particular pair of such patients. The circles in the figure depict the rest of the protein in which
the targeted epitope is embedded, and the marked residues are those that were mutated in the
founder virus. The individual in whom escape took much longer was infected with a strain bearing
several mutations that the fitness landscape predicts to be strongly antagonistically coupled to
the ultimate escape mutation. Thus, the mutant strain that would escape the T cell response is
relatively less fit than the escape mutant in the other patient and should therefore
take much longer to grow out and take over the viral population.
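
One way to see how the sequence background enters, assuming a binary encoding in which z_j = 1 marks a mutation at residue j (an assumption made here for illustration), is to compute from Equation 1 the change in H when an escape mutation is introduced at residue k in a background z:

\[
\Delta H_k(z) = H(z \mid z_k = 1) - H(z \mid z_k = 0) = -h_k - \sum_{j \neq k} J_{kj} z_j ,
\]

where J_{kj} denotes the coupling for the pair of residues k and j (written J_{jk} in Equation 1 when j < k). If larger H corresponds to lower predicted fitness, as for the quantity E in Figure 8a, then preexisting mutations with J_{kj} < 0 raise the cost of the escape mutation (antagonistic coupling), whereas those with J_{kj} > 0 lower it (compensatory coupling).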
Figure 8c shows a comparison between the predicted and measured times required to escape.
Although there is considerable scatter, the agreement is reasonably good and highly statistically
significant. Importantly, predictions for cases wherein escape is either slow or fast are very
good. Predicting the regions of the virus where escape is slow or fast is important for immunogen design. The importance
of identifying regions where mutations come at a high fitness cost to the virus is further empha-
sized by recent results suggesting that fitter viruses are more likely to be transmitted than less fit
variants (38). Thus, targeting these regions could lead not only to reductions in viral load, but
also to decreased transmission. Predictions such as those shown in Figure 8c could be improved
by relaxing some of the assumptions in the simulations. This includes incorporating the effects of
clonal interference for competing escape mutations in different epitopes, a potentially important
contributor to the kinetics of escape (189).
Based on encouraging comparisons between predictions from fitness models and in vitro and
clinical data, immunogens for the T cell component of a possible therapeutic vaccine can be
designed with three criteria: (a) Exclude regions of the virus’s proteome that are rife with com-
pensatory mutational pathways. (b) Include regions of the proteome that contain residues that
have high fitness costs upon mutation, either due to very high conservation of a single residue or
because multiple mutations in the region are strongly antagonistic owing to negative synergistic
effects. (c) Include regions of the proteome that can be presented by diverse MHC molecules in
order to provide broad coverage for a population. Such immunogens have been designed. An important
challenge for translating these designed immunogens into a practical vaccine is that long
peptide antigens are often not very immunogenic—this is also a challenge for cancer vaccines.
New delivery methods need to be devised for such immunogens, and much effort is being devoted
to this endeavor. One delivery approach that seems to be encouraging considers an immunogen
that is designed based only on single-point conservation and appears to induce T cell responses
in nonhuman primates (190).

CONCLUDING REMARKS
The reader may have noticed that the examples I discuss largely focus on studies on the adap-
tive immune system. I believe that very significant fundamental discoveries, with translational
implications, will be made by integrated computational and experimental investigations focused
on the innate immune system and how it critically enables adaptive immunity. But, there have
been few such studies to date. Another important area of focus should be how the immune system
is regulated by the microbiome, and how imbalances in these interactions can cause disease. A
lot of pertinent data are being collected, and bringing together computational, experimental, and
clinical studies in this area is expected to lead to significant advances. Cancer immunotherapy
is resulting in revolutionary changes in how we treat patients. But, we do not understand why
immunotherapies work for only some cancers and for only some patients. This is another area
that is ripe for studies using the paradigm of research depicted in Figure 1. Of course, each of the
areas I do discuss continues to be a frontier in basic and clinical immunology, and much remains
to be done.
What is a good model for making the convergence of computational/theoretical, experimental,
and clinical immunology commonplace? The successful model so far has been to bring together
scientists trained in the physical, computational, and mathematical sciences with those trained
as immunologists. To make such collaborations commonplace, each community needs to learn
the other’s language and be willing to be intellectually uncomfortable at the beginning of this
process. To facilitate this, short courses (run by professional organizations) and curricula (of-
fered by universities) that focus on educating physical/mathematical scientists about immunology
and immunologists about diverse computational models need to be developed. There are many
encouraging signs in this regard. The Kavli Institute for Theoretical Physics routinely holds
workshops on immunology, as do European physics-based entities. The American Association of
Immunologists now includes a module on computational modeling as part of its annual advanced
immunology course. Many universities are developing courses and degree programs at the con-
vergence of biology, physics, and computer science. But, these efforts are far from adequate at this
point. I see a future where experimental and clinical immunologists will be trained in much the
same way as an experimental physicist is trained today—she or he is not an expert in theoretical/
computational physics but understands it sufficiently to absorb the results produced by such stud-
ies and collaborate with theorists and computational scientists. Similarly, computational scientists
working on immunology will be trained in a way analogous to theoretical physicists—they are not
adept at experimental science but understand what questions can be addressed by current methods
and how to interpret experimental results. Many advances will be enabled if this future is realized.

DISCLOSURE STATEMENT
The author is not aware of any affiliations, memberships, funding, or financial holdings that might
be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS
Drs. John Barton, Maria Frushicheva, Eric Huseby, and Scott Shaffer; Mr. Kevin Kaczorowski;
and Ms. Renee Saindon helped me with preparation of this article. I am thankful to my doctoral
and postdoctoral trainees over the past 17 years who have influenced my views on the topic of this
review. My frequent collaborations with physicist Mehran Kardar continue to be invaluable. I am
indebted to the many basic and clinical immunologists with whom I have collaborated—especially
Drs. Dennis Burton, Mark Davis, Michael Dustin, Eric Huseby, Andrew McMichael, Michel
Nussenzweig, Ed Palmer, Jeroen Roose, Andrey Shaw, Bruce Walker, and Arthur Weiss—they
have taught me immunology. I am grateful to the NIH and the Ragon Institute of MGH, MIT,
and Harvard for supporting my work at a crossroad of disciplines.

LITERATURE CITED
1. Brenner S. 2002. Nobel lecture: Nature’s gift to Science. Biosci. Rep. 23:225–37
2. Germain RN, Meier-Schellersheim M, Nita-Lazar A, Fraser IDC. 2011. Systems biology in immunology:
a computational modeling perspective. Annu. Rev. Immunol. 29(1):527–85
3. Chakraborty AK, Weiss A. 2014. Insights into the initiation of TCR signaling. Nat. Immunol. 15(9):798–
807
4. Huang J, Brameshuber M, Zeng X, Xie J, Li Q-J, et al. 2013. A single peptide-major histocompatibility
complex ligand triggers digital cytokine secretion in CD4+ T cells. Immunity 39(5):846–57
5. Purbhoo MA, Irvine DJ, Huppa JB, Davis MM. 2004. T cell killing does not require the formation of a
stable mature immunological synapse. Nat. Immunol. 5(5):524–30
6. Sykulev Y, Joo M, Vturina I, Tsomides TJ, Eisen HN. 1996. Evidence that a single peptide-MHC
complex on a target cell can elicit a cytolytic T cell response. Immunity 4(6):565–71
7. Irvine DJ, Purbhoo MA, Krogsgaard M, Davis MM. 2002. Direct observation of ligand recognition by
T cells. Nature 419(6909):845–49
8. Davis MM, Krogsgaard M, Huse M, Huppa J, Lillemeier BF, Li Q-J. 2007. T cells as a self-referential,
sensory organ. Annu. Rev. Immunol. 25(1):681–95
9. Hoerter JAH, Brzostek J, Artyomov MN, Abel SM, Casas J, et al. 2013. Coreceptor affinity for MHC
defines peptide specificity requirements for TCR interaction with coagonist peptide-MHC. J. Exp. Med.
210(9):1807–21
10. Krogsgaard M, Li Q-J, Sumen C, Huppa JB, Huse M, Davis MM. 2005. Agonist/endogenous peptide-
MHC heterodimers drive T cell activation and sensitivity. Nature 434(7030):238–43

11. Li Q-J, Dinner AR, Qi S, Irvine DJ, Huppa JB, et al. 2004. CD4 enhances T cell sensitivity to antigen
by coordinating Lck accumulation at the immunological synapse. Nat. Immunol. 5(8):791–99
12. Daniels MA, Teixeiro E, Gill J, Hausmann B, Roubaty D, et al. 2006. Thymic selection threshold defined
by compartmentalization of Ras/MAPK signalling. Nature 444(7120):724–29
13. Gascoigne NR, Palmer E. 2011. Signaling in thymic selection. Curr. Opin. Immunol. 23(2):207–12
14. Starr TK, Jameson SC, Hogquist KA. 2003. Positive and negative selection of T cells. Annu. Rev.
Immunol. 21(1):139–76
15. Govern CC, Paczosa MK, Chakraborty AK, Huseby ES. 2010. Fast on-rates allow short dwell time
ligands to activate T cells. PNAS 107(19):8724–29
16. Altan-Bonnet G, Germain RN. 2005. Modeling T cell antigen discrimination based on feedback control
of digital ERK responses. PLOS Biol. 3(11):e356
17. Das J, Ho M, Zikherman J, Govern C, Yang M, et al. 2009. Digital signaling and hysteresis characterize Ras
activation in lymphoid cells. Cell 136(2):337–51
18. Huse M, Klein LO, Girvin AT, Faraj JM, Li Q-J, et al. 2007. Spatial and temporal dynamics of T cell
receptor signaling with a photoactivatable agonist. Immunity 27(1):76–88
19. Germain RN. 2010. Computational analysis of T cell receptor signaling and ligand discrimination—past,
present, and future. FEBS Lett. 584(24):4814–22
20. Chakraborty AK, Das J. 2010. Pairing computation with experimentation: a powerful coupling for un-
derstanding T cell signalling. Nat. Rev. Immunol. 10(1):59–71
21. Lever M, Maini PK, van der Merwe PA, Dushek O. 2014. Phenotypic models of T cell activation. Nat.
Rev. Immunol. 14(9):619–29
22. Coward J, Germain RN, Altan-Bonnet G. 2010. Perspectives for computer modeling in the study of T
cell activation. Cold Spring Harb. Perspect. Biol. 2(6):a005538
23. Molina-Paris C, Lythe G, eds. 2011. Mathematical Models and Immune Cell Biology. New York: Springer
24. Gottschalk RA, Martins AJ, Sjoelund VH, Angermann BR, Lin B, Germain RN. 2013. Recent progress
using systems biology approaches to better understand molecular mechanisms of immunity. Semin.
Immunol. 25(3):201–8
25. Chylek LA, Harris LA, Tung C-S, Faeder JR, Lopez CF, Hlavacek W. 2014. Rule-based modeling: a
computational approach for studying biomolecular site dynamics in cell signaling systems. Wiley Interdisc.
Rev. Sys. Biol. Med. 6(1):13–36
26. Lis M, Artyomov MN, Devadas S, Chakraborty AK. 2009. Efficient stochastic simulation of reaction-
diffusion processes via direct compilation. Bioinformatics 25(17):2289–91
27. Meier-Schellersheim M, Mack G. 1999. SIMMUNE, a tool for simulating and analyzing immune system
behavior. arXiv:cs/9903017
28. Blinov ML, Faeder JR, Goldstein B, Hlavacek WS. 2004. BioNetGen: software for rule-based modeling
of signal transduction based on the interactions of molecular domains. Bioinformatics 20(17):3289–91
29. Krogsgaard M, Juang J, Davis MM. 2007. A role for “self” in T-cell activation. Semin. Immunol. 19(4):236–
44
30. Stepanek O, Prabhakar AS, Osswald C, King CG, Bulek A, et al. 2014. Coreceptor scanning by the T
cell receptor provides a mechanism for T cell tolerance. Cell 159(2):333–45
31. Paster W, Bruger AM, Katsch K, Gregoire C, Roncagalli R, et al. 2015. A THEMIS:SHP1 complex
promotes T-cell survival. EMBO J. 34(3):393–409
32. Dittel BN, Štefanova I, Germain RN, Janeway CA Jr. 1999. Cross-antagonism of a T cell clone expressing
two distinct T cell receptors. Immunity 11(3):289–98
33. Wylie DC, Das J, Chakraborty AK. 2007. Sensitivity of T cells to antigen and antagonism emerges from
differential regulation of the same molecular signaling module. PNAS 104(13):5533–38
34. Feinerman O, Germain RN, Altan-Bonnet G. 2008. Quantitative challenges in understanding ligand
discrimination by αβ T cells. Mol. Immunol. 45(3):619–31
35. François P, Altan-Bonnet G. 2016. The case for absolute ligand discrimination: modeling information
processing and decision by immune T cells. J. Stat. Phys. 162(5):1130–52
36. Aleksic M, Dushek O, Zhang H, Shenderov E, Chen J-L, et al. 2010. Dependence of T cell antigen
recognition on T cell receptor-peptide MHC confinement time. Immunity 32(2):163–74

37. Huppa JB, Axmann M, Mortelmaier MA, Lillemeier BF, Newell EW, et al. 2010. TCR-peptide-MHC
interactions in situ show accelerated kinetics and increased affinity. Nature 463(7283):963–67
38. Huang J, Zarnitsyna VI, Liu B, Edwards LJ, Jiang N, et al. 2010. The kinetics of two-dimensional TCR
and pMHC interactions determine T-cell responsiveness. Nature 464(7290):932–36
39. Liu B, Chen W, Evavold BD, Zhu C. 2014. Accumulation of dynamic catch bonds between TCR and
agonist peptide-MHC triggers T cell signaling. Cell 157(2):357–68
40. O’Donoghue GP, Pielak RM, Smoligovets AA, Lin JJ, Groves JT. 2013. Direct single molecule mea-
surement of TCR triggering by agonist pMHC in living primary T cells. eLife 2:681
41. Kim ST, Takeuchi K, Sun Z-YJ, Touma M, Castro CE, et al. 2009. The αβ T cell receptor is an
anisotropic mechanosensor. J. Biol. Chem. 284(45):31028–37
42. Kim ST, Touma M, Takeuchi K, Sun Z-YJ, Dave VP, et al. 2010. Distinctive CD3 heterodimeric
ectodomain topologies maximize antigen-triggered activation of αβ T cell receptors. J. Immunol.
185(5):2951–59
43. Yoon ST, Dianzani U, Bottomly K, Janeway CA. 1994. Both high and low avidity antibodies to the T
cell receptor can have agonist or antagonist activity. Immunity 1(7):563–69
44. Lanier LL, Ruitenberg JJ, Allison JP, Weiss A. 1986. Distinct epitopes on the T cell antigen receptor of
HPB-ALL tumor cells identified by monoclonal antibodies. J. Immunol. 137(7):2286–92
45. Adams JJ, Narayanan S, Liu B, Birnbaum ME, Kruse AC, et al. 2011. T cell receptor signaling is limited by
docking geometry to peptide-major histocompatibility complex. Immunity 35(5):681–93
46. Kim ST, Shin Y, Brazin K, Mallis RJ, Sun Z-YJ, et al. 2012. TCR mechanobiology: torques and tunable
structures linked to early T cell signaling. Front. Immunol. 3:76
47. Lee KH. 2003. The immunological synapse balances T cell receptor signaling and degradation. Science
302(5648):1218–22
48. Dustin ML, Chakraborty AK, Shaw AS. 2010. Understanding the structure and function of the immuno-
logical synapse. Cold Spring Harb. Perspect. Biol. 2(10):a002311
49. Das J, Khakoo SI. 2015. NK cells: tuned by peptide? Immunol. Rev. 267(1):214–27
50. Sjolin-Goodfellow H, Frushicheva MP, Ji Q, Cheng DA, Kadlecek TA, et al. 2015. The catalytic activity
of the kinase ZAP-70 mediates basal signaling and negative feedback of the T cell receptor pathway. Sci.
Signal. 8(377):ra49
51. Houtman JCD, Yamaguchi H, Barda-Saad M, Braiman A, Bowden B, et al. 2006. Oligomerization of
signaling complexes by the multipoint binding of GRB2 to both LAT and SOS1. Nat. Struct. Mol. Biol.
13(9):798–805
52. Wilson BS, Pfeiffer JR, Surviladze Z, Gaudet EA, Oliver JM. 2001. High resolution mapping of mast
cell membranes reveals primary and secondary domains of FcεRI and LAT. J. Cell Biol. 154(3):645–58
53. Nag A, Monine MI, Faeder JR, Goldstein B. 2009. Aggregation of membrane proteins by cytosolic
cross-linkers: theory and simulation of the LAT-Grb2-SOS1 system. Biophys. J. 96(7):2604–23
54. Su X, Ditlev JA, Hui E, Xing W, Banjade S, et al. 2016. Phase separation of signaling molecules promotes
T cell receptor signal transduction. Science 352(6285):590–95
55. Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev.
Genet. 10(1):57–63
56. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, et al. 2011. Full-length transcriptome
assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29(7):644–52
57. Risso D, Schwartz K, Sherlock G, Dudoit S. 2011. GC-content normalization for RNA-Seq data. BMC
Bioinform. 12(1):480–17
58. Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without
a reference genome. BMC Bioinform. 12(1):323–16
59. Wagner GP, Kin K, Lynch VJ. 2012. Measurement of mRNA abundance using RNA-Seq data: RPKM
measure is inconsistent among samples. Theory Biosci. 131(4):281–85
60. Risso D, Ngai J, Speed TP, Dudoit S. 2014. Normalization of RNA-Seq data using factor analysis of
control genes or samples. Nat. Biotechnol. 32(9):896–902
61. Friedman J, Hastie T, Tibshirani R. 2001. The Elements of Statistical Learning, Vol. 1. Berlin: Springer
62. James G, Witten D, Hastie T, Tibshirani R. 2013. An Introduction to Statistical Learning, Vol. 6. New
York: Springer

63. Shalek AK, Satija R, Adiconis X, Gertner RS, Gaublomme JT, et al. 2013. Single-cell transcriptomics
reveals bimodality in expression and splicing in immune cells. Nature 498(7453):236–40
64. Shalek AK, Satija R, Shuga J, Trombetta JJ, Gennert D, et al. 2014. Single cell RNA-Seq reveals dynamic
paracrine control of cellular variation. Nature 510(7505):363–69
65. Korn T, Bettelli E, Oukka M, Kuchroo VK. 2009. IL-17 and Th17 cells. Annu. Rev. Immunol. 27(1):485–
517
66. Littman DR, Rudensky AY. 2010. Th17 and regulatory T cells in mediating and restraining inflammation.
Cell 140(6):845–58
67. Hernández-Santos N, Gaffen SL. 2012. Th17 cells in immunity to Candida albicans. Cell Host Microbe
11(5):425–35
68. Lee Y, Awasthi A, Yosef N, Quintana FJ, Xiao S, et al. 2012. Induction and molecular signature of
pathogenic TH17 cells. Nat. Immunol. 13(10):991–99
69. Gaublomme JT, Yosef N, Lee Y, Gertner RS, Yang LV, et al. 2015. Single-cell genomics unveils critical
regulators of Th17 cell pathogenicity. Cell 163(6):1400–12
70. Wang C, Yosef N, Gaublomme J, Wu C, Lee Y, et al. 2015. CD5L/AIM regulates lipid biosynthesis
and restrains Th17 cell pathogenicity. Cell 163(6):1413–27
71. Eisen HN, Siskind GW. 1964. Variations in affinities of antibodies during the immune response.
Biochemistry 3(7):996–1008
72. Steiner LA, Eisen HN. 1967. The relative affinity of antibodies synthesized in the secondary response.
J. Exp. Med. 126(6):1185–205
73. Victora GD, Nussenzweig MC. 2012. Germinal centers. Annu. Rev. Immunol. 30(1):429–57
74. De Silva NS, Klein U. 2015. Dynamics of B cells in germinal centres. Nat. Rev. Immunol. 15(3):137–48
75. Tas JMJ, Mesin L, Pasqual G, Targ S, Jacobsen JT, et al. 2016. Visualizing antibody affinity maturation
in germinal centers. Science 351(6277):1048–54
76. Perelson AS, Oster GF. 1979. Theoretical studies of clonal selection: minimal antibody repertoire size
and reliability of self-non-self discrimination. J. Theor. Biol. 81(4):645–70
77. Kepler TB, Perelson AS. 1993. Cyclic re-entry of germinal center B cells and the efficiency of affinity
maturation. Immunol. Today 14(8):412–15
78. Oprea M, Perelson AS. 1997. Somatic mutation leads to efficient affinity maturation when centrocytes
recycle back to centroblasts. J. Immunol. 158(11):5155–62
79. Allen CDC, Okada T, Tang HL, Cyster JG. 2007. Imaging of germinal center selection events during
affinity maturation. Science 315(5811):528–31
80. Schwickert TA, Lindquist RL, Shakhar G, Livshits G, Skokos D, et al. 2007. In vivo imaging of germinal
centres reveals a dynamic open structure. Nature 446(7131):83–87
81. Hauser AE, Junt T, Mempel TR, Sneddon MW, Kleinstein SH, et al. 2007. Definition of germinal-center
B cell migration in vivo reveals predominant intrazonal circulation patterns. Immunity 26(5):655–67
82. Meyer-Hermann M, Deutsch A, Or-Guil M. 2001. Recycling probability and dynamical properties of
germinal center reactions. J. Theor. Biol. 210(3):265–85
83. Iber D, Maini PK. 2002. A mathematical model for germinal centre kinetics and affinity maturation.
J. Theor. Biol. 219(2):153–75
84. Meyer-Hermann ME, Maini PK, Iber D. 2006. An analysis of B cell selection mechanisms in germinal
centers. Math. Med. Biol. 23(3):255–77
85. Zhang J, Shakhnovich EI. 2010. Optimality of mutation and selection in germinal centers. PLOS Comput.
Biol. 6(6):e1000800
86. Meyer-Hermann M, Mohr E, Pelletier N, Zhang Y, Victora GD, Toellner K-M. 2012. A theory of
germinal center B cell selection, division, and exit. Cell Rep. 2(1):162–74
87. Takala SL, Coulibaly D, Thera MA, Batchelor AH, Cummings MP, et al. 2009. Extreme polymorphism
in a vaccine antigen and risk of clinical malaria: implications for vaccine development. Sci. Transl. Med.
1(2):2ra5
88. Burton DR, Hangartner L. 2016. Broadly neutralizing antibodies to HIV and their role in vaccine design.
Annu. Rev. Immunol. 34(1):635–59
89. Pappas L, Foglierini M, Piccoli L, Kallewaard NL, Turrini F, et al. 2014. Rapid development of broadly
influenza neutralizing antibodies through redundant mutations. Nature 516(7531):418–22

90. Julien JP, Cupo A, Sok D, Stanfield RL, Lyumkis D, et al. 2013. Crystal structure of a soluble cleaved
HIV-1 envelope trimer. Science 342(6165):1477–83
91. Lyumkis D, Julien JP, de Val N, Cupo A, Potter CS, et al. 2013. Cryo-EM structure of a fully glycosylated
soluble cleaved HIV-1 envelope trimer. Science 342(6165):1484–90
92. Liao H-X, Lynch R, Zhou T, Gao F, Alam SM, et al. 2013. Co-evolution of a broadly neutralizing
HIV-1 antibody and founder virus. Nature 496(7446):469–76
93. Doria-Rose NA, Schramm CA, Gorman J, Moore PL, Bhiman JN, et al. 2014. Developmental pathway
for potent V1V2-directed HIV-neutralizing antibodies. Nature 509(7498):55–62
94. Wu X, Zhang Z, Schramm CA, Joyce MG, Do Kwon Y, et al. 2015. Maturation and diversity of the
VRC01-antibody lineage over 15 years of chronic HIV-1 infection. Cell 161(3):470–85
95. Bhiman JN, Anthony C, Doria-Rose NA, Karimanzira O, Schramm CA, et al. 2015. Viral variants
that initiate and drive maturation of V1V2-directed HIV-1 broadly neutralizing antibodies. Nat. Med.
21(11):1332–36
96. Chaudhury S, Reifman J, Wallqvist A. 2014. Simulation of B cell affinity maturation explains enhanced
antibody cross-reactivity induced by the polyvalent malaria vaccine AMA1. J. Immunol. 193(5):2073–86
97. Wang S, Mata-Fink J, Kriegsman B, Hanson M, Irvine DJ, et al. 2015. Manipulating the selection forces
during affinity maturation to generate cross-reactive HIV antibodies. Cell 160(4):785–97
98. Luo S, Perelson AS. 2015. Competitive exclusion by autologous antibodies can prevent broad HIV-1
antibodies from arising. PNAS 112(37):11654–59
99. Childs LM, Baskerville EB, Cobey S. 2015. Trade-offs in antibody repertoires to complex antigens. Phil.
Trans. R. Soc. B 370(1676):20140245
100. Jardine J, Julien JP, Menis S, Ota T, Kalyuzhniy O, et al. 2013. Rational HIV immunogen design to
target specific germline B cell receptors. Science 340(6133):711–16
101. Jardine JG, Ota T, Sok D, Pauthner M, Kulp DW, et al. 2015. Priming a broadly neutralizing antibody
response to HIV-1 using a germline-targeting immunogen. Science 349(6244):156–61
102. Jardine JG, Kulp DW, Havenar-Daughton C, Sarkar A, Briney B, et al. 2016. HIV-1 broadly neutralizing
antibody precursor B cells revealed by germline-targeting immunogen. Science 351(6280):1458–63
103. Briney B, Sok D, Jardine JG, Kulp DW, Skog P, et al. 2016. Tailored immunogens direct affinity
maturation toward HIV neutralizing antibodies. Cell 166(6):1459–70.e11
104. Escolano A, Steichen JM, Dosenovic P, Kulp DW, Golijanin J, et al. 2016. Sequential immunization
elicits broadly neutralizing anti-HIV-1 antibodies in Ig knockin mice. Cell 166(6):1445–58.e12
105. Dutta S, Dlugosz LS, Drew DR, Ge X, Ababacar D, et al. 2013. Overcoming antigenic diversity by
enhancing the immunogenicity of conserved epitopes on the malaria vaccine candidate apical membrane
antigen-1. PLOS Pathog. 9(12):e1003840
106. Deem MW, Lee HY. 2003. Sequence space localization in the immune system response to vaccination
and disease. Phys. Rev. Lett. 91(6):068101–4
107. Heo M, Zeldovich KB, Shakhnovich EI. 2011. Diversity against adversity: how adaptive immune system
evolves potent antibodies. J. Stat. Phys. 144(2):241–67
108. Sanders RW, Derking R, Cupo A, Julien J-P, Yasmeen A, et al. 2013. A next-generation cleaved, soluble
HIV-1 Env trimer, BG505 SOSIP.664 gp140, expresses multiple epitopes for broadly neutralizing but
not non-neutralizing antibodies. PLOS Pathog. 9(9):e1003618
109. de Taeye SW, Ozorowski G, Torrents de la Peña A, Guttman M, Julien J-P, et al. 2015. Immuno-
genicity of stabilized HIV-1 envelope trimers with reduced exposure of non-neutralizing epitopes. Cell
163(7):1702–15
110. Murphy K, Weaver C. 2016. Janeway’s Immunobiology. New York: Garland Sci. 9th ed.
111. Hogquist KA, Jameson SC. 2014. The self-obsession of T cells: How TCR signaling thresholds affect
fate “decisions” and effector function. Nat. Immunol. 15(9):815–23
112. Klein L, Kyewski B, Allen PM, Hogquist KA. 2014. Positive and negative selection of the T cell repertoire:
What thymocytes see (and don’t see). Nat. Rev. Immunol. 14(6):377–91
113. Vrisekoop N, Monteiro JP, Mandl JN, Germain RN. 2014. Revisiting thymic positive selection and
the mature T cell repertoire for antigen. Immunity 41(2):181–90

114. Jenkins MK, Chu HH, McLachlan JB, Moon JJ. 2010. On the composition of the preimmune repertoire
of T cells specific for peptide-major histocompatibility complex ligands. Annu. Rev. Immunol.
28(1):275–94
115. Eisen HN, Chakraborty AK. 2010. Evolving concepts of specificity in immune reactions. PNAS
107(52):22373–80
116. Huseby ES, White J, Crawford F, Vass T, Becker D, et al. 2005. How the T cell repertoire becomes
peptide and MHC specific. Cell 122(2):247–60
117. Huseby ES, Crawford F, White J, Marrack P, Kappler JW. 2006. Interface-disrupting amino acids
establish specificity between T cell receptors and complexes of major histocompatibility complex and
peptide. Nat. Immunol. 7(11):1191–99
118. Ignatowicz L, Kappler J, Marrack P. 1996. The repertoire of T cells shaped by a single MHC/peptide
ligand. Cell 84(4):521–29
119. Košmrlj A, Jha AK, Huseby ES, Kardar M, Chakraborty AK. 2008. How the thymus designs antigen-
specific and self-tolerant T cell receptor sequences. PNAS 105(43):16671–76
120. Chao DL, Davenport MP, Forrest S, Perelson AS. 2005. The effects of thymic selection on the range
of T cell cross-reactivity. Eur. J. Immunol. 35(12):3452–59
121. Huseby ES, White J, Crawford F, Vass T, Becker D, et al. 2005. How the T cell repertoire becomes
peptide and MHC specific. Cell 122(2):247–60
122. Košmrlj A, Chakraborty AK, Kardar M, Shakhnovich EI. 2009. Thymic selection of T-cell receptors
as an extreme value problem. Phys. Rev. Lett. 103(6):068103
123. Chakraborty AK, Košmrlj A. 2010. Statistical mechanical concepts in immunology. Annu. Rev. Phys.
Chem. 61(1):283–303
124. Stadinski BD, Shekhar K, Gomez-Tourino I, Jung J, Sasaki K, et al. 2016. Hydrophobic CDR3 residues
promote the development of self-reactive T cells. Nat. Immunol. 17(8):946–55
125. Suri A, Levisetti MG, Unanue ER. 2008. Do the peptide-binding properties of diabetogenic class II
molecules explain autoreactivity? Curr. Opin. Immunol. 20(1):105–10
126. Košmrlj A, Read EL, Qi Y, Allen TM, Altfeld M, et al. 2010. Effects of thymic selection of the T-cell
repertoire on HLA class I-associated control of HIV infection. Nature 465(7296):350–54
127. Peters B, Sidney J, Bourne P, Bui H-H, Buus S, et al. 2005. The Immune Epitope Database and Analysis
Resource: From vision to blueprint. PLOS Biol. 3(3):e91–e93
128. Rao X, Fontaine Costa AICA, van Baarle D, Kesmir C. 2009. A comparative study of HLA binding
affinity and ligand diversity: Implications for generating immunodominant CD8+ T cell responses.
J. Immunol. 182(3):1526–32
129. Kim AY, Kuntzen T, Timm J, Nolan BE, Baca MA, et al. 2011. Spontaneous control of HCV is associated
with expression of HLA-B*57 and preservation of targeted epitopes. Gastroenterology 140(2):686–96.e1
130. Murugan A, Mora T, Walczak AM, Callan CG. 2012. Statistical inference of the generation probability
of T-cell receptors from sequence repertoires. PNAS 109(40):16161–66
131. Elhanati Y, Murugan A, Callan CG Jr., Mora T, Walczak AM. 2014. Quantifying selection in immune
receptor repertoires. PNAS 111(27):9875–80
132. Brodin P, Jojic V, Gao T, Bhattacharya S, Angel CJL, et al. 2015. Variation in the human immune system
is largely driven by non-heritable influences. Cell 160(1–2):37–47
133. Carr EJ, Dooley J, Garcia-Perez JE, Lagou V, Lee JC, et al. 2016. The cellular composition of the human
immune system is shaped by age and cohabitation. Nat. Immunol. 17(4):461–68
134. Tsang JS, Schwartzberg PL, Kotliarov Y, Biancotto A, Xie Z, et al. 2014. Global analyses of human
immune variation reveal baseline predictors of postvaccination responses. Cell 157(2):499–513
135. Roederer M, Quaye L, Mangino M, Beddall MH, Mahnke Y, et al. 2015. The genetic architecture of the
human immune system: A bioresource for autoimmunity and disease pathogenesis. Cell 161(2):387–403
136. Orrù V, Steri M, Sole G, Sidore C, Virdis F, et al. 2013. Genetic variants regulating immune cell
levels in health and disease. Cell 155(1):242–56
137. Maecker HT, McCoy JP, Nussenblatt R. 2012. Standardizing immunophenotyping for the Human
Immunology Project. Nat. Rev. Immunol. 12(3):191–200

138. Nakaya HI, Wrammert J, Lee EK, Racioppi L, Marie-Kunze S, et al. 2011. Systems biology of vaccination
for seasonal influenza in humans. Nat. Immunol. 12(8):786–95
139. Chattopadhyay PK, Roederer M. 2015. A mine is a terrible thing to waste: High content, single
cell technologies for comprehensive immune analysis. Am. J. Transplant. 15(5):1155–61
140. Bandura DR, Baranov VI, Ornatsky OI, Antonov A, Kinach R, et al. 2009. Mass cytometry: Technique
for real time single cell multitarget immunoassay based on inductively coupled plasma
time-of-flight mass spectrometry. Anal. Chem. 81(16):6813–22
141. Spitzer MH, Nolan GP. 2016. Mass cytometry: single cells, many features. Cell 165(4):780–91
142. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, et al. 2015. Highly parallel genome-wide expression
profiling of individual cells using nanoliter droplets. Cell 161(5):1202–14
143. Qiu P, Simonds EF, Bendall SC, Gibbs KD, Bruggner RV, et al. 2011. Extracting a cellular hierarchy
from high-dimensional cytometry data with SPADE. Nat. Biotechnol. 29(10):886–91
144. Bendall SC, Simonds EF, Qiu P, Amir EAD, Krutzik PO, et al. 2011. Single-cell mass cytometry
of differential immune and drug responses across a human hematopoietic continuum. Science
332(6030):687–96
145. Van Der Maaten L, Postma E, Van den Herik J. 2009. Dimensionality reduction: A comparative review.
J. Mach. Learn. Res. 10:66–71
146. Jolliffe I. 2002. Principal Component Analysis. New York: Springer
147. Newell EW, Sigal N, Bendall SC, Nolan GP, Davis MM. 2012. Cytometry by time-of-flight shows
combinatorial cytokine expression and virus-specific cell niches within a continuum of CD8+ T cell
phenotypes. Immunity 36(1):142–52
148. Shekhar K, Brodin P, Davis MM, Chakraborty AK. 2014. Automatic Classification of Cellular Expression
by Nonlinear Stochastic Embedding (ACCENSE). PNAS 111(1):202–7
149. Amir E-AD, Davis KL, Tadmor MD, Simonds EF, Levine JH, et al. 2013. ViSNE enables visualization
of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol.
31(6):545–52
150. Maaten LVD, Hinton G. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov):2579–605
151. Evans DM, Zhu G, Duffy DL, Frazer IH, Montgomery GW, Martin NG. 2004. A major quantitative
trait locus for CD4-CD8 ratio is located on chromosome 11. Genes Immun. 5(7):548–52
152. de Craen AJM, Posthuma D, Remarque EJ, van den Biggelaar AHJ, Westendorp RGJ, Boomsma DI.
2005. Heritability estimates of innate immunity: An extended twin study. Genes Immun. 6(2):167–70
153. Shaw AC, Goldstein DR, Montgomery RR. 2013. Age-dependent dysregulation of innate immunity.
Nat. Rev. Immunol. 13(12):875–87
154. Giefing-Kröll C, Berger P, Lepperdinger G, Grubeck-Loebenstein B. 2015. How sex and age affect
immune responses, susceptibility to infections, and response to vaccination. Aging Cell 14(3):309–21
155. Sylwester AW, Mitchell BL, Edgar JB, Taormina C, Pelte C, et al. 2005. Broadly targeted human
cytomegalovirus-specific CD4+ and CD8+ T cells dominate the memory compartments of exposed
subjects. J. Exp. Med. 202(5):673–85
156. Chidrawar S, Khan N, Wei W, McLarnon A, Smith N, et al. 2009. Cytomegalovirus-seropositivity has
a profound influence on the magnitude of major lymphoid subsets within healthy individuals. Clin. Exp.
Immunol. 155(3):423–32
157. Hooper LV, Littman DR, Macpherson AJ. 2012. Interactions between the microbiota and the immune
system. Science 336(6086):1268–73
158. Furman D, Jojic V, Kidd B, Shen-Orr S, Price J, et al. 2013. Apoptosis and other immune biomarkers
predict influenza vaccine responsiveness. Mol. Syst. Biol. 9(1):659–59
159. De Gregorio E, Rappuoli R. 2014. From empiricism to rational design: A personal perspective of the
evolution of vaccine development. Nat. Rev. Immunol. 14(7):505–14
160. Ho DD, Neumann AU, Perelson AS, Chen W, Leonard JM, Markowitz M. 1995. Rapid turnover of
plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373(6510):123–26
161. Wei X, Ghosh SK, Taylor ME, Johnson VA, Emini EA, et al. 1995. Viral dynamics in human immuno-
deficiency virus type 1 infection. Nature 373(6510):117–22
162. Perelson AS, Neumann AU, Markowitz M, Leonard JM, Ho DD. 1996. HIV-1 dynamics in vivo:
Virion clearance rate, infected cell life-span, and viral generation time. Science 271(5255):1582–86

163. Perelson AS, Nelson PW. 1999. Mathematical analysis of HIV-1 dynamics in vivo. SIAM Rev. 41:3–44
164. Kelleher AD, Long C, Holmes EC, Allen RL, Wilson J, et al. 2001. Clustered mutations in HIV-1 Gag
are consistently required for escape from HLA-B27-restricted cytotoxic T lymphocyte responses.
J. Exp. Med. 193(3):375–86
165. Martinez-Picado J, Prado JG, Fry EE, Pfafferott K, Leslie A, et al. 2006. Fitness cost of escape mutations
in p24 Gag in association with control of human immunodeficiency virus type 1. J. Virol. 80(7):3617–23
166. Brockman MA, Schneidewind A, Lahaie M, Schmidt A, Miura T, et al. 2007. Escape and compensation
from early HLA-B57-mediated cytotoxic T-lymphocyte pressure on human immunodeficiency virus
type 1 Gag alter capsid interactions with cyclophilin A. J. Virol. 81(22):12608–18
167. Hinkley T, Martins J, Chappey C, Haddad M, Stawiski E, et al. 2011. A systems analysis of mutational
effects in HIV-1 protease and reverse transcriptase. Nat. Genet. 43(5):487–89
168. Kouyos RD, von Wyl V, Hinkley T, Petropoulos CJ, Haddad M, et al. 2011. Assessing predicted HIV-1
replicative capacity in a clinical setting. PLOS Pathog. 7(11):e1002321–26
169. Ferguson AL, Mann JK, Omarjee S, Ndung’u T, Walker BD, Chakraborty AK. 2013. Translating HIV
sequences into quantitative fitness landscapes predicts viral vulnerabilities for rational immunogen
design. Immunity 38(3):606–17
170. Mann JK, Barton JP, Ferguson AL, Omarjee S, Walker BD, et al. 2014. The fitness landscape of HIV-1
Gag: Advanced modeling approaches and validation of model predictions by in vitro testing. PLOS
Comput. Biol. 10(8):e1003776–11
171. Łuksza M, Lässig M. 2014. A predictive fitness model for influenza. Nature 507(7490):57–61
172. Neher RA, Russell CA, Shraiman BI. 2014. Predicting evolution from the shape of genealogical trees.
eLife 3:e01914–28
173. Neher RA, Bedford T, Daniels RS, Russell CA, Shraiman BI. 2016. Prediction, dynamics, and visual-
ization of antigenic phenotypes of seasonal influenza viruses. PNAS 113(12):E1701–9
174. Barton JP, De Leonardis E, Coucke A, Cocco S. 2016. ACE: adaptive cluster expansion for maximum
entropy graphical model inference. Bioinformatics 32:3089–97
175. Barton JP, Goonetilleke N, Butler TC, Walker BD, McMichael AJ, Chakraborty AK. 2016. Relative
rate and location of intra-host HIV evolution to evade cellular immunity are predictable. Nat. Commun.
7:11660–27
176. Sella G, Hirsh AE. 2005. The application of statistical physics to evolutionary biology. PNAS
102(27):9541–46
177. Friedrich TC, Dodds EJ, Yant LJ, Vojnov L, Rudersdorf R, et al. 2004. Reversion of CTL escape-variant
immunodeficiency viruses in vivo. Nat. Med. 10(3):275–81
178. Korber B. 2001. Evolutionary and immunological implications of contemporary HIV-1 variation. Br.
Med. Bull. 58(1):19–42
179. Korber B, Gaschen B, Yusim K, Thakallapally R, Kesmir C, Detours V. 2001. Evolutionary and im-
munological implications of contemporary HIV-1 variation. Br. Med. Bull. 58(1):19–42
180. Moore CB. 2002. Evidence of HIV-1 adaptation to HLA-restricted immune responses at a population
level. Science 296(5572):1439–43
181. Zanini F, Brodin J, Thebo L, Lanz C, Bratt G, et al. 2015. Population genomics of intrapatient HIV-1
evolution. eLife 4:13239–39
182. Moradigaravand D, Kouyos R, Hinkley T, Haddad M, Petropoulos CJ, et al. 2014. Recombination
accelerates adaptation on a large-scale empirical fitness landscape in HIV-1. PLOS Genet. 10:e1004439
183. Yang W-L, Kouyos RD, Böni J, Yerly S, Klimkait T, et al. 2015. Persistence of transmitted HIV-1
drug resistance mutations associated with fitness costs and viral genetic backgrounds. PLOS Pathog.
11(3):e1004722–29
184. Butler TC, Barton JP, Kardar M, Chakraborty AK. 2016. Identification of drug resistance mutations in
HIV from constraints on natural evolution. Phys. Rev. E 93(2):022412–15
185. Deeks SG, Walker BD. 2007. Human immunodeficiency virus controllers: Mechanisms of durable
virus control in the absence of antiretroviral therapy. Immunity 27(3):406–16
186. Dahirel V, Shekhar K, Pereyra F, Miura T, Artyomov M, et al. 2011. Coordinate linkage of HIV evolution
reveals regions of immunological vulnerability. PNAS 108(28):11530–35

187. Liu MKP, Hawkins N, Ritchie AJ, Ganusov VV, Whale V, et al. 2012. Vertical T cell immunodominance
and epitope entropy determine HIV-1 escape. J. Clin. Investig. 123:1–18
188. Cocco S, Monasson R. 2011. Adaptive cluster expansion for inferring Boltzmann machines with noisy
data. Phys. Rev. Lett. 106(9):090601–13
189. Pandit A, de Boer RJ. 2016. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal
interference and genetic hitchhiking among immune escape variants. Retrovirology 11(1):56–38
190. Borthwick N, Ahmed T, Ondondo B, Hayes P, Rose A, et al. 2013. Vaccine-elicited human T cells
recognizing conserved protein regions inhibit HIV-1. Mol. Ther. 22(2):464–75