Bioengineering Program, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
Department of Pathology, Yale School of Medicine, New Haven, CT, USA
Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
Department of Neurology, Yale School of Medicine, New Haven, CT, USA
Department of Science Education, Hofstra North Shore-LIJ School of Medicine, Hempstead, NY, USA
Human and Translational Immunology Program, Yale School of Medicine, New Haven, CT, USA
Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
Department of Genetics, Harvard Medical School, Boston, MA, USA
AbVitro, Inc., Boston, MA, USA
Edited by:
Ramit Mehr, Bar-Ilan University, Israel
Reviewed by:
Ronald B. Corley, Boston University
School of Medicine, USA
Masaki Hikida, Kyoto University,
Japan
*Correspondence:
Steven H. Kleinstein, Department of
Pathology, Yale School of Medicine,
Suite 505, 300 George Street, New
Haven, CT 06511, USA
e-mail: steven.kleinstein@yale.edu
1. INTRODUCTION
www.frontiersin.org
C→T (thymine) transition mutations (2). The AID-induced mismatches can alternatively be recognized by UNG or MSH2/MSH6
to initiate base excision or mismatch repair pathways, respectively.
These pathways operate in an error-prone manner to introduce the
full spectrum of mutations at the initial lesion, as well as spreading mutations to the surrounding bases. Overall, SHM introduces
point mutations into the Ig locus at a rate of approximately 10^-3 per base-pair
per division (3, 4). While the process of SHM appears to be stochastic, there are clear intrinsic biases, both in the bases that are
targeted (5, 6) and in the substitutions that are introduced (7, 8).
Accurate background models for SHM micro-sequence targeting
Yaari et al.
previous models. We also find that the nucleotide substitution profiles at all bases are dependent on the surrounding nucleotides. The
S5F targeting and substitution models can be employed as background distributions for mutation analysis, such as the detection
and quantification of affinity-dependent selection in Ig sequences
(11, 20). These models dramatically improve the ability to analyze
mutation patterns in Ig sequences, and provide insights into the
SHM process.
2. RESULTS
To develop models for SHM targeting and substitution preferences, we curated a large database of mutations from high-throughput sequencing studies (Table 1). These data were derived
from 7 human blood and lymph node samples, and Ig sequencing
was carried out using both Roche 454 and Illumina MiSeq next-generation sequencing technologies. In total, the data contained
42,122,509 raw reads, which were processed (see Materials and
Methods) to arrive at 1,145,182 high-fidelity Ig sequences, which
were each supported by a minimum of two independent reads in
a sample. These high-fidelity sequences were clustered to identify
clones (sequences related by a common ancestor) and one effective
sequence was constructed per clone so that each observed mutation corresponded to an independent event. Overall, this process
produced a set of 806,860 synonymous mutations that were used
to model somatic hypermutation targeting and substitution.
2.1.
NBCCATC = NCCCATC + NGCCATC + NTCCATC
Table 1 | Next-generation sequencing data sets used to construct the S5F targeting and substitution models.
Sample    Tissue  Tech.  Raw reads   Processed reads  Clones   # Mutations (substitution)  # Mutations (targeting)
3931LN    LN      MiSeq  3,641,633   79,777           16,272   25,307                      53,840
4014LN    LN      MiSeq  3,714,152   106,006          32,972   57,215                      106,265
4106LN    LN      MiSeq  10,917,517  231,387          54,400   108,591                     208,338
3928LN    LN      MiSeq  7,691,509   99,519           76,375   68,051                      132,795
PGP1-1    PBMC    MiSeq  3,851,658   55,606           50,514   23,939                      48,558
PGP1-2    PBMC    MiSeq  3,946,514   59,611           54,374   24,971                      50,117
PGP1-3    PBMC    MiSeq  4,543,353   48,971           45,788   20,865                      42,737
PGP1-4    PBMC    MiSeq  3,121,884   52,844           49,054   23,243                      47,049
hu420143  PBMC    454    178,584     92,055           14,956   23,260                      48,838
420IV     PBMC    454    398,517     248,363          39,047   24,771                      50,899
PGP1-5    PBMC    454    117,188     71,043           12,275   8,209                       17,424
Total                    42,122,509  1,145,182        446,027  408,422                     806,860
Tissue types are lymph node (LN) or peripheral blood mononuclear cell (PBMC). The different filters applied to arrive at the number of (synonymous) mutations used
for the targeting and substitution models are described in the text. All three studies relate to manuscripts in preparation.
Correlation  Middle  Upstream  Downstream  Hot spots
Pearson      0.40    0.37      0.15        0.04
Spearman     0.20    0.24      0.23        0.09
Pearson      0.58    0.57      0.61        0.73
Spearman     0.61    0.58      0.64        0.79
[Figure: substitution and mutability values by motif (axes: Substitution, Mutability, Model), with legend entries NA, NC, NG, NT, S1F, RS1F, and Literature, and hot/cold-spot labels WRC, WA, and SYC above the columns.]
averaging over surrounding bases. 3-mer motifs were estimated using silent
mutations and dependencies on the immediately adjacent bases (S3F), while
5-mer motifs refer to the complete S5F model. Horizontal lines in (A) indicate
the substitution values for the S1F model following the color scheme shown
in the legend. Motifs that fall into one of the standard hot or cold-spots
categories are indicated by the motif above the column.
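The hot- and cold-spot categories named in this caption are written in IUPAC notation (W = A/T, R = A/G, Y = C/T, S = C/G). As a minimal sketch, the function below assigns a 5-mer to one of these categories from its central base and neighbors. It assumes the mutated base is the C in WRC and SYC, the G in GYW and GRS, the A in WA, and the T in TW; the name `classify_5mer` is illustrative and does not come from the paper.

```python
# IUPAC codes needed for the standard SHM motif classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "R": "AG", "Y": "CT", "S": "CG", "N": "ACGT",
}


def matches(seq, pattern):
    """Return True if every base of seq is allowed by the IUPAC pattern."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(seq, pattern)
    )


def classify_5mer(motif):
    """Classify a 5-mer by the context of its central (mutated) base.

    Assumption: the mutated base sits at index 2, so upstream context is
    indices 0-1 and downstream context is indices 3-4.
    """
    if matches(motif, "WRCNN") or matches(motif, "NNGYW"):
        return "WRC/GYW"   # hot spot
    if matches(motif, "NWANN") or matches(motif, "NNTWN"):
        return "WA/TW"     # hot spot
    if matches(motif, "SYCNN") or matches(motif, "NNGRS"):
        return "SYC/GRS"   # cold spot
    return "Neutral"
```

For example, `classify_5mer("AGCTA")` falls in the WRC/GYW class because the central C is preceded by A (W) and G (R).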
single aggregated targeting model for each data set. Estimating the
relative mutabilities of 5-mer motifs for an individual Ig sequence
involves two steps: (1) Calculating the background frequency of the
different 5-mers based on the germline (unmutated) version of the
sequence, and (2) creating a table of the 5-mers that were mutated
in the sequence. To avoid the confounding influence of selection,
only mutations that were synonymous (i.e., that do not produce
an amino acid exchange in the germline context) were included
in the analysis. Note that these criteria are slightly different from
those used in the substitution model. In the substitution model,
mutations were used only at positions where all possible mutations
were synonymous, whereas all synonymous mutations
were considered for mutabilities (see Table 1).
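The two synonymy criteria described above can be sketched in a few lines. The example assumes the germline sequence contains only A, C, G, and T and is in reading frame starting at index 0 (an illustrative simplification), and the function names are hypothetical. A mutation is evaluated in the germline context, meaning that only the mutated position differs from the germline codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(germline, pos, new_base):
    """True if mutating germline[pos] to new_base keeps the amino acid.

    Assumes the reading frame starts at index 0 of the germline sequence.
    """
    start = pos - pos % 3
    codon = germline[start:start + 3]
    if len(codon) < 3 or new_base == germline[pos]:
        return False  # incomplete codon, or not a mutation at all
    offset = pos % 3
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


def all_mutations_synonymous(germline, pos):
    """Stricter criterion (substitution model): every possible change at
    this position must be synonymous."""
    return all(
        is_synonymous(germline, pos, b) for b in "ACGT" if b != germline[pos]
    )
```

The first function corresponds to the looser criterion used for mutabilities, and the second to the stricter criterion used for the substitution model.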
For each of the 1024 possible 5-mer motifs (M) in each
Ig sequence, the background frequency (B M ) was calculated as
follows:
$$B_M = \sum_{i} \sum_{b} S_M^b \, I_{GL}(i, M, b) \qquad (1)$$
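As a rough illustration of Equation 1, the sketch below accumulates the background frequency of each 5-mer over a germline sequence. It rests on several assumptions that are not stated in the surviving text: the indicator is taken to be 1 when the germline 5-mer centered at position i is M and a change to base b is synonymous, S is a nested dictionary of substitution rates (`subst[motif][base]`, a hypothetical layout), a uniform rate of 1/3 is used when no substitution model is supplied, and the reading frame starts at index 0.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(germline, i, b):
    """True if changing germline[i] to b leaves the amino acid unchanged."""
    start = i - i % 3
    codon = germline[start:start + 3]
    if len(codon) < 3 or b == germline[i]:
        return False
    mutated = codon[:i % 3] + b + codon[i % 3 + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


def background_frequency(germline, subst=None):
    """Sum substitution rates over all synonymous changes, per 5-mer."""
    background = defaultdict(float)
    # Only positions with two flanking bases on each side have a full 5-mer.
    for i in range(2, len(germline) - 2):
        motif = germline[i - 2:i + 3]
        for b in "ACGT":
            if is_synonymous(germline, i, b):
                # Assumed uniform rate when no substitution model is given.
                rate = subst[motif][b] if subst is not None else 1.0 / 3.0
                background[motif] += rate
    return dict(background)
```

In the toy germline `GGTGCTGGT`, only the third codon positions at indices 2 and 5 admit synonymous changes, so only the motifs `GGTGC` and `GCTGG` receive a non-zero background.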
$$C_M = \sum_{i} I_{GL,OS}(i, M) \qquad (2)$$
2.2.3.
sequence (OS), and the indicator function $I_{GL,OS}(i, M)$ is 1 in
cases where the 5-mer surrounding GL[i] is M and a mutation in
position i from GL[i] to OS[i] is synonymous and 0 otherwise.
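The indicator function described above translates directly into a counting loop. The following sketch tallies, for each germline 5-mer, the observed synonymous mutations at its central position. It assumes aligned germline and observed sequences of A/C/G/T in reading frame from index 0; the function names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(germline, i, b):
    """True if changing germline[i] to b leaves the amino acid unchanged."""
    start = i - i % 3
    codon = germline[start:start + 3]
    if len(codon) < 3 or b == germline[i]:
        return False
    mutated = codon[:i % 3] + b + codon[i % 3 + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


def mutation_counts(germline, observed):
    """Count observed synonymous mutations per germline 5-mer."""
    counts = defaultdict(int)
    n = min(len(germline), len(observed))
    for i in range(2, n - 2):
        new_base = observed[i]
        # Skip unmutated positions and ambiguous base calls.
        if new_base == germline[i] or new_base not in "ACGT":
            continue
        if is_synonymous(germline, i, new_base):
            counts[germline[i - 2:i + 3]] += 1
    return dict(counts)
```

The motif is always read from the germline sequence, so a mutation is attributed to the context in which it arose rather than to the mutated sequence.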
After calculating the arrays $\vec{C}$ and $\vec{B}$, a mutability score, $\mu$, was
defined for each motif M in the vector (for sequence j) as:
$$\mu_M^j = C_M / B_M \qquad (3)$$
which was then normalized to one:
$$\hat{\mu}_M^j = \mu_M^j \Big/ \sum_{m} \mu_m^j \qquad (4)$$
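Equations 3 and 4 amount to a ratio followed by a normalization. A minimal sketch, assuming the counts and background frequencies for one sequence are held in dictionaries keyed by 5-mer (a hypothetical layout), and assuming that motifs with zero background are simply left out because their ratio is undefined:

```python
def mutability_scores(counts, background):
    """Compute per-motif mutability (count / background), normalized to sum to 1.

    counts: dict mapping 5-mer -> number of observed synonymous mutations
    background: dict mapping 5-mer -> background frequency
    """
    # Raw ratio for every motif that could have been mutated synonymously.
    raw = {
        motif: counts.get(motif, 0) / b
        for motif, b in background.items()
        if b > 0
    }
    total = sum(raw.values())
    if total == 0:
        # No mutations observed: return zeros rather than dividing by zero.
        return {motif: 0.0 for motif in raw}
    return {motif: value / total for motif, value in raw.items()}
```

For instance, counts of `{"AAAAA": 2, "CCCCC": 1}` with backgrounds of `{"AAAAA": 4.0, "CCCCC": 1.0, "GGGGG": 2.0}` give raw ratios of 0.5, 1.0, and 0.0, which normalize to 1/3, 2/3, and 0.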
3.
3.1.
3.1.1.
[Figure: 5-mer mutabilities for samples 3931LN, 4014LN, and 4106LN plotted against rank, with pairwise comparisons of mutability between samples (Pearson: 0.94, Spearman: 0.93; Pearson: 0.91, Spearman: 0.89; Pearson: 0.91, Spearman: 0.9). Legend: WRC/GYW, WA/TW, SYC/GRS, Neutral.]
highest) and color coded by their category (WRC/GYW are red, SYC/GRS are
blue, WA/TW are green, and the rest are gray). Symbols indicate the mutated
nucleotide (in the center of the 5-mer). Correlations between the mutabilities
for all 5-mer motifs across individuals are shown in the upper (log-log scale)
and lower (linear scale) triangles.
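The Pearson and Spearman correlations reported between individuals can be computed as in the sketch below, which uses only the standard library. It assumes two equal-length lists of mutability values, one per sample, ordered by the same 5-mer motifs; the function names are illustrative. Spearman correlation is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because mutabilities span orders of magnitude, a comparison on a log-log scale would apply `math.log` to each value before calling `pearson`; the rank-based `spearman` is unaffected by that transformation.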
FIGURE 3 | Hedgehog plots for the S5F targeting model for somatic
hypermutation targeting A, T, C, and G nucleotides (individual circles).
(A) 5-mer mutabilities estimated directly from the Ig sequencing data. (B) The
complete S5F targeting model after inferring values for missing 5-mer motifs.
Bars radiating from each circle depict the mutability as a function of
surrounding bases. The inner-most ring in the circle corresponds to two
positions upstream (5′) of the mutated base for A and C, and two positions
downstream (3′) of the mutated base for T and G. Bar colors indicate known
hot/cold-spot motifs (WRC/GYW are red, WA/TW are green, SYC/GRS are
blue, and neutral are gray). Each plot corresponds to a different mutated
nucleotide. Error bars indicate confidence intervals based on the middle 50%
quantiles of the mutability value distribution across the set of samples.
by PCR in a multiplex reaction using primer sets for all possible V-regions (n = 45) and isotype/J-regions (n = 6) to generate
heavy chain transcripts. The amplified library was tagged with
barcodes for sample multiplexing, PCR enriched, and annealed
to the required Illumina clustering adapters. High-throughput
250 base-pair paired-end sequencing was performed using the
Illumina MiSeq platform. Raw reads were exported without the
sample barcodes and Illumina clustering adapters.
3.1.2.
and 6 nested constant region primers. Following ligation of 454-compatible sequencing adapters, the expected heavy chain V gene
fragments were purified using PAGE. Each sample was uniquely
barcoded during the ligation process, allowing subsequent mixing of all the samples into one common reaction sample (performed independently for each replicate run). Emulsion PCR
and 454 GS FLX sequencing were performed directly at the 454
Life Sciences facility according to the manufacturer's standard
protocols.
3.2.
[Figure: mutability by motif class (WRC/GYW, WRCA/TGYW, WRCT/AGYW, WRCC/GGYW, WRCG/CGYW, SYC/GRS, WA/TW, Neutral) under the S5F model and the trinucleotide model.]
[Figure: (A) mutability and mutation frequency by position across FWR1, CDR1, FWR2, CDR2, FWR3, and CDR3, with negative selection marked; inset: expected mutability versus observed mutation frequency (Pearson: 0.673).]
Two positions with evidence of negative selection (red circles) and one
position with positive selection (blue circles) are indicated. The threshold for
calling a position with significant selection was set to 3 SD away from the
linear regression line (shown as a solid line in the inset, with thresholds
plotted as dashed lines).
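The thresholding rule in this caption (positions more than 3 SD from the linear regression line) can be sketched as follows. Several details are assumptions made for illustration: observed mutation frequency is regressed on expected mutability by ordinary least squares, the SD is the population standard deviation of the residuals, and positions above the line are labeled positive while those below are labeled negative. The function name `call_selection` is hypothetical.

```python
import statistics


def call_selection(expected, observed, n_sd=3.0):
    """Flag positions whose residual from the fitted line exceeds n_sd SDs.

    expected: expected mutability per position (x values)
    observed: observed mutation frequency per position (y values)
    Returns a dict mapping position index -> "positive" or "negative".
    """
    mean_x = statistics.fmean(expected)
    mean_y = statistics.fmean(observed)
    sxx = sum((x - mean_x) ** 2 for x in expected)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual of each position from the least-squares line.
    residuals = [y - (intercept + slope * x) for x, y in zip(expected, observed)]
    sd = statistics.pstdev(residuals)

    calls = {}
    for i, r in enumerate(residuals):
        if r > n_sd * sd:
            calls[i] = "positive"   # more mutation than the model expects
        elif r < -n_sd * sd:
            calls[i] = "negative"   # less mutation than the model expects
    return calls
```

A single large outlier inflates the residual SD, so with few positions a 3 SD rule is conservative; the threshold is exposed as `n_sd` for that reason.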
[Figure: mutability of WA versus TW motifs (WA/TW) and of WRC versus GYW motifs (WRC/GYW); significance marked with **.]
4. DISCUSSION
ACKNOWLEDGMENTS
We thank the Yale High Performance Computing Center (funded
by NIH grant: RR19895) for use of their computing resources.
Funding: this work was partially supported by NIH R03AI092379.
The work of Jason A. Vander Heiden, Daniel Gadala-Maria, and
Namita Gupta was supported in part by NIH Grant T15 LM07056
from the National Library of Medicine.
SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online
at http://www.frontiersin.org/journal/10.3389/fimmu.2013.00358/abstract
REFERENCES
1. Chahwan R, Edelmann W, Scharff MD, Roa S. AIDing antibody diversity by
error-prone mismatch repair. Semin Immunol (2012) 24(4):293-300.
doi:10.1016/j.smim.2012.05.005
2. Peled JU, Kuang FL, Iglesias-Ussel MD, Roa S, Kalis SL, Goodman MF, et
al. The biochemistry of somatic hypermutation. Annu Rev Immunol (2008)
26(1):481-511. doi:10.1146/annurev.immunol.26.021607.090236
3. McKean D, Huppi K, Bell M, Staudt L, Gerhard W, Weigert M. Generation of
antibody diversity in the immune response of BALB/c mice to influenza virus
hemagglutinin. Proc Natl Acad Sci U S A (1984) 81(10):3180-4.
doi:10.1073/pnas.81.10.3180
4. Kleinstein SH, Louzoun Y, Shlomchik MJ. Estimating hypermutation rates from
clonal tree data. J Immunol (2003) 171(9):4639-49.
5. Betz AG, Rada C, Pannell R, Milstein C, Neuberger MS. Passenger transgenes
reveal intrinsic specificity of the antibody hypermutation mechanism: clustering,
polarity, and specific hot spots. Proc Natl Acad Sci U S A (1993) 90(6):2385-8.
doi:10.1073/pnas.90.6.2385
6. Shapiro GS, Aviszus K, Ikle D, Wysocki LJ. Predicting regional mutability in
antibody V genes based solely on di- and trinucleotide sequence composition. J
Immunol (1999) 163(1):259-68.
7. Smith DS, Creadon G, Jena PK, Portanova JP, Kotzin BL, Wysocki LJ. Di- and
trinucleotide target preferences of somatic mutagenesis in normal and autoreactive
B cells. J Immunol (1996) 156:2642-52.
8. Cowell LG, Kepler TB. The nucleotide-replacement spectrum under somatic
hypermutation exhibits microsequence dependence that is strand-symmetric