High-Throughput DNA Sequencing - Concepts and Limitations

Methods, Models & Techniques

Martin Kircher and Janet Kelso*

Recent advances in DNA sequencing have revolutionized the field of genomics, making it possible for even single research groups to generate large amounts of sequence data very rapidly and at a substantially lower cost. These high-throughput sequencing technologies make deep transcriptome sequencing and transcript quantification, whole genome sequencing and resequencing available to many more researchers and projects. However, while the cost and time have been greatly reduced, the error profiles and limitations of the new platforms differ significantly from those of previous sequencing technologies. The selection of an appropriate sequencing platform for particular types of experiments is an important consideration, and requires a detailed understanding of the technologies available, including sources of error, error rate, as well as the speed and cost of sequencing. We review the relevant concepts and compare the issues raised by the current high-throughput DNA sequencing technologies. We analyze how future developments may overcome these limitations and what challenges remain.

Keywords: ABI/Life Technologies SOLiD; Helicos HeliScope; Illumina Genome Analyzer; Roche/454 GS FLX Titanium; Sanger capillary sequencing

Introduction

In 1977 the first genome, that of the 5,386 nucleotide (nt), single-stranded bacteriophage φX174, was completely sequenced [1] using a technology invented just a few years earlier [2–5]. Since then the sequencing of whole genomes as well as of individual regions and genes has become a major focus of modern biology and completely transformed the field of genetics.

At the time of the sequencing of φX174, and for almost another decade, DNA sequencing was a barely automated and very tedious process which involved determining only a few hundred nucleotides at a time. In the late 1980s, semi-automated sequencers with higher throughput became available [6, 7], still only able to determine a few sequences at a time. A breakthrough in the early 1990s was the development of capillary array electrophoresis and appropriate detection systems [8–12]. As recently as 1996, these developments converged in the production of a commercial single capillary sequencer (ABI Prism 310). In 1998, the GE Healthcare MegaBACE 1000 and the ABI Prism 3700 DNA Analyzer became the first commercial 96 capillary sequencers, a development which was termed high-throughput sequencing.

Over the last decade, alternative sequencing strategies have become available [13–18] which force us to completely redefine "high-throughput sequencing." These technologies outperform the older Sanger-sequencing technologies by a factor of 100–1,000 in daily throughput, and at the same time reduce the cost of sequencing one million nucleotides (1 Mb) to 4–0.1% of that associated with Sanger sequencing. To reflect these huge changes, several companies, researchers, and recent reviews [19–24] use the term "next-generation sequencing" instead of high-throughput sequencing, yet this term itself may soon be outdated considering the speed of ongoing developments.

DOI 10.1002/bies.200900181

Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

*Corresponding author: Janet Kelso
E-mail: kelso@eva.mpg.de

Abbreviations: A/C/G/T, Deoxyadenosine, Deoxycytosine, Deoxyguanosine, Deoxythymidine; ATP, Adenosine triphosphate; dATPαS, Deoxy-adenosine-5′-(alpha-thio)-triphosphate; CCD, Charge-coupled Device, i.e. semi-conductor device used in digital cameras; ChIP-Seq, Chromatin Immuno-Precipitation sequencing; CNV, Copy Number Variation; dNTPs/NTPs, deoxy-nucleotides; ddNTPs, dideoxy-nucleotides (modified nucleotides missing a hydroxyl group at the third carbon atom of the sugar); GA, short for Illumina Genome Analyzer; InDel, Insertion/Deletion; kb/Mb/Gb, kilo base (10³ nt)/mega base (10⁶ nt)/giga base (10⁹ nt); MeDIP-Seq, Methylation-Dependent Immuno-Precipitation sequencing; nt, nucleotide(s); PCR, Polymerase Chain Reaction; RNA-Seq, Sequencing of mRNAs/transcripts; SAGE, Serial Analysis of Gene Expression; SNP, Single Nucleotide Polymorphism; mRNA, messenger RNA/transcripts.

www.bioessays-journal.com
Bioessays 32: 524–536, © 2010 WILEY Periodicals, Inc.

Here we review the five sequencing technologies currently available on the market (capillary sequencing, pyrosequencing, reversible terminator chemistry, sequencing-by-ligation, and virtual terminator chemistry), discuss the intrinsic limitations of each, and provide an outlook on new technologies on the horizon. We explain how the vast increases in throughput are associated with both new and old types of problems in the resulting sequence data, and how these limit the potential applications and pose challenges for data analysis.

Sanger capillary sequencing

Current Sanger capillary sequencing systems, like the widely used Applied Biosystems 3xxx series or the GE Healthcare MegaBACE instrument, are still based on the same general scheme applied in 1977 for the φX174 genome [1, 3]. First, millions of copies of the sequence to be determined are purified or amplified, depending on the source of the sequence. Reverse strand synthesis is performed on these copies using a known priming sequence upstream of the sequence to be determined and a mixture of deoxy-nucleotides (dNTPs, the standard building blocks of DNA) and dideoxy-nucleotides (ddNTPs, modified nucleotides missing a hydroxyl group at the third carbon atom of the sugar). The dNTP/ddNTP mixture causes random, non-reversible termination of the extension reaction, creating from the different copies molecules extended to different lengths. Following denaturation and clean up of free nucleotides, primers, and the enzyme, the resulting molecules are sorted by their molecular weight (corresponding to the point of termination) and the label attached to the terminating ddNTPs is read out sequentially in the order created by the sorting step. A schematic representation of this process is available in Fig. 1.

Figure 1. Schematic representation of the Sanger sequencing process. Input DNA is fragmented and cloned into bacterial vectors for in vivo amplification. Reverse strand synthesis is performed on the obtained copies starting from a known priming sequence and using a mixture of deoxy-nucleotides (dNTPs) and dideoxy-nucleotides (ddNTPs). The dNTP/ddNTP mixture randomly causes the extension to be non-reversibly terminated, creating differently extended molecules. Subsequently, after denaturation, clean up of free nucleotides, primers, and the enzyme, the resulting molecules are sorted using capillary electrophoresis by their molecular weight (corresponding to the point of termination) and the fluorescent label attached to the terminating ddNTPs is read out sequentially.

Sorting by molecular weight was originally performed using gel electrophoresis but is nowadays carried out by capillary electrophoresis [7, 25]. Originally, radioactive or optical labels were applied in four different terminator reactions (each sorted and read out separately), but today four different fluorophores, one per nucleotide (A, C, G, and T), are used in a single reaction [6]. Additionally, the advent of more sensitive detection systems and several rounds of primer extensions (equivalent to a linear amplification) permit smaller amounts of starting DNA to be used for modern sequencing reactions. Unfortunately, there is still little automation for creation of the high copy input DNA with known priming sites. Typically this is done by cloning, i.e., introducing the target sequence into a known vector sequence using restriction and ligation procedures and using a bacterial strain to amplify the target sequence in vivo, thereby exploiting the low amplification error due to inherent proof-reading and repair mechanisms. However, this process is very tedious and is sometimes hampered by difficulties such as cloning specific sequences due to their base composition, length, and interactions with the bacterial host system. Although not yet widely used, integrated microfluidic devices have been developed which aim to automate the DNA extraction, in vitro amplification, and sequencing on the same chip [26–29].

Using current Sanger sequencing technology, it is technically possible for up to 384 sequences [29, 30] of between 600 and 1,000 nt in length [23, 31] to be sequenced in parallel. However, these 384-capillary systems are rare. The more standard 96-capillary instruments yield a maximum of approximately 6 Mb of DNA sequence per day, with costs for consumables amounting to about $500 per 1 Mb.
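
The chain-termination scheme described for Sanger sequencing can be illustrated with a small simulation. This is a minimal sketch, not the software of any instrument: the function names, the number of copies, and the ddNTP fraction are hypothetical choices made for illustration, and the template is assumed to be written in the direction in which the polymerase reads it.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template, n_copies=5000, ddntp_fraction=0.1, seed=1):
    """Simulate chain termination on many copies of one template.

    Returns one (fragment length, terminating base) pair per terminated copy.
    Copies that reach the end of the template without a ddNTP carry no label
    and are ignored.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_copies):
        for length, base in enumerate(template, start=1):
            # With probability ddntp_fraction a labeled ddNTP is incorporated
            # instead of a dNTP, which ends the extension irreversibly.
            if rng.random() < ddntp_fraction:
                fragments.append((length, COMPLEMENT[base]))
                break
    return fragments


def read_out(fragments):
    """Sort fragments by length and read the terminal label of each length."""
    label_by_length = dict(fragments)
    return "".join(label_by_length[n] for n in sorted(label_by_length))


fragments = sanger_fragments("TACGGATTCAGC")
print(read_out(fragments))
```

With this many copies every fragment length is represented, so the read-out is the base-wise complement of the template (ATGCCTAAGTCG for the example above). Sorting by length stands in for the electrophoresis step, and the dictionary lookup stands in for the sequential detection of the terminal labels.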

The sequencing error observed for Sanger sequencing is mainly due to errors in the amplification step (a low rate when done in vivo), natural variance, and contamination in the sample used, as well as polymerase slippage at low complexity sequences like simple repeats (short variable number tandem repeats) and homopolymers (stretches of the same nucleotide). Further, lower intensities and missing termination variants tend to lead to sequencing errors accumulating toward the end of long sequences. In combination with reduced separation by the electrophoresis, base miscalls [32] and deletions increase with read length. However, the average error rate (the average over all bases of a sequence) after sequence end trimming is typically very low, with an error every 10,000–100,000 nt [33].

Roche/454 GS FLX Titanium sequencer

The 454 sequencing platform was the first of the new high-throughput sequencing platforms on the market (released in October 2005). It is based on the pyrosequencing approach developed by Pål Nyrén and Mostafa Ronaghi at the Royal Institute of Technology, Stockholm in 1996 [34]. In contrast to the Sanger technology, pyrosequencing is based on iteratively complementing single strands and simultaneously reading out the signal emitted from the nucleotide being incorporated (also called sequencing by synthesis, sequencing during extension). Electrophoresis is therefore no longer required to generate an ordered read out of the nucleotides, as the read out is now done simultaneously with the sequence extension.

In the pyrosequencing process (Fig. 2), one nucleotide at a time is washed over several copies of the sequence to be determined, causing polymerases to incorporate the nucleotide if it is complementary to the template strand. The incorporation stops when the longest possible stretch of complementary nucleotides has been synthesized by the polymerase. In the process of incorporation, one pyrophosphate per nucleotide is released and converted to ATP by an ATP sulfurylase. The ATP drives the light reaction of the luciferases present and the emitted light signal is measured. To prevent the dATP provided for the sequencing reaction from being used directly in the light reaction, deoxy-adenosine-5′-(α-thio)-triphosphate (dATPαS), which is not a substrate of the luciferase, is used for the base incorporation reaction. Standard deoxyribose nucleotides are used for all other nucleotides. After capturing the light intensity, the remaining unincorporated nucleotides are washed away and the next nucleotide is provided.

Figure 2. The pyrosequencing process. One of four nucleotides is washed sequentially over copies of the sequence to be determined, causing polymerases to incorporate complementary nucleotides. The incorporation stops if the longest possible stretch of the available nucleotide has been synthesized. In the process of incorporation, one pyrophosphate per nucleotide is released and converted to ATP by an ATP sulfurylase. The ATP drives the light reaction of luciferases present and a light signal proportional (within limits) to the number of nucleotide incorporations can be measured.

In 2005, pyrosequencing technology was parallelized on a picotiter plate by 454 Life Sciences (later bought by Roche Diagnostics) to allow high-throughput sequencing [16]. The sequencing plate has about two million wells, each of them able to accommodate exactly one 28-μm diameter bead covered with single-stranded copies of the sequence to be determined. The beads are incubated with a polymerase and single-strand binding proteins and, together with smaller beads carrying the ATP sulfurylases and luciferases, gravitationally deposited in the wells. Free nucleotides are then washed over the flow cell and the light emitted during the incorporation is captured for all wells in parallel using a high-resolution charge-coupled device (CCD) camera, exploiting the light-transporting features of the plate used.

One of the main prerequisites for applying this array-based pyrosequencing approach is covering individual beads with multiple copies of the same molecule. This is done by first creating sequencing libraries in which every individual molecule gets two different adapter sequences, one at the 5′ end and one at the 3′ end of the molecule. In the case of the 454/Roche sequencing library preparation [16], this is done by sequential ligation of two pre-synthesized oligos. One of the adapters added is complementary to oligonucleotides on the sequencing beads and thus allows molecules to be bound to the beads by hybridization. Low molecule-to-bead ratios and amplification from the hybridized double-stranded sequence on the beads (kept separate using emulsion PCR) make it possible to grow beads with thousands of copies of a single starting molecule. Using the second adapter, beads covered with molecules can be separated from empty beads (using special capture beads with oligonucleotides complementary to the second adapter) and are then used in the sequencing reaction as described above.

The average substitution (excluding insertion/deletion, InDel) error rate is in the range of 10⁻³–10⁻⁴ [16, 35], which is higher than the rates observed for Sanger sequencing, but is the lowest average substitution error rate of the new sequencing technologies discussed here. As mentioned earlier for Sanger sequencing, in vitro amplifications performed for the sequencing preparation cause a higher background error rate, i.e., the error introduced into the sample before it enters the sequencer. In addition, in bead preparation (i.e., emulsion PCR) a fraction of the beads end up carrying copies of multiple different sequences. These "mixed beads" will participate in a high number of incorporations per flow cycle, resulting in sequencing reads that do not reflect real molecules. Most of these reads are automatically filtered during the software post-processing of the data. The filtering of mixed beads may, however, cause a depletion of real sequences with a high fraction of incorporations per flow cycle.

A large fraction of the errors observed for this instrument are small InDels, mostly arising from inaccurate calling of homopolymer length, and single base-pair deletions or insertions caused by signal-to-noise thresholding issues [35]. Most of these problems can be resolved by higher coverage. For long (>10 nt) homopolymers, however, there is often a consistent length miscall that is not resolvable by coverage [35–37]. Strong light signals in one well of the picotiter plate may also result in insertions in sequences in neighboring wells. If the neighboring well is empty, this can generate so-called ghost wells, i.e., wells for which a signal is recorded even though they contain no sequence template; hence, the intensities measured are caused entirely by bleed-over signal from the neighboring wells. Computational post-processing may correct for these artifacts [38]. As for Sanger sequencing, the error rate increases with the position in the sequence. In the case of 454 sequencing, this is caused by a reduction in enzyme efficiency or loss of enzymes (resulting in a reduction of the signal intensities), by some molecules no longer being elongated, and by an increasing phasing effect. Phasing is observed when a population of DNA molecules amplified from the same starting molecule (ensemble) is sequenced, and describes the process whereby not all molecules in the ensemble are extended in every cycle. This causes the molecules in the ensemble to lose synchrony/phase, and results in an echo of the preceding cycles being added to the signal as noise.

The current 454/Roche GS FLX Titanium platform makes it possible to sequence about 1.5 million such beads in a single experiment and to determine sequences of length between 300 and 500 nt. The length of the reads is determined by the number of flow cycles (the number of times all four nucleotides are washed over the plate) as well as by the base composition and the order of the bases in the sequence to be determined. Currently, 454/Roche limits this number to 200 flow cycles, resulting in an expected average read length of about 400 nt. This is largely due to limitations imposed by the efficiency of polymerases and luciferases, which drops over the sequencing run, resulting in decreased base qualities. Currently the platform allows about 750 Mb of DNA sequence to be created per day with costs of about $20/Mb.

Illumina Genome Analyzer II/IIx

The reversible terminator technology used by the Illumina Genome Analyzer (GA) employs a sequencing-by-synthesis concept that is similar to that used in Sanger sequencing, i.e. the incorporation reaction is stopped after each base, the label of the base incorporated is read out with fluorescent dyes, and the sequencing reaction is then continued with the incorporation of the next base [13, 39] (Fig. 3).

Figure 3. Reversible terminator chemistry applied by the Illumina GA. Sequencing primers are annealed to the adapters of the sequences to be determined. Polymerases are used to extend the sequencing primers by incorporation of fluorescently labeled and terminated nucleotides. The incorporation stops immediately after the first nucleotide due to the terminators. The polymerases and free nucleotides are washed away and the label of the bases incorporated for each sequence is read with four images taken through different filters (T nucleotide filter is indicated in the figure) and using two different lasers (red: A, C and green: G, T) to illuminate fluorophores. Subsequently, the fluorophores and terminators are removed and the sequencing continued with the incorporation of the next base.

Like 454/Roche, the Illumina sequencing protocol requires that the sequences to be determined are converted into a special sequencing library, which allows them to be amplified and immobilized for sequencing [13, 40]. For this purpose two different adapters are added to the 5′ and 3′ ends of all molecules using ligation of so-called forked adapters (footnote 1). The library is then amplified using longer primer sequences, which extend and further diversify the adapters to create the final sequence needed in subsequent steps.

Footnote 1: Hybrids of partially complementary oligonucleotides creating one double-stranded end with a T overhang, with a single-stranded and a different sequence at the other end.

This double-stranded library is melted using sodium hydroxide to obtain single-stranded DNAs, which are then pumped at a very low concentration through the channels of a flow cell. This flow cell has on its surface two populations of immobilized oligonucleotides complementary to the two different single-stranded adapter ends of the sequencing library. These oligonucleotides hybridize to the single-stranded library molecules. By reverse strand synthesis starting from the hybridized (double-stranded) part, the new strand being created is covalently bound to the flow cell. If this new strand bends over and attaches to another oligonucleotide complementary to the second adapter sequence on the free end of the strand, it can be used to synthesize a second covalently bound reverse strand. This process of bending and reverse strand synthesis, called bridge amplification, is repeated several times and creates clusters of several thousand copies of the original sequence in very close proximity to each other on the flow cell [13, 40].

These randomly distributed clusters contain molecules that represent the forward as well as reverse strands of the original sequences. Before determining the sequence, one of the strands has to be removed to prevent it from hindering the extension reaction sterically or by complementary base pairing. Strands are selectively cleaved at base modifications of oligonucleotides on the flow cell. Following strand removal, each cluster on the flow cell consists of single-stranded, identically oriented copies of the same sequence, which can be sequenced by hybridizing the sequencing primer onto the adapter sequences and starting the reversible terminator chemistry.

"Solexa sequencing", as it was introduced in early 2007, initially allowed for the simultaneous sequencing of several million very short sequences (at most 26 nt) in a single experiment. In recent years there have been several technical, chemical, and software updates. The product, which is now called the Illumina Genome Analyzer, has increased flow cell cluster densities (more than 200 million clusters per run), images a wider range of the flow cell, and generates sequence reads of up to 100 nt. A technical update also enabled the sequencing of the reverse strand of each molecule. This is achieved by chemical melting and washing away the synthesized sequence, repeating a few bridge amplification cycles for reverse strand synthesis, and then selectively removing the starting strand (again using base modifications of the flow cell oligonucleotide populations), before annealing another sequencing primer for the second read. Using this "paired-end sequencing" approach, approximately twice the amount of data can be generated.

The Illumina library and flow cell preparation includes several in vitro amplification steps, which cause a high background error rate and contribute to the average error rate of about 10⁻²–10⁻³ [41, 42]. Further, the flow cell preparation creates a fraction of ordinary-looking clusters that are initiated from more than one individual sequence. This results in mixed signals and mostly low quality sequences for these clusters. Similar to the 454 ghost wells, the Illumina image analysis may identify chemical crystals, dust, and lint particles as clusters and call sequences from these. In such cases the resulting sequences typically appear to be of low sequence complexity.

As is the case for the other platforms, the error rate increases with increasing position in the determined sequence. This is mainly due to phasing, which increases the background noise as sequencing progresses. While the ensemble sequencing process for pyrosequencing creates uni-directional phasing, reversible terminator sequencing creates bi-directional phasing [41, 43], as some incorporated nucleotides may also fail to be correctly terminated, allowing the extension of the sequence by another nucleotide in the same cycle.

With increasing cycle numbers, the on the Church lab sequencing-by- a similar fashion to that described ear-
intensities extracted from the clusters ligation concept, but combines it with lier for the 454/Roche platform. In con-
decline [41, 43, 44]. This is due to fewer a new strategy of sequencing library trast to the 454/Roche technology, the

molecules participating in the extension construction and sequence immobiliz- SOLiD system does not use a picotiter
reaction as a result of non-reversible ation using rolling circle amplification plate for fixation of the beads in the
termination, or due to dimming effects [45]. Here, we focus on the commercial sequencing process; instead the 30 ends
of the sequencing fluorophores. In early SOLiD system as this is the most wide- of the sequences on the beads are modi-
versions of the chemistry, one of the spread application of this concept. fied in a way that allows them to be
fluorophores could become stuck to The principle behind sequencing- covalently bound onto a glass slide.
the clusters creating another source of by-ligation is very different from the As for the Illumina GA system, this cre-
increased background noise [41]. The approaches discussed thus far. The ates a random dispersion of the beads in
simultaneous identification of four sequence extension reaction is not car- the sequencing chamber and allows for
different nucleotides is also an issue. ried out by polymerases but rather by higher loading densities. However, ran-
The GA uses four fluorescent dyes to ligases [17] (see Fig. 4 for a schematic dom dispersion complicates the identi-
distinguish the four nucleotides A, C, representation of the SOLiD 2/3 plat- fication of bead positions from images,
G, and T. Of these, two pairs (A/C and form). In the sequencing-by-ligation and results in the possibility that chemi-
G/T) excited using the same laser, are process, a sequencing primer is hybri- cal crystals, dust, and lint particles can
similar in their emission spectra and dized to single-stranded copies of the be misidentified as clusters. Further,
show only limited separation using opti- library molecules to be sequenced. A dispersal of the beads results in a wide
cal filters. Therefore, the highest substi- mixture of 8-mer probes carrying four range of inter-bead distances, which
tution errors observed are between A/C distinct fluorescent labels compete for then have different susceptibility to be
and G/T [41, 42]. ligation to the sequencing primer. The influenced by signals from neighboring
Even though the Illumina GA reads fluorophore encoding, which is based beads.
show a higher average error rate, on the two 30 -most nucleotides of the Types and causes of sequence errors
a wider average error range, and are probe, is read. Three bases including are diverse: first, the in vitro amplifica-
considerably shorter than 454/Roche the dye are cleaved from the 50 end of tion steps cause a higher background
reads, the GA instrument determines the probe, leaving a free 50 phosphate on error rate. Secondly, beads carrying a
more than 5,000 Mb/day with a price the extended (by five nucleotides) pri- mixture of sequences and beads in close
of about 0.50$/Mb. This is more than mer, which is then available for further proximity to one another create false
six times higher daily throughput and ligation. After multiple ligations (typi- reads and low quality bases. Further,
for a considerably lower price per cally up to 10 cycles), the synthesized signal decline, a small regular phasing
megabase. strands are melted and the ligation effect, and incomplete dye removal
product is washed away before a new result in increasing error as the ligation
sequencing primer (shifted by one cycles progress [47]. Phasing, as
Applied Biosystems SOLiD nucleotide) is annealed. Starting from described earlier, is a minor issue on
the new sequencing primer the ligation this platform as sequences not extended
The prototype of what was further reaction is repeated. The same process is in the last cycle are non-reversibly ter-
developed and later sold by Life followed for three other primers, facili- minated using phosphatases. Since
Technologies/Applied Biosystems (ABI) tating the read out of the dinucleotide hybridization is a stochastic process,
as the SOLiD sequencing platform, was encoding for each start position in the this causes a considerable reduction in
developed by Harvard Medical School sequence. Using specific fluorescent the number of molecules participating
and the Howard Hughes Medical label encoding, the dye read outs (i.e. in subsequent ligation reactions, and
Institute and published in 2005 [17]. colors) can be converted to a sequence therefore substantial signal decline.
With its commercial release in late [46]. This conversion from color space to On the other hand, given the efficiency
2007, SOLiD was only the third new sequence requires a known first base, of phosphatases the remaining phasing
high-throughput system entering a which is the last base of the used library effect can be considered very low.
highly competitive market with all three adapter sequence. Given a reference However, incomplete cleavage of the
vendors selling their instruments for sequence, this encoding system allows dyes may allow cleavage in the next
around half a million dollars. The detection of machine errors and the ligation reaction, which then allows
Church lab at Harvard Medical School application of an error correction to for the extension in the next but one
continued the development of the reduce the average error rate. In the cycle. This causes a different phasing
system and now offers a cheaper absence of a reference sequence, how- effect and additional noise from the
(<$200,000) open source version of ever, color conversion fails with an error previous cycle’s dyes in the dye identi-
the system (called Polonator) in collab- in the dye read out and causes the fication process.
oration with Dover System. In the third sequence downstream of the error to The SOLiD system currently allows
quarter of 2008, a biotechnology com- be incorrect. sequencing of more than 300 million
pany from Mountain View, California, For parallelization, the sequencing beads in parallel, with a typical read
named Complete Genomics started process uses beads covered with length of between 25 and 75 nt. At the
offering a human genome sequencing multiple copies of the sequence to be time of writing, the ABI SOLiD system is
service. Their technology is also based determined. These beads are created in therefore comparable to the Illumina GA

Methods, Models & Techniques M. Kircher and J. Kelso Methods, Models & Techniques .....
Figure 4. Applied Biosystem’s SOLiD sequencing by ligation. A free 50 phosphate for further ligations. After multiple ligations, the
sequencing primer is annealed to single-stranded copies of sequen- synthesized strands are melted and the ligation product is washed
ces to be determined. Octamer probes are hybridized, ligated to the away before a new, by-one-nucleotide-shifted sequencing primer is
sequencing primer, and a fluorescent dye at the 50 end of the ligated annealed. Starting from the new sequencing primer the ligation
8-mer probes, encoding the two 30 -most nucleotides of the probe, is reaction is repeated. The same is done for three other primers,
read out. Non-extended primers are dephosphorylated. Three allowing the read out of the dinucleotide label for every position in
nucleotides of the probe including the dye are cleaved, creating a the sequence.
system in terms of throughput and price per million nucleotides (~5,000 Mb/day, ~0.50$/Mb). Average error rates are, however, dependent on the availability of a reference genome for error correction (10⁻³–10⁻⁴ vs. 10⁻²–10⁻³). In the absence of a reference genome, assembly and consensus calling may be performed based on dye read outs (so-called color space sequences) to reduce the errors before conversion to the nucleotide sequence. If no reference genome is available for error correction, and no assembly and consensus calling is performed, then the average error rate is higher than for the Illumina GA.

Helicos HeliScope

Helicos is the first company to sell a sequencer able to sequence individual molecules instead of molecule ensembles created by an amplification process. Single molecule sequencing has the advantage that it is not affected by biases or errors introduced in a library preparation or amplification step, and may facilitate sequencing of minimal amounts of input DNA. Using methods able to detect non-standard nucleotides, it could also allow for the identification of DNA modifications, commonly lost in the in vitro amplification process.

The HeliScope, as the Helicos sequencer is called, was first sold in March 2008, and by the end of the first quarter of 2009 only four machines had been installed worldwide. This might be surprising given the advantages of single molecule sequencing, but probably reflects the specific limitations of this platform, the price (about one million dollars), and a relatively small market that has already invested extensively in new sequencing technologies.

The technology applied (Fig. 5) could be termed asynchronous virtual terminator chemistry [15]. Input DNA is fragmented and melted before a


Figure 5. Asynchronous virtual terminator chemistry performed by the HeliScope. Input DNA is fragmented, melted, and polyadenylated. A fluorescently labeled adenine is added in the last step. This single-stranded DNA is washed over a flow cell with poly-T oligonucleotides allowing hybridization. The bound coordinates on the flow cell are determined using the fluorescently labeled adenines. Having the coordinates identified, the fluorescent label of the 3′ adenines is removed. Polymerases are washed through with one type of fluorescently labeled nucleotides (A, C, G, and T) at a time, and the polymerases extend the reverse strand of the sequences starting from the poly-T oligonucleotides. The nucleotide incorporation of the polymerases is slowed down by the fluorescent labeling and allows for at most one incorporation before the polymerase is washed away. The flow cell is then imaged, the fluorescent dyes removed, and the reaction continued with another nucleotide.
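
The flow scheme summarized in Figure 5 can be mimicked with a toy model. The sketch below is a simplification that assumes perfect incorporation efficiency, no misincorporation, and no template loss. The flow order `CTAG`, the function name, and its parameters are hypothetical choices for illustration only. It shows how flowing one nucleotide type at a time, with at most one incorporation per flow, yields reads of different length after the same number of cycles.

```python
def virtual_terminator_read(synthesized, flow_order="CTAG", cycles=2):
    """Return the bases read when one nucleotide type is flowed at a time.

    `synthesized` is the strand the polymerase would build (the complement
    of the template). Each flow adds at most one base, which is the idea
    behind virtual termination.
    """
    position = 0
    read = []
    for _ in range(cycles):
        for nucleotide in flow_order:
            if position < len(synthesized) and synthesized[position] == nucleotide:
                read.append(nucleotide)
                position += 1  # at most one incorporation per flow
    return "".join(read)


# Three molecules on the same flow cell, all exposed to the same two cycles.
for strand in ("CTAGCTAG", "GATC", "AAAA"):
    print(strand, "->", virtual_terminator_read(strand))
```

In this model a molecule whose sequence happens to match the flow order is extended in every flow, a molecule that runs against the flow order gains only one base per cycle, and a homopolymer is read one base per cycle because each flow permits a single incorporation.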
poly-A-tail is synthesized onto each single-stranded molecule using a polyadenylate polymerase. In the last step of polyadenylation, a fluorescently labeled adenine is added. The library is washed over a flow cell where the poly-A tails bind to poly-T oligonucleotides. The bound coordinates on the flow cell are determined using a fluorescence-based read out of the flow cell. Having these coordinates identified, the fluorescent label of the 3′ adenine is removed and the sequencing reaction started.

Polymerases are washed through the flow cell with one type of fluorescently labeled nucleotide (A, C, G, or T) at a time and the polymerases extend the reverse strand of the sequences starting from the poly-T oligonucleotides. The nucleotide incorporation of the polymerases is slowed down by the fluorescent labeling and allows for at most one incorporation before the polymerase is washed away together with the non-incorporated nucleotides (termed virtual termination [48, 49]). The flow cell is then imaged again, the fluorescent dyes are removed, and the reaction continued with another nucleotide. By this process not every molecule is extended in every cycle, which is why it is an asynchronous sequencing process resulting in sequences of different length (as is the case for the 454/Roche platform).

Since single molecules are sequenced, the signals being measured are weak, and there is no possibility that misincorporation errors can be corrected by an ensemble effect. Due to the fact that molecules are attached to the flow cell by hybridization only, there is a chance that template molecules can be lost in the wash steps. In addition, molecules may be irreversibly terminated by the incorporation of incorrectly synthesized nucleotides. Overall, reads are between 24 and 70 nt long (average 32 nt) [50] and thus shorter than for the other platforms. Due to the higher number of sequences determined in parallel, the total throughput per day (4150 Mb/day with a cost of ~0.33$/Mb [50]) is in the same range as for the GA and SOLiD systems. The average error rate, which is in the range of a few percent, is slightly higher than for all other instruments and biased toward InDels rather than substitutions.

Applications and general considerations

All current high-throughput technologies have an average error rate that
is considerably higher than the typical 1/10,000 to 1/100,000 observed for high-quality Sanger sequences. Further, the GS FLX Titanium, GA, SOLiD, and HeliScope platforms each have very specific biases and limitations, making it necessary to choose a platform appropriate for a specific project or application (for a summary see Table 1). A combination of technologies [51–54] and experimental protocols [55–57] may also be appropriate, and even complementary, for specific projects.

High-quality Sanger sequencing is now commonly used for low-coverage sequencing of individual positions and regions (e.g., diagnostic genotyping) or for the sequencing of virus- and phage-sized whole genomes. As the Sanger sequence length is longer than most abundant short repeat classes, it allows the unambiguous assembly of most genomic regions – something that is generally not possible using the shorter read platforms. However, the technology is expensive and too slow for sequencing large samples, extended genomic regions, or the many molecules required for quantitative applications [e.g., gene expression quantification; chromatin immunoprecipitation sequencing (ChIP-Seq); and methylation-dependent immunoprecipitation sequencing (MeDip-Seq)]. For quantitative applications the HeliScope provides the highest throughput in terms of sequence number and has the advantage of not requiring a multistep library preparation protocol. On the other hand, the HeliScope provides the lowest resolution in mapping accuracy for complex genomes due to its short read length and error profile. The GA or SOLiD platforms may thus provide equivalent results for quantitative applications, while providing fewer but longer reads and requiring a more elaborate library preparation.

While it has not yet been fully analyzed, it is possible (and even likely) that library preparation protocols could bias the sequence representation in a sample [42, 58, 59], making the replacement of this step an important goal. Further, multistep library preparation protocols require higher amounts of input material, limiting their general application. However, protocols for library construction from limited sample amounts are available or being developed for each of the platforms, and publications demonstrate that, while vendor protocols indicate the need for higher sample quantities (microgram range), many users are proceeding successfully with low input DNA amounts (nanogram to picogram range), as, for example, from ancient DNA specimens [60–62].

Like Sanger sequencing, the GS FLX Titanium provides a read length spanning many of the short repeat sequences – an important feature for accurate sequence mapping and assembly of genomes [63]. Despite the InDel errors, this technology has very low rates of misidentifying individual bases, making it perfectly suited for the identification of single nucleotide polymorphisms (SNPs). Also geared to the identification of SNPs, at least for samples with an existing reference genome, is the SOLiD instrument with its dinucleotide encoding scheme [46]. Considerably higher coverage is needed to perform SNP calling with similar accuracy using the Illumina GA [64]. Neither the Illumina GA nor the ABI SOLiD sequencing systems are prone to generate high rates of small InDels, making them well suited for studying InDel variation.

As mentioned earlier, the drawback of short reads (below about 75 nt) obtained from Helicos, SOLiD, or GA instruments is in genome assembly and mapping applications, where the placement of repeated or very similar sequences cannot be resolved unambiguously. The correct placement is further complicated by high error rates introducing a requirement for a minimum sequence distance of an unambiguous placement. Paired-end or mate-pair protocols help to overcome some of these limitations of short reads [65] by providing information about relative location and orientation of a pair of reads. Currently a paired-end protocol is only commonly applied on the GA, while mate-pair protocols are available for SOLiD, GS FLX Titanium, and GA. In paired-end sequencing the actual ends of rather short DNA molecules (<1 kb) are determined, while mate-pair sequencing requires the preparation of special libraries. In these protocols, the ends of longer, size-selected molecules (e.g., 8, 12, or 20 kb) are connected with an internal adapter sequence in a circularization reaction. The circular molecule is then processed using restriction enzymes or fragmentation before outer library adapters are added around the two combined molecule ends. The internal adapter can then be used as a second priming site for an additional sequencing reaction on the same immobilized molecules. Thus, mate-pair sequencing provides distance information useful for assembly, but does not allow the two end reads to be merged, since by design they do not overlap. However, merging of two overlapping forward and reverse paired-end reads from short insert libraries allows the reconstruction of a complete consecutive molecule sequence, longer than the individual read length, and with reduced average error rates in the overlapping sequence part [60, 66].

Due to the large amounts of sequences created, there is interest in sequencing targeted regions (e.g. a genomic locus, from sequence capture experiments [67–69]) in multiple individuals/samples instead of sequencing one sample in excessive depth. All technologies therefore provide a separation of their sequencing plate into defined regions or channels. However, at most, 16 such regions/channels are available (GS FLX Titanium and HeliScope plates), which may not be sufficient for some applications. Using different library construction protocols, some platforms allow addition of sample-specific barcode (sometimes called "index") sequences to the library molecules. These molecules can then be sequenced in the same region/channel, and later separated (computationally) based on their barcode sequence [70–73]. This facilitates highly parallel sequencing of a large number of samples beyond that possible using the physical lane/channel separation. Currently such protocols (mostly non-vendor protocols) are available for the GS FLX Titanium, GA, and SOLiD instruments.

Although sequencing prices per gigabase have fallen considerably in recent years, making projects like the 1000 Human Genome Variation Project, 1001 Arabidopsis thaliana Genomes Project, the Mammalian Genome Project, or the International Cancer Genome Consortium possible, high-throughput sequencing still has high acquisition, running and maintenance costs, which
are not included in Table 1. Further, each of these platforms requires a substantial investment in data management and analysis, time, and personnel [74–77]. Smaller research groups may still find prohibitive the costs of the infrastructure needed for storing, handling, and analyzing several tens of gigabytes of pure sequence data and terabytes of several thousand intermediate files generated by these instruments each week. Even for larger, experienced genome centers this aspect remains an ever-increasing challenge for the ongoing use of these platforms.

Table 1. Comparison of high-throughput sequencing technologies available
Throughput Length Quality Costs Applications Main sources of errors

The table summarizes throughput, length, quality, and costs for the current versions of the mentioned technologies. These approximate numbers are constantly improving and based on figures available in January 2010. Costs do not include instrument acquisition and maintenance; further they may be affected by discounts and scale effects for multiple instruments. Where numbers are very similar, colors ranging from red (low performance) to green (good performance) indicate a general trend. In the last column, example applications fitting the throughput and error profiles of each of the platforms are given. Typically, this does not mean that the technology is limited to these applications, but that it is currently best suited to such applications.
+ High sequencing depth/number of runs required.

Upcoming developments

Motivated by the goal of a $1,000-genome set by NIH/NHGRI to enable personalized medicine, the throughput of all systems described is constantly increasing and the numbers given here quickly become outdated. However, in addition to the improvements of current technologies, including the January 2010 announcement of the Illumina HiSeq 2000 system, which determines sequences of clusters on bottom and top of the flow cell and processes two flow cells in parallel, a new generation of sequencers is already on the horizon.

What started with the Helicos system – the sequencing of single molecules without prior library preparation or amplification – will likely become a popular paradigm. Specifically, three other systems have captured media and scientific attention well in advance of their actual availability: Pacific Biosciences' Single Molecule Real Time (SMRT) sequencing technology [18], Oxford Nanopore's BASE technology [14] and, recently, IBM's proposal of silicon-based nanopores [78].

Pacific Biosciences' SMRT technology performs the sequencing reaction on silicon dioxide chips with a 100 nm metal film containing thousands of tens-of-nanometer diameter holes, so-called zero-mode waveguides (ZMWs) [79]. Each ZMW is used as a nano-visualization chamber, providing a detection volume of 20 zeptoliters (10⁻²¹ l). At this volume, a single molecule can be illuminated while excluding other labeled nucleotides in the background – saving time and sequencing chemistry by omitting wash steps. A single DNA polymerase is fixed to the bottom of the surface within the detection volume, and nucleotides, with different dyes attached to the phosphate chain, are used in concentrations allowing normal enzyme processivity. As the polymerase incorporates complementary nucleotides, the nucleotide is held within the detection volume for tens
of milliseconds, orders of magnitude longer than for unspecific diffusion events. This way the fluorescent dye of the incorporated nucleotide can be identified during normal speed reverse strand synthesis [79]. In pilot experiments, Pacific Biosciences has shown that its technology allows for direct sequencing of a few thousand bases before the polymerase is denatured due to the laser read out of the dyes. The SMRT technology is intended for release in 2010. Even though further development is needed to create a more robust system, the omission of library preparation and amplification as well as the long sequences generated will undoubtedly provide an advantage over the current systems for many applications.

Oxford Nanopore's BASE technology is unlikely to be released as soon as the SMRT technology. BASE offers the potential to identify individual nucleotide modifications (e.g. 5-methylcytosine vs. cytosine) during the sequencing process [14]. The idea behind this technology is the identification of individual nucleotides using a change in the membrane potential as they pass through a modified α-hemolysin membrane pore with a cyclodextrin sensor [14, 80]. However, to apply this technology for sequencing, the pore has to be fused to an exonuclease, which degrades single-stranded DNA sequences and releases individual nucleotides into the pore. In addition, the technology needs to be parallelized in array format before its release as a high-throughput sequencing platform. While the sensitivity for individual nucleotide modifications seems to be a major advantage, the destructive fashion of the outlined sequencing process might be considered a hindrance for applications with precious samples, and it does not allow a second read cycle for error reduction.

In early October 2009, IBM issued a press release [78] describing a method to slow down the speed of an individual DNA strand passing through a nanopore. For this purpose they developed a multilayer metal/dielectric nanopore device that utilizes the interaction of the DNA backbone charges with a modulated electric field to trap and slowly release an individual DNA molecule. The technology described could theoretically be combined with, for example, the Nanopore technologies developed at Harvard University [81] or the previously described BASE technology where it may overcome the destructive approach followed so far.

Conclusion

Current high-throughput sequencing technologies provide a huge variety of sequencing applications to many researchers and projects. Given the immense diversity, we have not discussed these applications in depth here; other reviews with a stronger focus on specific applications and data analysis are available [24, 82–88]. The discussed technologies make it possible for even single research groups to generate large amounts of sequence data very rapidly and at substantially lower costs than traditional Sanger sequencing. While costs have been reduced to less than 4–0.1% and time has been shortened by a factor of 100–1,000 based on daily throughput, the error profiles and limitations observed for the new platforms differ significantly from Sanger sequencing and between approaches. Further, each of these new sequencing platforms requires substantial additional investments – factors that have often not been sufficiently stressed in research publications describing a specific application. Some vendors have recently started to offer budget versions of their instruments (e.g. Illumina GA IIe or 454/Roche GS Junior) with lower sequencing capacity. However, while the instrument price is lower, the financial investment remains high. Costs per base are generally higher than for the standard instrument, and very similar overall infrastructure is still required. Often the choice of an appropriate sequencing platform is project specific and sometimes combinations can be advantageous. This may open the market further to companies providing sequencing-on-demand services, but will not replace the need for laboratories to invest considerable time and expertise in both the production of libraries and analysis of the vast quantities of data that will be generated.

New technologies on the horizon, SMRT by Pacific Biosciences, BASE by Oxford Nanopore, and other technologies such as that suggested by IBM, demonstrate the major future directions in the field of DNA sequencing: the ability to use individual molecules without any library preparation or amplification, the identification of specific nucleotide modifications, and the ability to generate longer sequence reads. These developments will facilitate future research in many fields, make data analysis easier, and further reduce sequencing costs, hopefully achieving the aim of a $1,000 human genome suggested by NIH/NHGRI to be required for personalized medicine.

Acknowledgments

We thank the members of the Department of Evolutionary Genetics, and particularly members of the sequencing group, for providing sequencing data from multiple platforms, as well as interesting discussions and useful insights. We are also indebted to A. Wilkins and the three anonymous reviewers for critical reading of the manuscript and thoughtful comments. This work was supported by the Max Planck Society.

References

1. Sanger F, Air GM, Barrell BG, et al. 1977. Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265: 687–95.
2. Gilbert W, Maxam A. 1973. The nucleotide sequence of the lac operator. Proc Natl Acad Sci USA 70: 3581–4.
3. Sanger F, Nicklen S, Coulson AR. 1977. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74: 5463–7.
4. Sanger F, Coulson AR. 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 94: 441–8.
5. Wu R, Kaiser AD. 1968. Structure and base sequence in the cohesive ends of bacteriophage lambda DNA. J Mol Biol 35: 523–37.
6. Smith LM, Sanders JZ, Kaiser RJ, et al. 1986. Fluorescence detection in automated DNA sequence analysis. Nature 321: 674–9.
7. Swerdlow H, Gesteland R. 1990. Capillary gel electrophoresis for rapid, high resolution DNA sequencing. Nucleic Acids Res 18: 1415–9.
8. Zagursky RJ, McCormick RM. 1990. DNA sequencing separations in capillary gels on a modified commercial DNA sequencing instrument. Biotechniques 9: 74–9.
9. Huang XC, Quesada MA, Mathies RA. 1992. DNA sequencing using capillary array electrophoresis. Anal Chem 64: 2149–54.
10. Kambara H, Takahashi S. 1993. Multiple-sheathflow capillary array DNA analyser. Nature 361: 565–6.
11. Ueno K, Yeung ES. 1994. Simultaneous monitoring of DNA fragments separated by electrophoresis in a multiplexed array of 100 capillaries. Anal Chem 66: 1424–31.
12. Kim S, Yoo HJ, Hahn JH. 1996. Postelectrophoresis capillary scanning method for DNA sequencing. Anal Chem 68: 936–9.
13. Bentley DR, Balasubramanian S, Swerdlow HP, et al. 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–9.
14. Clarke J, Wu HC, Jayasinghe L, et al. 2009. Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol 4: 265–70.
15. Harris TD, Buzby PR, Babcock H, et al. 2008. Single-molecule DNA sequencing of a viral genome. Science 320: 106–9.
16. Margulies M, Egholm M, Altman WE, et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–80.
17. Shendure J, Porreca GJ, Reppas NB, et al. 2005. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309: 1728–32.
18. Korlach J, Marks PJ, Cicero RL, et al. 2008. Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proc Natl Acad Sci USA 105: 1176–81.
19. Ansorge WJ. 2009. Next-generation, DNA sequencing techniques. Nat Biotechnol 25: 195–203.
20. Mardis ER. 2008. Next-generation, DNA sequencing methods. Annu Rev Genomics Hum Genet 9: 387–402.
21. Schuster SC. 2008. Next-generation sequencing transforms today's biology. Nat Methods 5: 16–8.
22. Shendure J, Ji H. 2008. Next-generation, DNA sequencing. Nat Biotechnol 26: 1135–45.
23. Shendure JA, Porreca GJ, Church GM. 2008. Overview of DNA sequencing strategies. Curr Protoc Mol Biol Chapter 7: Unit 7.1.
24. Metzker ML. 2010. Sequencing technologies – the next generation. Nat Rev Genet 11: 31–46.
25. George KS, Zhao X, Gallahan D, et al. 1997. Capillary electrophoresis methodology for identification of cancer related gene expression patterns of fluorescent differential display polymerase chain reaction. J Chromatogr B Biomed Sci Appl 695: 93–102.
26. Blazej RG, Kumaresan P, Mathies RA. 2006. Microfabricated bioprocessor for integrated nanoliter-scale Sanger DNA sequencing. Proc Natl Acad Sci USA 103: 7240–5.
27. Mariella R Jr. 2008. Sample preparation: the weak link in microfluidics-based biodetection. Biomed Microdevices 10: 777–84.
28. Roper MG, Easley CJ, Legendre LA, et al. 2007. Infrared temperature control system for a completely noncontact polymerase chain reaction in microfluidic chips. Anal Chem 79: 1294–1300.
29. Emrich CA, Tian H, Medintz IL, et al. 2002. Microfabricated 384-lane capillary array electrophoresis bioanalyzer for ultrahigh-throughput genetic analysis. Anal Chem 74: 5076–83.
30. Shibata K, Itoh M, Aizawa K, et al. 2000. RIKEN integrated sequence analysis (RISA) system – 384-format sequencing pipeline with 384 multicapillary sequencer. Genome Res 10: 1757–71.
31. Hert DG, Fredlake CP, Barron AE. 2008. Advantages and limitations of next-generation sequencing technologies: a comparison of electrophoresis and non-electrophoresis methods. Electrophoresis 29: 4618–26.
32. Ewing B, Hillier L, Wendl MC, et al. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175–85.
33. Ewing B, Green P. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186–94.
34. Ronaghi M, Karamohamed S, Pettersson B, et al. 1996. Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem 242: 84–9.
35. Quinlan AR, Stewart DA, Stromberg MP, et al. 2008. Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nat Methods 5: 179–81.
36. Wicker T, Schlagenhauf E, Graner A, et al. 2006. 454 sequencing put to the test using the complex genome of barley. BMC Genomics 7: 275.
37. Green RE, Malaspinas AS, Krause J, et al. 2008. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134: 416–26.
38. Green RE, Krause J, Ptak SE, et al. 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444: 330–6.
39. Turcatti G, Romieu A, Fedurco M, et al. 2008. A new class of cleavable fluorescent nucleotides: synthesis and optimization as reversible terminators for DNA sequencing by synthesis. Nucleic Acids Res 36: e25.
40. Fedurco M, Romieu A, Williams S, et al. 2006. BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies. Nucleic Acids Res 34: e22.
41. Kircher M, Stenzel U, Kelso J. 2009. Improved base calling for the Illumina Genome Analyzer using machine learning strategies. Genome Biol 10: R83.
42. Dohm JC, Lottaz C, Borodina T, et al. 2008. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 36: e105.
43. Erlich Y, Mitra PP, delaBastide M, et al. 2008. Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nat Methods 5: 679–82.
44. Rougemont J, Amzallag A, Iseli C, et al. 2008. Probabilistic base calling of Solexa sequencing data. BMC Bioinf. 9: 431.
45. Drmanac R, Sparks AB, Callow MJ, et al. 2010. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327: 78–81.
46. Applied Biosystems. A Theoretical Understanding of 2 Base Color Codes and Its Application to Annotation, Error Detection, and Error Correction. White Paper SOLiD™ System; 2008.
47. Dimalanta ET, Zhang L, Hendrickson CL, et al. 2009. Increased Read Length on the SOLiD™ Sequencing Platform. Poster SOLiD™ System.
48. Zhu Z, Waggoner AS. 1997. Molecular mechanism controlling the incorporation of fluorescent nucleotides into DNA by PCR. Cytometry 28: 206–11.
49. Bowers J, Mitchell J, Beer E, et al. 2009. Virtual terminator nucleotides for next-generation DNA sequencing. Nat Methods 6: 593–5.
50. Pushkarev D, Neff NF, Quake SR. 2009. Single-molecule sequencing of an individual human genome. Nat Biotechnol 27: 847–52.
51. Reinhardt JA, Baltrus DA, Nishimura MT, et al. 2009. De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 19: 294–305.
52. Diguistini S, Liao NY, Platt D, et al. 2009. De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biol 10: R94.
53. Miller JR, Delcher AL, Koren S, et al. 2008. Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 24: 2818–24.
54. Chen W, Ullmann R, Langnick C, et al. 2009. Breakpoint analysis of balanced chromosome rearrangements by next-generation paired-end sequencing. Eur J Hum Genet DOI: 10.1038/ejhg.2009.21118 [Epub ahead of print].
55. Zimin AV, Delcher AL, Florea L, et al. 2009. A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol 10: R42.
56. Zhou X, Su Z, Sammons RD, et al. 2009. Novel software package for cross-platform transcriptome analysis (CPTRA). BMC Bioinf. 11: S16.
57. Kim JI, Ju YS, Park H, et al. 2009. A highly annotated whole-genome sequence of a Korean individual. Nature 460: 1011–5.
58. Linsen SE, de Wit E, Janssens G, et al. 2009. Limitations and possibilities of small RNA digital gene expression profiling. Nat Methods 6: 474–6.
59. Quail MA, Swerdlow H, Turner DJ. 2009. Improved protocols for the Illumina Genome Analyzer sequencing system. Curr Protoc Hum Genet Chapter 18: Unit 18.2.
60. Briggs AW, Stenzel U, Meyer M, et al. 2009. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res 38(6): e87 [Epub ahead of print].
61. Maricic T, Paabo S. 2009. Optimization of 454 sequencing library preparation from small amounts of DNA permits sequence determination of both DNA strands. Biotechniques 46: 51–2, 54–7.
62. Rohland N, Hofreiter M. 2007. Comparison and optimization of ancient DNA extraction. Biotechniques 42: 343–52.
63. Wheeler DA, Srinivasan M, Egholm M, et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature 452: 872–6.
64. Harismendy O, Ng PC, Strausberg RL, et al. 2009. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol 10: R32.
65. Chaisson MJ, Brinza D, Pevzner PA. 2009. De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Res 19: 336–46.
66. Krause J, Briggs AW, Kircher M, et al. 2009. A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr Biol 20: 231–6.
67. Gnirke A, Melnikov A, Maguire J, et al. 2009. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 27: 182–9.
68. Hodges E, Rooks M, Xuan Z, et al. 2009. Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat Protoc 4: 960–74.
69. Briggs AW, Good JM, Green RE, et al. 2009. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325: 318–21.
70. Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc 3: 267–78.
71. Meyer M, Stenzel U, Myles S, et al. 2007. Targeted high-throughput sequencing of tagged nucleic acid samples. Nucleic Acids Res 35: e97.
72. Erlich Y, Chang K, Gordon A, et al. 2009. DNA Sudoku – harnessing high-throughput sequencing for multiplexed specimen analysis. Genome Res 19: 1243–53.
73. Meyer M, Kircher M. 2010. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc DOI: 10.1101/pdb.prot5448.
74. Pop M, Salzberg SL. 2008. Bioinformatics challenges of new sequencing technology. Trends Genet 24: 142–9.
75. Richter BG, Sexton DP. 2009. Managing and analyzing next-generation sequence data. PLoS Comput Biol 5: e1000369.
76. Quail MA, Kozarewa I, Smith F, et al. 2008. A large genome center's improvements to the Illumina sequencing system. Nat Methods 5: 1005–10.
77. Batley J, Edwards D. 2009. Genome sequence data: management, storage, and visualization. Biotechniques 46: 333–4, 336.
78. IBM Research. 2009. IBM research aims to build nanoscale DNA sequencer to help drive down cost of personalized genetic analysis. In Loughran M, ed.; Press Releases, Vol. 2009. New York: IBM.
79. Eid J, Fehr A, Gray J, et al. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323: 133–8.
80. Astier Y, Braha O, Bayley H. 2006. Toward single molecule DNA sequencing: direct identification of ribonucleoside and deoxyribonucleoside 5′-monophosphates by using an engineered protein nanopore equipped with a molecular adapter. J Am Chem Soc 128: 1705–10.
81. Albertorio F, Hughes ME, Golovchenko JA, et al. 2009. Base dependent DNA-carbon nanotube interactions: activation enthalpies and assembly-disassembly control. Nanotechnology 20: 395101.
82. Medvedev P, Stanciu M, Brudno M. 2009. Computational methods for discovering structural variation with next-generation sequencing. Nat Methods 6: S13–20.
83. Pepke S, Wold B, Mortazavi A. 2009. Computation for ChIP-seq and RNA-seq studies. Nat Methods 6: S22–32.
84. Flicek P, Birney E. 2009. Sense from sequence reads: methods for alignment and assembly. Nat Methods 6: S6–12.
85. Park PJ. 2009. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 10: 669–80.
86. Wall PK, Leebens-Mack J, Chanderbali AS, et al. 2009. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics 10: 347.
87. Holt RA, Jones SJ. 2008. The new paradigm of flow cell sequencing. Genome Res 18: 839–46.
88. Dalca AV, Brudno M. 2010. Genome variation discovery with high-throughput sequencing data. Brief Bioinf. 11: 3–14.
