Recent advances in DNA sequencing have revolutionized the field of genomics, making it possible for even single research groups to generate large amounts of sequence data very rapidly and at a substantially lower cost. These high-throughput sequencing technologies make deep transcriptome sequencing and transcript quantification, whole genome sequencing and resequencing available to many more researchers and projects. However, while the cost and time have been greatly reduced, the error profiles and limitations of the new platforms differ significantly from those of previous sequencing technologies. The selection of an appropriate sequencing platform for particular types of experiments is an important consideration, and requires a detailed understanding of the technologies available, including sources of error, error rate, and the speed and cost of sequencing. We review the relevant concepts and compare the issues raised by the current high-throughput DNA sequencing technologies. We analyze how future developments may overcome these limitations and what challenges remain.

Keywords: ABI/Life Technologies SOLiD; Helicos HeliScope; Illumina Genome Analyzer; Roche/454 GS FLX Titanium; Sanger capillary sequencing

DOI 10.1002/bies.200900181

Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

*Corresponding author: Janet Kelso
E-mail: kelso@eva.mpg.de

Abbreviations: A/C/G/T, Deoxyadenosine, Deoxycytidine, Deoxyguanosine, Deoxythymidine; ATP, Adenosine triphosphate; dATPαS, Deoxy-adenosine-5'-(alpha-thio)-triphosphate; CCD, Charge-coupled Device, i.e. semi-conductor device used in digital cameras; ChIP-Seq, Chromatin Immuno-Precipitation sequencing; CNV, Copy Number Variation; dNTPs/NTPs, deoxy-nucleotides; ddNTPs, dideoxy-nucleotides (modified nucleotides missing a hydroxyl group at the third carbon atom of the sugar); GA, short for Illumina Genome Analyzer; InDel, Insertion/Deletion; kb/Mb/Gb, kilo base (10³ nt)/mega base (10⁶ nt)/giga base (10⁹ nt); MeDIP-Seq, Methylation-Dependent Immuno-Precipitation sequencing; nt, nucleotide(s); PCR, Polymerase Chain Reaction; RNA-Seq, Sequencing of mRNAs/transcripts; SAGE, Serial Analysis of Gene Expression; SNP, Single Nucleotide Polymorphism; mRNA, messenger RNA/transcripts.

Introduction

In 1977 the first genome, that of the 5,386 nucleotide (nt), single-stranded bacteriophage φX174, was completely sequenced [1] using a technology invented just a few years earlier [2–5]. Since then the sequencing of whole genomes as well as of individual regions and genes has become a major focus of modern biology and has completely transformed the field of genetics. At the time of the sequencing of φX174, and for almost another decade, DNA sequencing was a barely automated and very tedious process which involved determining only a few hundred nucleotides at a time. In the late 1980s, semi-automated sequencers with higher throughput became available [6, 7], though they were still only able to determine a few sequences at a time. A breakthrough in the early 1990s was the development of capillary array electrophoresis and appropriate detection systems [8–12]. As recently as 1996, these developments converged in the production of a commercial single capillary sequencer (ABI Prism 310). In 1998, the GE Healthcare MegaBACE 1000 and the ABI Prism 3700 DNA Analyzer became the first commercial 96 capillary sequencers, a development which was termed high-throughput sequencing.

Over the last decade, alternative sequencing strategies have become available [13–18] which force us to completely redefine “high-throughput sequencing.” These technologies outperform the older Sanger-sequencing technologies by a factor of 100–1,000 in daily throughput, and at the same time reduce the cost of sequencing one million nucleotides (1 Mb) to 4–0.1% of that associated with Sanger sequencing. To reflect these huge changes, several companies, researchers, and recent reviews [19–24] use the term “next-generation sequencing” instead of high-throughput sequencing, yet this term itself may soon be outdated considering the speed of ongoing developments.
Here we review the five sequencing technologies currently available on the market (capillary sequencing, pyrosequencing […]

[…] modified nucleotides missing a hydroxyl group at the third carbon atom of the sugar). The dNTP/ddNTP mixture […]

[…] Unfortunately, there is still little automation for creation of the high copy input DNA with known priming sites.
Figure 1. Schematic representation of the Sanger sequencing process. Input DNA is fragmented and cloned into bacterial vectors for in vivo amplification. Reverse strand synthesis is performed on the obtained copies starting from a known priming sequence and using a mixture of deoxy-nucleotides (dNTPs) and dideoxy-nucleotides (ddNTPs). The dNTP/ddNTP mixture randomly causes the extension to be non-reversibly terminated, creating differently extended molecules. Subsequently, after denaturation and clean-up of free nucleotides, primers, and the enzyme, the resulting molecules are sorted by their molecular weight (corresponding to the point of termination) using capillary electrophoresis, and the fluorescent label attached to the terminating ddNTPs is read out sequentially.
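The chain-termination principle shown in Figure 1 can be illustrated with a toy simulation. The Python sketch below is an illustration only, not a model of any real instrument: the termination probability, the number of molecules and the function names (`sanger_fragments`, `read_out`) are hypothetical, strand orientation is simplified (the synthesized strand is written as the base-by-base complement of the template, left to right), and capillary electrophoresis is reduced to sorting fragments by length.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template, n_molecules=2000, p_terminate=0.1, seed=1):
    """Simulate primer extension on many copies of `template`.

    At every position a ddNTP (instead of a dNTP) is incorporated with the
    hypothetical probability `p_terminate`; the missing 3'-OH ends that
    molecule. Returns (fragment length, terminal base) per terminated molecule.
    """
    rng = random.Random(seed)
    synthesized = [COMPLEMENT[base] for base in template]
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(synthesized, start=1):
            if rng.random() < p_terminate:
                fragments.append((position, base))  # the dye reports `base`
                break
    return fragments


def read_out(fragments):
    """Mimic electrophoresis: sort fragments by length and report the most
    frequent terminal dye (base) observed at each length."""
    dyes_by_length = {}
    for length, base in fragments:
        counts = dyes_by_length.setdefault(length, {})
        counts[base] = counts.get(base, 0) + 1
    return "".join(
        max(counts, key=counts.get)
        for _, counts in sorted(dyes_by_length.items())
    )


if __name__ == "__main__":
    template = "TACGGATCCA"
    print(read_out(sanger_fragments(template)))  # ATGCCTAGGT
```

In this sketch the fraction of molecules that reach position i falls as (1 − p_terminate)^(i−1), so long products become rare; molecules that never terminate are simply discarded. This is one way to see why signal strength drops toward the end of long Sanger reads.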
[…] rate when done in vivo), natural variance, and contamination in the sample used, as well as polymerase slippage at low complexity sequences like simple repeats (short variable number tandem repeats) and homopolymers (stretches of the same nucleotide). Further, lower intensities and missing termination variants tend to lead to sequencing errors accumulating toward the end of long sequences. In combination with reduced separation by the electrophoresis, base miscalls [32] and deletions increase with read length. However, the average error rate (the average over all bases of a sequence) after sequence end trimming is typically very low, with an error every 10,000–100,000 nt [33].

Roche/454 GS FLX Titanium sequencer

The 454 sequencing platform was the first of the new high-throughput […]

[…] developed by Pål Nyrén and Mostafa Ronaghi at the Royal Institute of Technology, Stockholm in 1996 [34]. In contrast to the Sanger technology, pyrosequencing is based on iteratively complementing single strands and simultaneously reading out the signal emitted from the nucleotide being incorporated (also called sequencing by synthesis, or sequencing during extension). Electrophoresis is therefore no longer required to generate an ordered read out of the nucleotides, as the read out is now done simultaneously with the sequence extension.

In the pyrosequencing process (Fig. 2), one nucleotide at a time is washed over several copies of the sequence to be determined, causing polymerases to incorporate the nucleotide if it is complementary to the template strand. The incorporation stops if the longest possible stretch of complementary nucleotides has been […]

[…] converted to ATP by an ATP sulfurylase. The ATP drives the light reaction of luciferases present and the emitted light signal is measured. To prevent the dATP provided for the sequencing reaction from being used directly in the light reaction, deoxy-adenosine-5'-(α-thio)-triphosphate (dATPαS), which is not a substrate of the luciferase, is used for the base incorporation reaction. Standard deoxyribose nucleotides are used for all other nucleotides. After capturing the light intensity, the remaining unincorporated nucleotides are washed away and the next nucleotide is provided.

In 2005, pyrosequencing technology was parallelized on a picotiter plate by 454 Life Sciences (later bought by Roche Diagnostics) to allow high-throughput sequencing [16]. The sequencing plate has about two million wells, each of them able to accommodate exactly one 28-μm diameter bead covered with […]
Figure 2. The pyrosequencing process. One of four nucleotides is washed sequentially over copies of the sequence to be determined, causing polymerases to incorporate complementary nucleotides. The incorporation stops if the longest possible stretch of the available nucleotide has been synthesized. In the process of incorporation, one pyrophosphate per nucleotide is released and converted to ATP by an ATP sulfurylase. The ATP drives the light reaction of luciferases present and a light signal proportional (within limits) to the number of nucleotide incorporations can be measured.
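The link between flows and light signal described in Figure 2 can be sketched as follows. This minimal Python example assumes an idealized, noise-free chemistry and a cyclic flow order of TACG (the flow order and the function names `flowgram` and `call_bases` are assumptions made for illustration). Each flow yields a signal equal to the length of the homopolymer incorporated, and base calling simply rounds the signal.

```python
from itertools import cycle


def flowgram(sequence, flow_order="TACG"):
    """Return (nucleotide, signal) per flow for an idealized pyrosequencing run.

    `sequence` is the strand being synthesized. Each flow incorporates the
    longest run of the flowed nucleotide at the current position, so the
    signal equals that homopolymer length (0 if nothing is incorporated).
    """
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains bases absent from flow_order")
    signals, position = [], 0
    for nucleotide in cycle(flow_order):
        if position >= len(sequence):
            break
        run = 0
        while position < len(sequence) and sequence[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, run))
    return signals


def call_bases(signals):
    """Convert (possibly noisy) flow signals back into bases by rounding."""
    return "".join(nuc * max(0, round(value)) for nuc, value in signals)


ideal = flowgram("TTAGGGC")
print(ideal)
# [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(ideal))  # TTAGGGC

# A slightly too bright signal for the GGG run is called as four Gs:
noisy = [("T", 2.1), ("A", 0.9), ("C", 0.1), ("G", 3.6)]
print(call_bases(noisy))  # TTAGGGG
```

Because the homopolymer length is inferred only from signal intensity, a small intensity error changes the number of called bases rather than their identity, which is why errors in this chemistry tend to appear as InDels.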
[…] single-stranded copies of the sequence to be determined. The beads are incubated with a polymerase and single-strand […]

[…] different sequences. These “mixed beads” will participate in a high number of incorporations per flow cycle, resulting […]

[…] are washed over the plate) as well as by the base composition and the order of the bases in the sequence to be determined […]
Figure 3. Reversible terminator chemistry applied by the Illumina GA. Sequencing primers are annealed to the adapters of the sequences to be determined. Polymerases are used to extend the sequencing primers by incorporation of fluorescently labeled and terminated nucleotides. The incorporation stops immediately after the first nucleotide due to the terminators. The polymerases and free nucleotides are washed away and the label of the bases incorporated for each sequence is read with four images taken through different filters (T nucleotide filter is indicated in the figure) and using two different lasers (red: A, C and green: G, T) to illuminate fluorophores. Subsequently, the fluorophores and terminators are removed and the sequencing continued with the incorporation of the next base.
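Reading one base per cycle from the four images in Figure 3 amounts to picking the brightest channel for each cluster. The Python sketch below assumes background-corrected intensities are already available per cluster and cycle; the purity score (brightest divided by brightest plus second brightest), the 0.6 cutoff and the intensity values are illustrative assumptions, not vendor parameters.

```python
def call_base(intensities):
    """Return (brightest base, purity) for one cluster in one cycle.

    `intensities` maps 'A', 'C', 'G', 'T' to background-corrected signal.
    Purity = brightest / (brightest + second brightest); a value near 0.5
    means two channels are almost equally bright (e.g. a mixed cluster).
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    total = best + second
    purity = best / total if total > 0 else 0.0
    return best_base, purity


def call_read(cycles, min_purity=0.6):
    """Call one base per cycle, writing 'N' where purity < min_purity."""
    read = []
    for intensities in cycles:
        base, purity = call_base(intensities)
        read.append(base if purity >= min_purity else "N")
    return "".join(read)


cycles = [
    {"A": 900, "C": 50, "G": 30, "T": 20},   # clear A
    {"A": 40, "C": 60, "G": 20, "T": 800},   # clear T
    {"A": 400, "C": 380, "G": 10, "T": 5},   # ambiguous A/C
]
print(call_read(cycles))  # ATN
```

A cluster grown from more than one starting molecule would produce cycles like the third one, with two channels of similar brightness, and hence low-purity calls.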
[…] stranded library molecules. By reverse strand synthesis starting from the hybridized (double-stranded) part, the new strand being created is covalently bound to the flow cell. If this new strand bends over and attaches to another oligonucleotide complementary to the second adapter sequence on the free end of the strand, it can be used to synthesize a second covalently bound reverse strand. This process of bending and reverse strand synthesis, called bridge amplification, is repeated several times and creates clusters of several thousand copies of the original sequence in very close proximity to each other on the flow cell [13, 40].

These randomly distributed clusters contain molecules that represent the forward as well as reverse strands of the original sequences. Before determining the sequence, one of the strands has to be removed to prevent it from hindering the extension reaction sterically or by complementary base pairing. Strands are selectively cleaved at base modifications of oligonucleotides on the flow cell. Following strand removal, each cluster on the flow cell consists of single-stranded, identically oriented copies of the same sequence, which can be sequenced by hybridizing the sequencing primer onto the adapter sequences and starting the reversible terminator chemistry.

“Solexa sequencing”, as it was introduced in early 2007, initially allowed for the simultaneous sequencing of several million very short sequences (at most 26 nt) in a single experiment. In recent years there have been several technical, chemical, and software updates. The product, which is now called the Illumina Genome Analyzer, offers increased flow cell cluster densities (more than 200 million clusters per run), images a wider area of the flow cell, and generates sequence reads of up to 100 nt. A technical update also enabled the sequencing of the reverse strand of each molecule. This is achieved by chemically melting and washing away the synthesized sequence, repeating a few bridge amplification cycles for reverse strand synthesis, and then selectively removing the starting strand (again using base modifications of the flow cell oligonucleotide populations), before annealing another sequencing primer for the second read. Using this “paired-end sequencing” approach, approximately twice the amount of data can be generated. The Illumina library and flow cell preparation includes several in vitro amplification steps, which cause a high background error rate and contribute to the average error rate of about 10⁻²–10⁻³ [41, 42]. Further, the flow cell preparation creates a fraction of ordinary-looking clusters that are initiated from more than one individual sequence. This results in mixed signals and mostly low quality sequences for these clusters. Similar to the 454 ghost wells, the Illumina image analysis may identify chemistry crystals, dust, and lint particles as clusters and call sequences from these. In such cases the resulting sequences typically appear to be of low sequence complexity.

As is the case for the other platforms, the error rate increases with increasing position in the determined sequence. This is mainly due to phasing, which increases the background noise as sequencing progresses. While the ensemble sequencing process for pyrosequencing creates uni-directional phasing, reversible terminator sequencing creates bi-directional phasing [41, 43], as some incorporated nucleotides may also fail to be correctly terminated, allowing the extension of the sequence by another nucleotide in the same cycle.
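The effect of bi-directional phasing on a cluster can be made concrete with a simple probabilistic model. In the Python sketch below, each molecule independently lags (no incorporation) with probability `p_lag` or leads (a second incorporation after a failed termination) with probability `p_lead` in every cycle; both default values and the function name are hypothetical. The function returns, per cycle, the fraction of molecules whose total extension equals the cycle number, including molecules whose lags and leads happen to cancel.

```python
def in_phase_fraction(n_cycles, p_lag=0.01, p_lead=0.005):
    """Fraction of a cluster's molecules that are in phase after each cycle.

    Per cycle, each molecule independently
      - fails to incorporate (lags behind) with probability p_lag,
      - incorporates two bases (leads, failed termination) with p_lead,
      - otherwise incorporates exactly one base.
    Setting p_lead = 0 gives a lag-only (uni-directional) model.
    """
    if p_lag < 0 or p_lead < 0 or p_lag + p_lead > 1:
        raise ValueError("invalid probabilities")
    extended = {0: 1.0}  # bases incorporated -> fraction of molecules
    fractions = []
    for cycle_number in range(1, n_cycles + 1):
        updated = {}
        for length, mass in extended.items():
            for step, prob in ((0, p_lag), (1, 1 - p_lag - p_lead), (2, p_lead)):
                updated[length + step] = updated.get(length + step, 0.0) + mass * prob
        extended = updated
        fractions.append(extended.get(cycle_number, 0.0))
    return fractions


profile = in_phase_fraction(100)
for cycle_number in (1, 25, 50, 100):
    print(cycle_number, round(profile[cycle_number - 1], 3))
```

Molecules that are out of phase still emit light, but for a neighboring position, so the shrinking in-phase fraction is what raises the background noise with increasing read position in this model.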
With increasing cycle numbers, the intensities extracted from the clusters decline [41, 43, 44]. This is due to fewer […]

[…] on the Church lab sequencing-by-ligation concept, but combines it with a new strategy of sequencing library […]

[…] a similar fashion to that described earlier for the 454/Roche platform. In contrast to the 454/Roche technology, the […]
Figure 4. Applied Biosystems’ SOLiD sequencing by ligation. A sequencing primer is annealed to single-stranded copies of sequences to be determined. Octamer probes are hybridized and ligated to the sequencing primer, and a fluorescent dye at the 5' end of the ligated 8-mer probes, encoding the two 3'-most nucleotides of the probe, is read out. Non-extended primers are dephosphorylated. Three nucleotides of the probe including the dye are cleaved, creating a free 5' phosphate for further ligations. After multiple ligations, the synthesized strands are melted and the ligation product is washed away before a new sequencing primer, shifted by one nucleotide, is annealed. Starting from the new sequencing primer the ligation reaction is repeated. The same is done for three other primers, allowing the read out of the dinucleotide label for every position in the sequence.
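The dinucleotide labels read out in Figure 4 form a so-called color space sequence. A common description of the SOLiD two-base scheme codes A, C, G, T as 0, 1, 2, 3 and assigns each overlapping dinucleotide the color given by the XOR of the two codes; the Python sketch below assumes that mapping, assumes the first base of the read is known, and uses hypothetical function names. It shows why decoding is sensitive to errors: a single color miscall changes every downstream base, whereas a true SNP changes two adjacent colors.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(sequence):
    """Color for each overlapping dinucleotide: XOR of the 2-bit base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base, colors):
    """Translate colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


seq = "TACGGATC"
colors = encode(seq)
print(colors)               # [3, 1, 3, 0, 2, 3, 2]
print(decode("T", colors))  # TACGGATC

# A single color miscall corrupts every downstream base after decoding ...
bad = colors.copy()
bad[2] = 0
print(decode("T", bad))     # TACCCTAG

# ... whereas a true SNP (G -> T at position 3) changes two adjacent colors.
snp = encode("TACTGATC")
print([i for i, (x, y) in enumerate(zip(colors, snp)) if x != y])  # [2, 3]
```

When reads are compared to a reference in color space, an isolated color mismatch is therefore more consistent with a measurement error than with a sequence difference, which is one reason why error correction benefits from an available reference genome.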
[…] system in terms of throughput and price per million nucleotides (~5,000 Mb/day, ~0.50$/Mb). Average error rates are, however, dependent on the availability of a reference genome for error correction (10⁻³–10⁻⁴ with vs. 10⁻²–10⁻³ without). In the absence of a reference genome, assembly and consensus calling may be performed based on dye read outs (so-called color space sequences) to reduce the errors before conversion to the nucleotide sequence. If no reference genome is available for error correction, and no assembly and consensus calling is performed, then the average error rate is higher than for the Illumina GA.

Helicos HeliScope

Helicos is the first company to sell a sequencer able to sequence individual molecules instead of molecule ensembles created by an amplification process. Single molecule sequencing has the advantage that it is not affected by biases or errors introduced in a library preparation or amplification step, and may facilitate sequencing of minimal amounts of input DNA. Using methods able to detect non-standard nucleotides, it could also allow for the identification of DNA modifications, which are commonly lost in the in vitro amplification process.

The HeliScope, as the Helicos sequencer is called, was first sold in March 2008, and by the end of the first quarter of 2009 only four machines had been installed worldwide. This might be surprising given the advantages of single molecule sequencing, but probably reflects the specific limitations of this platform, the price (about one million dollars), and a relatively small market that has already invested extensively in new sequencing technologies.

The technology applied (Fig. 5) could be termed asynchronous virtual terminator chemistry [15]. Input DNA is fragmented and melted before a
poly-A tail is synthesized onto each single-stranded molecule using a polyadenylate polymerase. In the last step of polyadenylation, a fluorescently labeled adenine is added. The library is washed over a flow cell where the poly-A tails bind to poly-T oligonucleotides. The coordinates of the bound molecules on the flow cell are determined using a fluorescence-based read out of the flow cell. Once these coordinates have been identified, the fluorescent label of the 3' adenine is removed and the sequencing reaction is started. Polymerases are washed through the flow cell with one type of fluorescently labeled nucleotide (A, C, G, or T) at a time, and the polymerases extend the reverse strand of the sequences starting from the poly-T oligonucleotides. The fluorescent label slows down nucleotide incorporation by the polymerases and allows for at most one incorporation before the polymerase is washed away together with the non-incorporated nucleotides (termed virtual termination [48, 49]). The flow cell is then imaged again, the fluorescent dyes are removed, and the reaction is continued with another nucleotide. In this process not every molecule is extended in every cycle, which is why it is an asynchronous sequencing process resulting in sequences of different length (as is the case for the 454/Roche platform).

Since single molecules are sequenced, the signals being measured are weak, and misincorporation errors cannot be corrected by an ensemble effect. Because molecules are attached to the flow cell by hybridization only, template molecules can be lost in the wash steps. In addition, molecules may be irreversibly terminated by the incorporation of incorrectly synthesized nucleotides. Overall, reads are between 24 and 70 nt long (average 32 nt) [50] and thus shorter than for the other platforms. Due to the higher number of sequences determined in parallel, the total throughput per day (4150 Mb/day with a cost of ~0.33$/Mb [50]) is in the same range as for the GA and SOLiD systems. The average error rate, which is in the range of a few percent, is slightly higher than for all other instruments and biased toward InDels rather than substitutions.

Applications and general considerations

All current high-throughput technologies have an average error rate that […]
Further, the GS FLX Titanium, GA, SOLiD, and HeliScope platforms each have very specific biases and limitations, making it necessary to choose a platform appropriate for a specific project or application (for a summary see Table 1). A combination of technologies [51–54] and experimental protocols [55–57] may also be appropriate, and even complementary, for specific projects.

High-quality Sanger sequencing is now commonly used for low-coverage sequencing of individual positions and regions (e.g., diagnostic genotyping) or for sequencing virus- and phage-sized whole genomes. As Sanger reads are longer than the most abundant short repeat classes, they allow the unambiguous assembly of most genomic regions, something that is generally not possible using the shorter read platforms. However, the technology is expensive and too slow for sequencing large samples, extended genomic regions, or the many molecules required for quantitative applications [e.g., gene expression quantification; chromatin immunoprecipitation sequencing (ChIP-Seq); and methylation-dependent immunoprecipitation sequencing (MeDIP-Seq)]. For quantitative applications the HeliScope provides the highest throughput in terms of sequence number and has the advantage of not requiring a multistep library preparation protocol. On the other hand, the HeliScope provides the lowest mapping accuracy for complex genomes due to its short read length and error profile. The GA or SOLiD platforms may thus provide equivalent results for quantitative applications, while providing fewer but longer reads and requiring a more elaborate library preparation.

While it has not yet been fully analyzed, it is possible (and even likely) that library preparation protocols bias the sequence representation in a sample [42, 58, 59], making the replacement of this step an important goal. Further, multistep library preparation protocols require higher amounts of input material, limiting their general application. However, protocols for library construction from limited sample […]

[…] protocols indicate the need for higher sample quantities (microgram range), many users are proceeding successfully with low input DNA amounts (nanogram to picogram range), for example from ancient DNA specimens [60–62].

Like Sanger sequencing, the GS FLX Titanium provides a read length spanning many of the short repeat sequences, an important feature for accurate sequence mapping and assembly of genomes [63]. Despite the InDel errors, this technology has very low rates of misidentifying individual bases, making it well suited for the identification of single nucleotide polymorphisms (SNPs). Also geared to the identification of SNPs, at least for samples with an existing reference genome, is the SOLiD instrument with its dinucleotide encoding scheme [46]. Considerably higher coverage is needed to perform SNP calling with similar accuracy using the Illumina GA [64]. Neither the Illumina GA nor the ABI SOLiD sequencing systems are prone to generate high rates of small InDels, making them well suited for studying InDel variation.

As mentioned earlier, the drawback of the short reads (below about 75 nt) obtained from Helicos, SOLiD, or GA instruments lies in genome assembly and mapping applications, where the placement of repeated or very similar sequences cannot be resolved unambiguously. Correct placement is further complicated by high error rates, which raise the minimum sequence distance required for an unambiguous placement. Paired-end or mate-pair protocols help to overcome some of these limitations of short reads [65] by providing information about the relative location and orientation of a pair of reads. Currently a paired-end protocol is only commonly applied on the GA, while mate-pair protocols are available for SOLiD, GS FLX Titanium, and GA. In paired-end sequencing the actual ends of rather short DNA molecules (<1 kb) are determined, while mate-pair sequencing requires the preparation of special libraries. In these protocols, the ends of longer, size-selected molecules (e.g., 8, 12, or 20 kb) are connected with an internal adapter sequence in a circularization reaction. The circular […]

[…] the two combined molecule ends. The internal adapter can then be used as a second priming site for an additional sequencing reaction on the same immobilized molecules. Thus, mate-pair sequencing provides distance information useful for assembly, but does not allow the merging of the two end reads, since by design the molecules will not overlap in sequencing. However, merging overlapping forward and reverse paired-end reads from short insert libraries allows the reconstruction of a complete consecutive molecule sequence, longer than the individual read length and with reduced average error rates in the overlapping part [60, 66].

Due to the large amounts of sequence generated, there is interest in sequencing targeted regions (e.g. a genomic locus, from sequence capture experiments [67–69]) in multiple individuals/samples instead of sequencing one sample in excessive depth. All technologies therefore provide a separation of their sequencing plate into defined regions or channels. However, at most 16 such regions/channels are available (GS FLX Titanium and HeliScope plates), which may not be sufficient for some applications. Using different library construction protocols, some platforms allow the addition of sample-specific barcode (sometimes called “index”) sequences to the library molecules. These molecules can then be sequenced in the same region/channel, and later separated computationally based on their barcode sequence [70–73]. This facilitates highly parallel sequencing of a larger number of samples than is possible using the physical lane/channel separation. Currently such protocols (mostly non-vendor protocols) are available for the GS FLX Titanium, GA, and SOLiD instruments.

Although sequencing prices per gigabase have fallen considerably in recent years, making projects like the 1000 Human Genome Variation Project, 1001 Arabidopsis thaliana Genomes Project, the Mammalian Genome Project, or the International Cancer Genome Consortium possible, high-throughput sequencing still has high acquisition, running and maintenance costs, which
are not included in Table 1. Further, each of these platforms requires a substantial investment in data management and analysis, time, and personnel [74–77]. Smaller research groups may still find prohibitive the costs of the infrastructure needed for storing, handling, and analyzing the several tens of gigabytes of pure sequence data and terabytes of intermediate files (several thousand of them) generated by these instruments each week. Even for larger, experienced genome centers this aspect remains an ever-increasing challenge for the ongoing use of these platforms.

Upcoming developments

Motivated by the goal of a $1,000 genome set by NIH/NHGRI to enable personalized medicine, the throughput of all systems described is constantly increasing and the numbers given here are rapidly outdated. However, in addition to the improvements of current technologies, including the January 2010 announcement of the Illumina HiSeq 2000 system, which determines sequences of clusters on the bottom and top of the flow cell and processes two flow cells in parallel, a new generation of sequencers is already on the horizon. What started with the Helicos system (the sequencing of single molecules without prior library preparation or amplification) will likely become a popular paradigm. Specifically, three other systems have captured media and scientific attention well in advance of their actual availability: Pacific Biosciences’ Single Molecule Real Time (SMRT) sequencing technology [18], Oxford Nanopore’s BASE technology [14] and, recently, IBM’s proposal of silicon-based nanopores [78].

Pacific Biosciences’ SMRT technology performs the sequencing reaction on silicon dioxide chips with a 100 nm metal film containing thousands of holes tens of nanometers in diameter, so-called zero-mode waveguides (ZMWs) [79]. Each ZMW is used as a nano-visualization chamber, providing a detection volume of 20 zeptoliters (1 zeptoliter = 10⁻²¹ l). At this volume, a single molecule can be illuminated while excluding other labeled nucleotides in the background, saving time and sequencing chemistry by omitting wash steps. A single DNA polymerase is fixed to the bottom of the surface within the detection volume, and nucleotides, with different dyes attached to the phosphate chain, are used in concentrations allowing normal enzyme processivity. As the polymerase incorporates complementary nucleotides, the nucleotide is held within the detection volume for tens […]
the incorporated nucleotide can be may overcome the destructive approach any library preparation or amplification,
identified during normal speed reverse followed so far. the identification of specific nucleotide
strand synthesis [79]. In pilot exper- modifications, and the ability to gener-
iments, Pacific Biosciences has shown ate longer sequence reads. These devel-
that its technology allows for direct Conclusion opments will facilitate future research in
sequencing of a few thousand bases many fields, make data analysis easier,
before the polymerase is denatured Current high-throughput sequencing and further reduce sequencing costs,
due to the laser read out of the dyes. technologies provide a huge variety of hopefully achieving the aim of a
The SMRT technology is intended for sequencing applications to many $1,000 human genome suggested by
release in 2010. Even though further researchers and projects. Given the NIH/NHGRI to be required for personal-
development is needed to create a more immense diversity, we have not dis- ized medicine.
robust system, the omission of library cussed these applications in depth here;
preparation and amplification as well other reviews with a stronger focus on
as the long sequences generated will specific applications and data analysis Acknowledgments
undoubtedly provide an advantage are available [24, 82–88]. The discussed We thank the members of the Depart-
over the current systems for many technologies make it possible for even ment of Evolutionary Genetics, and
applications. single research groups to generate large particularly members of the sequencing
Oxford Nanopore’s BASE technology amounts of sequence data very rapidly group, for providing sequencing data
is unlikely to be released as soon as and at substantially lower costs than from multiple platforms, as well
the SMRT technology. BASE offers traditional Sanger sequencing. While as interesting discussions and useful
the potential to identify individual costs have been reduced to less than insights. We are also indebted to A.
nucleotide modifications (e.g. 5-methyl- 4–0.1% and time has been shortened Wilkins and the three anonymous
cytosine vs. cytosine) during the by a factor of 100–1,000 based on daily reviewers for critical reading of the
sequencing process [14]. The idea behind throughput, the error profiles and manuscript and thoughtful comments.
this technology is the identification of limitations observed for the new plat- This work was supported by the Max
individual nucleotides using a change forms differ significantly from Sanger Planck Society.
in the membrane potential as they pass through a modified α-hemolysin membrane pore with a cyclodextrin sensor [14, 80]. However, to apply this technology to sequencing, the pore has to be fused to an exonuclease, which degrades single-stranded DNA and releases individual nucleotides into the pore. In addition, the technology needs to be parallelized in an array format before it can be released as a high-throughput sequencing platform. While the sensitivity to individual nucleotide modifications appears to be a major advantage, the destructive nature of the outlined sequencing process might be a hindrance for applications with precious samples, and it does not allow a second read cycle for error reduction.

In early October 2009, IBM issued a press release [78] describing a method to slow the passage of an individual DNA strand through a nanopore. For this purpose they developed a multilayer metal/dielectric nanopore device that uses the interaction of the DNA backbone charges with a modulated electric field to trap an individual DNA molecule and release it slowly. The technology described could theoretically be combined with, for example,

sequencing and between approaches. Further, each of these new sequencing platforms requires substantial additional investment, a factor that has often not been sufficiently stressed in research publications describing a specific application. Some vendors have recently started to offer budget versions of their instruments (e.g. Illumina GA IIe or 454/Roche GS Junior) with lower sequencing capacity. However, while the instrument price is lower, the financial investment remains high: costs per base are generally higher than for the standard instrument, and very similar overall infrastructure is still required. The choice of an appropriate sequencing platform is often project specific, and sometimes a combination of platforms can be advantageous. This may open the market further to companies providing sequencing-on-demand services, but it will not replace the need for laboratories to invest considerable time and expertise in both the production of libraries and the analysis of the vast quantities of data that will be generated.

New technologies on the horizon, SMRT by Pacific Biosciences, BASE by Oxford Nanopore, and other technologies such as that suggested by IBM,

References
1. Sanger F, Air GM, Barrell BG, et al. 1977. Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265: 687–95.
2. Gilbert W, Maxam A. 1973. The nucleotide sequence of the lac operator. Proc Natl Acad Sci USA 70: 3581–4.
3. Sanger F, Nicklen S, Coulson AR. 1977. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74: 5463–7.
4. Sanger F, Coulson AR. 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 94: 441–8.
5. Wu R, Kaiser AD. 1968. Structure and base sequence in the cohesive ends of bacteriophage lambda DNA. J Mol Biol 35: 523–37.
6. Smith LM, Sanders JZ, Kaiser RJ, et al. 1986. Fluorescence detection in automated DNA sequence analysis. Nature 321: 674–9.
7. Swerdlow H, Gesteland R. 1990. Capillary gel electrophoresis for rapid, high resolution DNA sequencing. Nucleic Acids Res 18: 1415–9.
8. Zagursky RJ, McCormick RM. 1990. DNA sequencing separations in capillary gels on a modified commercial DNA sequencing instrument. Biotechniques 9: 74–9.
9. Huang XC, Quesada MA, Mathies RA. 1992. DNA sequencing using capillary array electrophoresis. Anal Chem 64: 2149–54.
10. Kambara H, Takahashi S. 1993. Multiple-sheathflow capillary array DNA analyser. Nature 361: 565–6.
11. Ueno K, Yeung ES. 1994. Simultaneous monitoring of DNA fragments separated by electrophoresis in a multiplexed array of 100 capillaries. Anal Chem 66: 1424–31.
384 multicapillary sequencer. Genome Res 10: 1757–71.
31. Hert DG, Fredlake CP, Barron AE. 2008. Advantages and limitations of next-gener-
49. Bowers J, Mitchell J, Beer E, et al. 2009. Virtual terminator nucleotides for next-generation DNA sequencing. Nat Methods 6: 593–5.
960–74.
69. Briggs AW, Good JM, Green RE, et al. 2009. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325: 318–21.
70. Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc 3: 267–78.
71. Meyer M, Stenzel U, Myles S, et al. 2007. Targeted high-throughput sequencing of tagged nucleic acid samples. Nucleic Acids Res 35: e97.
72. Erlich Y, Chang K, Gordon A, et al. 2009. DNA Sudoku – harnessing high-throughput sequencing for multiplexed specimen analysis. Genome Res 19: 1243–53.
73. Meyer M, Kircher M. 2010. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc DOI: 10.1101/pdb.prot5448.
74. Pop M, Salzberg SL. 2008. Bioinformatics challenges of new sequencing technology. Trends Genet 24: 142–9.
A large genome center’s improvements to the Illumina sequencing system. Nat Methods 5: 1005–10.
77. Batley J, Edwards D. 2009. Genome sequence data: management, storage, and visualization. Biotechniques 46: 333–4, 336.
78. IBM Research. 2009. IBM research aims to build nanoscale DNA sequencer to help drive down cost of personalized genetic analysis. In Loughran M, ed.; Press Releases, Vol. 2009. New York: IBM.
79. Eid J, Fehr A, Gray J, et al. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323: 133–8.
80. Astier Y, Braha O, Bayley H. 2006. Toward single molecule DNA sequencing: direct identification of ribonucleoside and deoxyribonucleoside 5′-monophosphates by using an engineered protein nanopore equipped with a molecular adapter. J Am Chem Soc 128: 1705–10.
81. Albertorio F, Hughes ME, Golovchenko JA, et al. 2009. Base dependent DNA-carbon
Computational methods for discovering structural variation with next-generation sequencing. Nat Methods 6: S13–20.
83. Pepke S, Wold B, Mortazavi A. 2009. Computation for ChIP-seq and RNA-seq studies. Nat Methods 6: S22–32.
84. Flicek P, Birney E. 2009. Sense from sequence reads: methods for alignment and assembly. Nat Methods 6: S6–12.
85. Park PJ. 2009. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 10: 669–80.
86. Wall PK, Leebens-Mack J, Chanderbali AS, et al. 2009. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics 10: 347.
87. Holt RA, Jones SJ. 2008. The new paradigm of flow cell sequencing. Genome Res 18: 839–46.
88. Dalca AV, Brudno M. 2010. Genome variation discovery with high-throughput sequencing data. Brief Bioinform 11: 3–14.