
J Mol Evol (2004) 58:424–441

DOI: 10.1007/s00239-003-2564-9

Dating the Monocot–Dicot Divergence and the Origin of Core Eudicots Using Whole Chloroplast Genomes

Shu-Miaw Chaw,1 Chien-Chang Chang,1 Hsin-Liang Chen,1 Wen-Hsiung Li2

1 Institute of Botany, Academia Sinica, 128 Sec. 2, Academy Road, Taipei 115, Taiwan
2 Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA

Received: 31 July 2003 / Accepted: 23 October 2003

Abstract. We estimated the dates of the monocot–dicot split and the origin of core eudicots using a large chloroplast (cp) genomic dataset. Sixty-one protein-coding genes common to the 12 completely sequenced cp genomes of land plants were concatenated and analyzed. Three reliable split events were used as calibration points and for cross references. Both the method based on the assumption of a constant rate and the Li–Tanimura unequal-rate method were used to estimate divergence times. The phylogenetic analyses indicated that nonsynonymous substitution rates of cp genomes are unequal among tracheophyte lineages. For this reason, the constant-rate method gave overestimates of the monocot–dicot divergence and the age of core eudicots, especially when fast-evolving monocots were included in the analysis. In contrast, the Li–Tanimura method gave estimates consistent with the known evolutionary sequence of seed plant lineages and with known fossil records. Combining estimates calibrated by two known fossil nodes and the Li–Tanimura method, we propose that monocots branched off from dicots 140–150 Myr ago (late Jurassic–early Cretaceous), at least 50 Myr later than previous estimates based on the molecular clock hypothesis, and that the core eudicots diverged 100–115 Myr ago (Albian–Aptian of the Cretaceous). These estimates indicate that both the monocot–dicot divergence and the core eudicots' age are older than their respective fossil records.

Key words: Chloroplast genome — Divergence of monocot and dicot — Angiosperm phylogeny — Age of core eudicots — Molecular clock — Unequal rate

Correspondence to: Shu-Miaw Chaw; email: smchaw@sinica.edu.tw

Introduction

Fossil evidence suggests that flowering plants (angiosperms) first appeared 140 million years (Myr) ago in the early Cretaceous (Willis and McElwain 2002). They soon diversified and expanded globally in the mid-Cretaceous (90–100 Myr ago) (Nicholas et al. 1983). Although the angiosperm phylogeny has now been largely established (Mathews and Donoghue 1999; PS Soltis et al. 1999; Qiu et al. 1999; Parkinson et al. 1999; DE Soltis et al. 2000; Chase et al. 2000), the question of why the oldest unequivocal angiosperm fossils appear nearly 300 and 170 Myr later than, respectively, the first vascular plants (ca. 440 Myr ago [Taylor and Taylor 1993]) and their extant sister group, the gymnosperms (late Carboniferous, ca. 310 Myr ago [Doyle 1998]), remains an “abominable mystery” (Darwin et al. 1903). A number of hypotheses have been proposed to explain the late arrival of angiosperms in the fossil record. These include (1) the escape of fossilization in the initial stage of angiosperm evolution (Thomas and Spicer 1987), (2) bias in the fossil record (i.e., angiosperms

Table 1. Comparison of previous estimates of divergence between monocots and dicots

Reference   Gene used   Reference point (time^a of divergence; Myr)   Estimated time^a (Myr)

Ramshaw et al. (1972)   Cytochrome c^b   Mammals–birds (280)   220–240
Martin et al. (1989)   nr^c GapC, CHS   Animal–yeast (1000)   319 ± 35
   Drosophila–vertebrates (600)
   Mammals–chicken (270)
   Human–rat (85)
Wolfe et al. (1989)   12 cp^c genes   Bryophyte–angiosperm (350–450)   170–230
   Maize–wheat (50–70)   150–260
   nr 26S, 18S rRNA   Plant–animal (1000)   200–250, 200–210
Brandl et al. (1992)   cp tRNA   Maize–wheat (50–70)
   Tracheophyte–bryophyte (350–450)   230–350
   Plant–animal (1000)
Martin et al. (1993)   nr GapC, cp rbcL   Bryophyte–spermatophyte (450)   300
   Conifer–angiosperm (330)
Laroche et al. (1995)   12 mt^c genes   Maize–wheat (50–70)   170–238
   Vicieae–Phaseoleae (45–65)   157–226
Goremykin et al. (1997)   58 cp genes^b   Bryophyte–spermatophyte (450)   160 ± 16
Sanderson (1997)   rbcL   Marchantia (450)   [160–215]^d
Yang et al. (1999)   mt 1st intron of nad   Maize–wheat (50–70)   170–235
Sanderson & Doyle (2001)   rbcL, 18S rRNA   Land plant (450)   140–190
Wikström et al. (2001)   cp rbcL, atpB   Fagales–Cucurbitales (84)   [158–179]
   nr 18S rDNA   [131–147]^e

^a The unit of time is million years ago (or before present).
^b The translated amino acid sequence was used.
^c cp, chloroplast; mt, mitochondrial; nr, nuclear.
^d The age of extant angiosperms.
^e The origin date of eudicots.

evolved much earlier but went undetected), and (3) the suggestion that the evolution of angiosperms was triggered by a particular set of environmental conditions and/or biotic interactions (such as co-evolution with faunal groups) (Willis and McElwain 2002).

Is the origin of angiosperms actually much older than the known fossil record? Three decades have passed since Ramshaw et al. (1972) first applied molecular data to this question. In the interim, molecular phylogenetic studies and critical fossils of derived angiosperms from older geological deposits (Magallón et al. 1999; Wikström et al. 2001) have opened up an opportunity to readdress the age and evolution of angiosperms. Although all previous estimates of the monocot–dicot divergence (Table 1) predate the angiosperm fossil record, they are highly variable, ranging from 140–190 Myr (Goremykin et al. 1997; Sanderson 1997; Wikström et al. 2001; Sanderson and Doyle 2001) to 200 Myr (Ramshaw et al. 1972; Wolfe et al. 1989; Laroche et al. 1995; Yang et al. 1999) or even 300–320 Myr (Martin et al. 1989, 1993; Brandl et al. 1992).

Traditionally, the angiosperms were subdivided into two classes, Liliopsida (the monocots) and Magnoliopsida (the dicots) (Cronquist 1988). However, this subdivision was first refuted by rbcL and 18S rRNA gene phylogenies (Chase et al. 1993; Chaw et al. 1997) and later by analyses of multiple genes from the three plant genomes (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000; Chaw et al. 2000). These phylogenetic analyses have led to the conclusion that the dicots were split into the basal dicots (or the magnoliids) and the eudicots and that the monocot lineage was derived from one of the basal magnoliids (Fig. 1A). Parallel to the molecular data has been the accumulation of pollen fossils of eudicots, which began in the late Barremian (of the Cretaceous, ca. 120 Myr ago) and spread globally in the Albian (ca. 110 Myr ago) (Doyle 1992; Hughes 1994). In addition, many new megafossils of basal eudicots have appeared, such as Tetracentraceae from the Barremian (110–118 Myr ago) (Magallón et al. 1999), as well as of core eudicots, such as a possible Rhamnaceae/Rosaceae (rosids) from the early Cenomanian (94–97 Myr ago [Basinger and Dilcher 1984]). It has also been suggested that the date of diversification of core eudicots was underestimated. Wikström et al. (2001) have examined this issue (Table 1) with nuclear 18S rDNA and two cp genes (rbcL and atpB). We now provide additional evidence by analyzing whole chloroplast (cp) genomic DNA sequences.

Cp DNA sequences are useful for studying plant phylogeny at deep levels of evolution because of their lower rates of silent nucleotide substitution

Fig. 1. Rooted phylogenetic tree for the 12 sampled species. A Phylogeny of angiosperms based on Qiu et al. (1999) and P. S. Soltis et al.'s (1999) phylogenetic trees. Solid lines lead to taxa sampled in this study. B Rooted NJ tree using the Pamilo–Bianchi–Li distances based on the Ka values of the 61 concatenated cp protein-coding genes. The branches leading to nodes C2 and A are not drawn to scale; their lengths are indicated. The calibration points (nodes C1, C2, C3) were used to estimate the divergence between monocots and dicots (node A) and the origin of core eudicots (node B). Gene loss (open bar), loss but with known transfer to the nucleus (hatched bar), retention (gray bar), and likely gain with no similarity to prokaryotic genes (filled bar) are plotted on the branches leading to each lineage. The upper numbers at each node denote the bootstrap percentages (where applicable, values of the interior branch test are indicated after the slash). Total gene number in the cp genome is given after each species, in parentheses. Branch lengths and the scale bar are Ka values per 100 sites.

(Palmer 1985a, b; Wolfe et al. 1989; Clegg et al. 1994). Moreover, concatenating sequences from many genes may overcome the problem of multiple substitutions that cause the loss of phylogenetic information between cp lineages (Lockhart et al. 1999) and can reduce “sampling errors due to substitutional noise and the finite number of characters within a gene” (Sanderson and Doyle 2001). In this study we analyzed 39,507 sites of cp DNA genomic sequences from 61 protein-coding genes common to the 12 complete cp genomes of land plants (Table 2). Our dataset is larger than those used in previous studies, including that of Goremykin et al. (1997; see also Table 1), who analyzed 40 proteins of cp genomes from fewer taxa (five land plants, including only one dicot and two monocots).

Molecular dating often assumes rate constancy, but this assumption is frequently violated (PS Soltis et al. 2002 and references therein). For example, substitution rates of cp genes vary greatly among and within tracheophyte (or vascular plant) lineages (Bousquet et al. 1992; Gaut et al. 1992, 1993; Clegg et al. 1994; Sanderson and Doyle 2001; PS Soltis et al. 2002), between protein-coding loci (Muse and Gaut 1997; Matsuoka et al. 2002), and between nonsynonymous and synonymous sites (Gaut et al. 1997; Matsuoka et al. 2002). Sanderson and Doyle (2001) believed that much of the conflict in estimating divergence times was due to rate variation across lineages. To mitigate this problem, we used mean branch lengths of the sampled monocots and dicots.

The focus of this study is to estimate the dates of the monocot–dicot split and the origin of core eudicots using a large cp genomic dataset. The date of the monocot–dicot divergence can be calculated by extrapolation from the reliable dates of other speciation events by means of a phylogeny based on DNA sequence distances (Wolfe et al. 1989). Three divergence events with well-supported fossil dates were used as calibration points and cross references. Both the method based on the assumption of a constant rate and Li and Tanimura's (1987) unequal-rate method (hereafter the Li–Tanimura method) were used to estimate divergence times, and the estimates were compared with known fossil dates. Although several other methods without the rate-constancy assumption, such as the nonparametric rate smoothing method (NPRS), have been proposed (Sanderson 1997 and references cited therein), we chose the Li–Tanimura method for its simplicity. The method uses the lineages in which the molecular clock holds better than in others to estimate the divergence time at a particular node. We also discuss possible reasons for discrepancies among estimates of divergence dates obtained in this study and in previous studies.
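The constant-rate extrapolation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code and not the Li–Tanimura method: it assumes one substitution rate shared by both lineages descending from each node, uses r = K/(2T) and T = K/(2r) as given with Table 4, takes distances in substitutions per 100 sites, and ignores standard errors. The function names are hypothetical; the input values are the C1 row of Table 4.

```python
"""Constant-rate (molecular clock) dating sketch: r = K/(2T), then T = K/(2r).

Assumption: a single rate applies to both descendant lineages of each node.
Distances K are per 100 sites (as in Table 4); times are in Myr.
"""


def rate_per_site_per_year(k_per_100_sites: float, t_myr: float) -> float:
    """Calibrate a substitution rate from a node with a known fossil date."""
    k = k_per_100_sites / 100.0      # substitutions per site
    t_years = t_myr * 1e6
    return k / (2.0 * t_years)       # factor 2: two lineages descend from the node


def divergence_time_myr(k_per_100_sites: float, rate: float) -> float:
    """Extrapolate the age of another node, assuming the same rate applies."""
    k = k_per_100_sites / 100.0
    return k / (2.0 * rate) / 1e6


def date_node(k_cal: float, fossil_range_myr: tuple, k_target: float):
    """Return ((slow, fast) rates, (young, old) target ages).

    The older fossil date gives the slower rate, and a slower rate spreads
    the same target distance over more time, giving the older age.
    """
    young, old = fossil_range_myr
    r_fast = rate_per_site_per_year(k_cal, young)
    r_slow = rate_per_site_per_year(k_cal, old)
    t_young = divergence_time_myr(k_target, r_fast)
    t_old = divergence_time_myr(k_target, r_slow)
    return (r_slow, r_fast), (t_young, t_old)


if __name__ == "__main__":
    # C1 row of Table 4: fern-seed plant split (400-420 Myr), Ka = 18.03;
    # monocot-dicot Ka = 9.30 (means only, standard errors ignored).
    rates, times = date_node(18.03, (400, 420), 9.30)
    print(f"rate: {rates[0] * 1e9:.3f}-{rates[1] * 1e9:.3f} x 1e-9 per site per year")
    print(f"monocot-dicot: {times[0]:.0f}-{times[1]:.0f} Myr")
```

Under these assumptions the sketch returns about 0.215–0.225 × 10⁻⁹ substitutions per site per year and about 206–217 Myr, in line with the C1 row of Table 4. Calling `date_node(1.97, (50, 60), 9.35)` with the C3 values gives roughly 237–285 Myr instead, which shows how much the estimate depends on which calibration node, and hence which lineage's rate, is used.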

Table 2. Scientific names, classification, and NCBI accessions of species in the dataset

Classification^a   Scientific name   NCBI accession No. (version date)^b/Reference

Bryophyte
Marchantiaceae   Marchantia polymorpha   NC_001319 (Aug 2002)/Ohyama et al. (1986)
Pteridophyte
Psilotaceae   Psilotum nudum   AP004638 (Nov 2002)/Wakasugi et al. (2000)
Gymnosperm
Pinaceae   Pinus thunbergii   NC_001631 (Sep 2002)/Wakasugi et al. (1994)
Angiosperms
Monocots
Poaceae
Andropogoneae   Zea mays   NC_001666 (Sep 2002)/Maier et al. (1995)
Oryzeae   Oryza sativa   NC_001320 (Sep 2002)/Hiratsuka et al. (1989)
Triticeae   Triticum aestivum   NC_002762 (Sep 2002)/Ikeo and Ogihara (2000)
Dicots
Eudicots
Caryophyllidae
Chenopodiaceae   Spinacia oleracea   NC_002202 (Aug 2002)/Schmitz-Linneweber et al. (2001)
Asteridae
Solanaceae   Nicotiana tabacum   NC_001879 (Sep 2002)/Shinozaki et al. (1986)
Rosidae
Brassicaceae   Arabidopsis thaliana   NC_000932 (Sep 2002)/Sato et al. (1999)
Onagraceae   Oenothera elata subsp. hookeri   NC_002693 (Sep 2002)/Hupfer et al. (2000)
Fabaceae
Papilionoideae
Loteae   Lotus japonicus   NC_002694 (Sep 2002)/Kato et al. (2000)
Trifolieae   Medicago truncatula   AC093544^c (Nov 2001)/Lin et al. (2001)

^a Ranks of species follow the NCBI's Taxonomy Guide.
^b Data modified from http://megasun.bch.umontreal.ca/ogmp/projects/other/cp_list.html (vers. 20 Dec 2002).
^c No gene annotation in this accession.

Data and Methods

Database Search for Cp Genome Sequences

Individual genes of the 12 published cp genome sequences (Table 2) were downloaded from GenBank, National Center for Biotechnology Information (NCBI). Nomenclatures of the cp protein-coding genes compiled by Hallick and Bairoch (1994), Stoebe et al. (1998), Martin et al. (2002), and the Swiss-Prot Protein Knowledgebase (2003) were used as guides. When synonyms were encountered, their sequence homologies with the typified names were carefully verified. Two homology criteria were considered: (1) the alignable length between two proteins is larger than 80% of the longer sequence, and (2) the sequence identity in the aligned region is at least 40% if L > 150, or otherwise at least $0.06 + 4.8L^{-0.32(1 + \exp(-L/1000))}$, where L is the length of the aligned region (Rost 1999; Gu et al. 2002). Note that we raised the identity threshold to 40% instead of 30% because the taxa we sampled are comparatively recent and cp genes are highly conserved (Wolfe et al. 1989).

Since the sequence of Medicago was not annotated, its protein-coding genes were annotated using the Nucleotide query–Protein database (BLASTX) algorithm at NCBI with each known gene from Lotus as query. If a particular gene was missing from Lotus, that gene from the remaining 10 taxa was used instead. Open reading frames annotated by us were also verified using the BLAST 2 Sequences algorithm and the Nucleotide query–Translated db algorithm in NCBI against the corresponding gene and the whole genome of Arabidopsis, respectively. A query sequence with more than 40% identity to the specific known gene was then considered a putative homologous gene. A remnant of the accD gene in rice was reported previously (Hiratsuka et al. 1989) but could not be detected by Katayama and Ogihara (1996) or Ogihara et al. (2002) using Southern hybridization. We were not able to locate it in the rice cp genome either. We used the reviews of cp genes by Millen et al. (2001) and Martin et al. (2002) as guides to confirm our BLAST searches, especially for genes that were lost or have unknown functions.

After careful comparison and annotation, a total of 98 protein-coding genes were found in the cp genomes of the 12 sampled species (Table 2). The lengths as well as the presence or absence of those genes in each taxon are presented in Appendix 1. An open reading frame homologous to a known gene was given the same name to facilitate comparison and alignment. For some unannotated genes detected by BLASTX search, their positions in the corresponding genomes were indicated. We excluded pseudogenes and genes duplicated in the inverted repeat regions. The cp-encoded RNA genes were previously shown to be problematic in early cp phylogeny (Martin et al. 1998; Lockhart et al. 1999), and they were problematic in the present study as well (data not shown). Therefore, RNA genes were excluded from analysis.

Alignment of All Cp Genes and Phylogenetic Analyses

Amino acid sequences of each gene from the 12 taxa were first aligned one by one using GeneDoc (Nicholas and Nicholas 1997) with minor adjustments. The alignment was then used as a guide for aligning the corresponding nucleotide sequences. Unknown sites, start and stop codons, and regions difficult to align were removed from each gene alignment. All aligned individual gene sequences were then assembled using the Text Editor in MEGA 2.1 (Kumar et al. 2001). Gaps were completely deleted from the assembled alignment concatenated from the 61 cp protein-coding genes common to the 12 sampled taxa (see also Results). The working data file (in MEGA format) is shown in Appendix 2, available in the Supplementary Material Section at the JME Web site.
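The two homology criteria in the Database Search subsection can be expressed as a small Python check. This is a hypothetical helper for illustration, not a tool used in the study. It assumes identity is given as a fraction between 0 and 1, treats L as the aligned length in residues, and uses the length-dependent curve 0.06 + 4.8L^(−0.32(1 + exp(−L/1000))) for L ≤ 150 with the raised 40% threshold above that.

```python
import math


def min_identity(aligned_length: int) -> float:
    """Minimum fractional identity required to call two proteins homologous.

    Criterion (2): 0.40 for alignments longer than 150 residues, otherwise
    the length-dependent curve 0.06 + 4.8 * L ** (-0.32 * (1 + exp(-L/1000))).
    """
    L = aligned_length
    if L > 150:
        return 0.40
    return 0.06 + 4.8 * L ** (-0.32 * (1.0 + math.exp(-L / 1000.0)))


def is_putative_homolog(aligned_length: int, longer_protein_length: int,
                        identity: float) -> bool:
    """Apply both criteria (hypothetical helper, for illustration only).

    (1) the alignable length exceeds 80% of the longer protein, and
    (2) identity reaches the length-dependent threshold above.
    """
    covers_enough = aligned_length > 0.8 * longer_protein_length
    return covers_enough and identity >= min_identity(aligned_length)


if __name__ == "__main__":
    # Print the threshold for a few aligned lengths to show its shape.
    for L in (50, 100, 150, 151, 300):
        print(L, round(min_identity(L), 3))
```

Written this way, the curve gives roughly 0.30 at L = 150, so the threshold steps up to 0.40 for alignments just longer than 150 residues; the raised 40% cutoff applies only to the long-alignment branch, while shorter alignments still follow the curve.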

Nucleotide sequence divergence between a pair of taxa (or groups) was calculated in terms of the number of substitutions per synonymous site (Ks) or per nonsynonymous site (Ka), using the Pamilo–Bianchi–Li method implemented in MEGA 2.1. Divergence between two groups is presented as average distance ± standard error, obtained from the option Compute Between Groups Means in MEGA 2.1. The average distance between two groups is the arithmetic mean of all pairwise distances between taxa in the intergroup comparisons. To date the divergence between the monocot and the dicot lineages, Saitou and Nei's (1987) neighbor-joining (NJ) method and the Ka values (not Ks, because substitutions at the third codon positions are saturated across the sampled land plant lineages; see Results) were used to reconstruct the phylogenetic trees, rooted at the top of the Pinus lineage. Because the six sampled dicots (Table 2) represent the two large clades of core eudicots (the rosids and the asterids) and one of the remaining four small core eudicot clades, they can be used to infer the age or diversification date of core eudicots. The NJ trees reconstructed from Ka values and Ks values were rooted at the monocot lineage (see Results). Relative support for each node was evaluated using the bootstrap test and the interior branch test implemented in MEGA 2.1 with 2000 replicates. The latter test is based on the interior branch length and its standard error. If the resulting confidence probability is higher than 95% for a given branch, the inferred length of that branch is considered significantly greater than 0 (Kumar et al. 2001). To compare the evolutionary rates of the sampled fern, pine, monocot, and dicot lineages, Tajima's (1993) relative rate test implemented in MEGA 2.1 was applied. Because the method does not distinguish between Ka and Ks, the first and second codon positions of the combined 61 cp protein-coding genes were used instead.

Calibration Points

To date the divergence between the monocots and the dicots, three split events (see Figs. 1 and 4) with reliable fossil dates were used as reference nodes: (C1) the Psilotum (fern)–seed plant split (400–420 Myr old [Pryer et al. 2001]); (C2) the Pinus (conifer)–angiosperm split (280–310 Myr old); and (C3) the maize–wheat split (50–60 Myr old). Because uncertainties about the age of the reference node were a probable reason behind the discrepancies among previous estimates of angiosperm origin (Bremer 2000; Sanderson and Doyle 2001), we carefully examined the dates of our calibration nodes.

Fossil Dates

Psilotum has repeatedly been placed among the ferns by molecular data (e.g., Nickrent et al. 2000; Pryer et al. 2001), but the architecture of its sperm cell suggests that Psilotum is an early divergent fern (Renzaglia et al. 2001) with relatively remote affinities to Ophioglossaceae (a basal fern family) and Equisetaceae (sphenopsids). Kenrick and Crane (1997) considered that the basal dichotomy of Euphyllophytina occurred in the early–mid Devonian (ca. 400–420 Myr ago) and resulted in two clades: one containing the extinct Psilophyton and the other containing ferns, horsetails, and seed plants. We took this splitting date as the lower bound for the divergence between Psilotum and seed plants.

Pinus is a genus of Pinaceae, which contains over 230 species and is the largest and most basal family of conifers (Hart 1987; Price et al. 1993; Chaw et al. 1997). Delevoryas and Hope (1973) and Miller (1977, 1988) proposed that the Triassic (206–248 Myr ago) may represent the period when modern conifers were evolving. Cladistic and stratigraphic analyses of living seed plants (Doyle and Donoghue 1987; Crane 1988; Doyle 1998) suggested that diversification of modern seed plants occurred from the lower Pennsylvanian to the upper Triassic (215–310 Myr ago). The earliest fossil evidence of trees bearing the typical conifer's bisaccate pollen that germinates distally dates from the late Carboniferous to early Permian (ca. 250–290 Myr ago), and conifer relatives are known from ca. 310 Myr ago (Rai et al. 2003). Gymnosperms and angiosperms are the two major taxa of seed plants, distinct since the end of the Carboniferous, 300 Myr ago (Bow et al. 2003). From the above considerations, we took 280–310 Myr as an upper bound for the split between the conifer and the angiosperm lineages.

Fossil leaves of rice (belonging to the grass family Poaceae) have been described from the upper Eocene, about 40 Myr ago (Stebbins 1981), and the earliest unequivocal evidence of grass fossils (including spikelets and inflorescence with pollen) was found in Paleocene–Eocene deposits, about 50–60 Myr ago (Crepet and Feldman 1991). The initial radiation of the grass family was suggested to have occurred 65 Myr ago (Stebbins 1987; Thomasson 1987). Bremer (2000) regarded the 50–70 Myr ago estimate of the maize–wheat divergence used by Wolfe et al. (1989) as rather uncertain. Moreover, phylogenetic analyses of cp rpl16 intron sequences (Zhang 2000), eight character sets (GPWG 2001), cp genome structure (Ogihara et al. 2002), and cp genomic comparison (Matsuoka et al. 2002) indicated that, in the grass family, Oryzoideae (rice) and Pooideae (wheat) diverged after the subfamily Panicoideae (maize), which was preceded by four other subfamilies (Zhang 2000). Therefore, we took 50–60 Myr as a reasonable estimate of the maize–wheat split.

Results

Cp Genome Data

The concatenated lengths of all known functional cp protein-coding genes (Appendix 1) in the 12 sampled species (Table 2) range from 58,095 bp in Triticum to 71,509 bp in Marchantia; the average is 63,661 ± 4,764 bp. Sixty-one cp protein-coding genes, which encode two envelope membrane proteins (cemA, ycf9), 1 maturase (matK), 1 protease (clpP), 34 proteins of the photosynthetic light reactions (atpA, atpB, atpE, atpF, atpH, atpI, petA, petB, petD, petG, petL, petN, psaA, psaB, psaC, psaI, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT), 18 ribosomal proteins (rpl2, rpl14, rpl16, rpl20, rpl32, rpl33, rpl36, rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps15, rps18, rps19), 4 RNA polymerases (rpoA, rpoB, rpoC1, rpoC2), and 1 cytochrome c biogenesis protein (ccsA), are common to all 12 taxa. After elimination of unknown sites, regions difficult to align, start and stop codons, and all gaps, 39,507 sites were used for comparison and tree reconstruction.

As shown in Table 3 (the first row), the 12 cp genomic sequences are AT-rich. This bias is particularly strong at the third codon positions, primarily because of the high T content. These data are consistent with the high AT content found earlier in the plastid genome (Whitfeld and Bottomley 1983). Across the 11 tracheophytes, nucleotide base compositions are homogeneous at the first and second codon positions (χ² test, p = 0.793 and 0.981) but not at the third codon positions (p < 0.001). The G content is particularly high at the first codon positions in all taxa, and Marchantia strongly prefers synonymous codons ending with A or T.

Table 3. Nucleotide base composition (%) of the concatenated 61 cp protein-coding genes in Marchantia and 11 sampled tracheophytes

Codon position^a   A   C   G   T   p^b

All   33.2/29.2 ± 0.4 (1.4%)   14.4/18.4 ± 0.5 (2.7%)   17.8/21.5 ± 0.4 (1.9%)   34.6/31.0 ± 0.5 (1.6%)   0.000
1st   30.3/28.2 ± 0.4 (1.4%)   16.9/19.6 ± 0.3 (1.5%)   29.1/30.2 ± 0.3 (1.0%)   23.7/22.0 ± 0.3 (1.4%)   0.793
2nd   29.1/27.2 ± 0.3 (1.1%)   20.8/21.7 ± 0.2 (0.9%)   17.4/19.1 ± 0.3 (1.6%)   32.6/32.0 ± 0.3 (0.9%)   0.981
3rd   40.1/32.3 ± 0.8 (2.5%)   5.3/13.8 ± 1.0 (7.2%)   7.0/15.0 ± 0.8 (5.3%)   47.5/39.0 ± 1.1 (2.8%)   0.000

^a The start and stop codons were not included in the analysis. Data on Marchantia are before the slash; the average of the 11 sampled tracheophytes is after the slash and is presented as mean ± SE (coefficient of variation).
^b Probability (p) was based on χ² tests for homogeneity across the 11 sampled tracheophytes using PAUP 4.0b1 (Swofford 1998).

Fig. 2. Uncorrected pairwise sequence divergence (P distance) plotted against corrected distances (Kimura two-parameter) for transitions (Ts) and transversions (Tv) at the first and second codon positions (A) and the third codon position (B). Each plot presents 66 data points.

The mean Ka/Ks ratio for all species pairs is 0.19. The difference in mean Ka/Ks ratio between the monocot (0.156) and the dicot (0.158) lineages is small. These data suggest stringent selective constraints on amino acid substitutions and correlate well with the observed higher GC contents at the first and second positions (Table 3).

The Inferred Phylogenetic Trees

Figure 1A was simplified from the topology of the maximum parsimony (MP) trees reconstructed with multiple genes by Qiu et al. (1999) and by P. S. Soltis et al. (1999). Figure 1B is an NJ tree reconstructed with Ka values using Marchantia as the outgroup. The topology of this tree strongly indicates that, to the exclusion of the fern (Psilotum) lineage, the seed plants form a monophyletic clade, within which the conifer (Pinus) lineage and the angiosperms comprise two separate subgroups. The sampled angiosperms are subdivided into two well-supported lineages, the monocots and the eudicots. Both bootstrap and interior branch tests give 100% support for the above-mentioned major tracheophyte lineages, and the latter test yielded a higher percentage support for the rice + wheat clade. The phylogenetic relationships of the monocot lineage and the six core eudicots generally agree well with those in recent multigene trees (Fig. 1A [Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000]), except that in our NJ tree the Caryophyllales (represented by Spinacia) and the asterids (represented by Nicotiana) are well resolved as sister clades. This relationship was previously revealed in the trees made by Wolfe et al. (1989) and Goremykin et al. (2003).

The NJ tree reconstructed from the Ks values appears to be unreliable because it placed Arabidopsis as basal to the remaining dicots (data not shown), contrary to the most recent multigene phylogenies of angiosperms (Mathews and Donoghue 1999; Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000; Chase et al. 2000). These data caused us to question whether the third codon position, where most synonymous substitutions occur, is saturated with substitutions.

To assess levels of sequence saturation in the concatenated cp genes, pairwise uncorrected numbers of transitions and transversions (uncorrected P) were plotted against the corrected (Kimura's two-parameter) sequence distance (Fig. 2). Sixty-six paired points [12 × (12 − 1)/2] are presented in Fig. 2. The curves of both uncorrected transitions and transversions

against sequence divergence at the first two codon positions were nearly linear (Fig. 2A). In contrast, the curves at the third codon position revealed a significant trend toward asymptotic saturation (Fig. 2B), indicating that substitutions at the third codon position are saturated and not suitable for inferring phylogenetic relationships among the sampled taxa or for dating purposes. For this reason, we used only the NJ tree based on the Ka values.

Because Fig. 1B did not resolve the relationships among the sampled eudicots, we reconstructed a phylogenetic tree of eudicots using the three monocots as the outgroup. The NJ tree based on the Ks values yielded a reasonable topology (i.e., in agreement with the phylogenetic relationships of the orders of flowering plants compiled by APG [1998]) for the six sampled eudicots (Fig. 3A), whereas the NJ tree based on Ka values did not (data not shown). Based on this limited number of eudicots, Fig. 3A suggests that the six core eudicots first split (at node B) into two well-supported monophyletic clades, the rosids (represented by Oenothera, Arabidopsis, Lotus, and Medicago) and the asterids + Caryophyllales (represented by Nicotiana and Spinacia, respectively).

Fig. 3. Relative branch lengths (based on Ka values) of monocots and dicots, using Marchantia (A), Psilotum (B), and Pinus (C) as outgroup, respectively. Gray branches are with Arabidopsis, Spinacia, and Nicotiana excluded (because of their slower rates). Branch lengths are Ka values per 100 sites. D Rooted NJ tree of the core eudicot lineages based on Ks values, using the three grasses as outgroup. Italicized numbers denote bootstrap percentages (bootstrap test before the slash, interior branch test after the slash). Branch lengths are Ks values per 100 sites.

Both NJ trees reconstructed from Ka (Fig. 1B) and Ks (Fig. 3A) values suggested a close relationship between the Ehrhartoideae (rice) and the Pooideae (wheat), with maize as an outgroup, but the bootstrap values for the rice–wheat clade are low to moderate. Recently, using the NJ method with the variable sites of 98 genes (including not only all protein-coding genes but also RNA genes) common to the cp genomes of these three cereals and rooting at the Nicotiana lineage, Matsuoka et al. (2002) also placed maize as sister to the rice–wheat clade.

Phylogenetic Distribution of Cp Genes During Tracheophyte Evolution

We examined a total of 98 protein-coding genes (Appendix 1) that are present among the 12 studied taxa. Figure 1B also presents the number of protein-coding genes in each sampled species and the specific gene losses, transfers, and retentions in the 11 lineages of tracheophytes. Although Martin et al. (2002) have done a similar evolutionary analysis for deeper groups, Fig. 1B focuses on the tracheophyte lineages, using Marchantia as the outgroup, with an additional 5 core eudicots and 1 fern.

Compared with the cp genome of the bryophyte (Marchantia), those of tracheophytes have lost three genes: two transporters (cysA and cysT) and one with unknown function (ycf66) (Fig. 1B). During tracheophyte evolution, the fern (Psilotum) and angiosperm lineages have undergone parallel losses of three chlorophyll biosynthesis genes, chlB, chlL, and chlN. The seed plant lineage has commonly transferred rpl21 (Martin et al. 2002 and references cited therein) to its nucleus. In the spinach and Arabidopsis lineages, however, that gene has unusually been replaced by a nuclear RPL21c gene of mitochondrial origin (Martin et al. 1990; Gallois et al. 2001).

Within seed plants, the conifer (Pinus) lineage has uniquely lost all 11 NADH dehydrogenase subunit genes (ndhA–K; 4 are completely missing and 7 are pseudogenes [Wakasugi et al. 1994]) but has gained a new gene of unknown function, ycf68 (Martin et al. 2002). The angiosperm lineage has further lost two genes: psaM, involved in the photosynthetic light reactions, and ycf12, of uncertain function. Within angiosperms, the three grasses have lost three genes: one metabolism-related gene, accD (Hiratsuka et al. 1989; Maier et al. 1995; Ogihara et al. 2002), and two genes of unknown function, ycf1 and ycf2. However, we found that the grass lineage has also recruited nine novel genes; one of them, ycf68, is shared with the pine lineage, and the remaining eight, ycf69–76, are unique. The functions of these genes are not yet known, and they have no detectable homology to prokaryotic genes (Martin et al. 2002). Except for spinach, all sampled eudicots have lost the gene for translation initiation factor 1 (infA). According to an extensive survey of more than 300 diverse angiosperms by Millen et al. (2001), the cp infA gene has repeatedly become defunct, in about 24 separate angiosperm lineages, including almost all rosid species.

Table 4. Estimates of the monocot–dicot divergence and the age of core eudicots based on the constant-rate method

Outgroup   Calibration event (fossil dates; Myr)   K^a   Rate^b (×10⁻⁹)   K^a   Time (Myr)

Monocot–dicot divergence
Marchantia   C1: Fern–seed plant divergence (400–420)   Ka: 18.03 ± 0.22   0.215–0.225   Ka: 9.30 ± 0.24   206 ± 5–217 ± 6
Psilotum   C2: Conifer–angiosperm divergence (280–310)   Ka: 14.40 ± 0.29   0.232–0.257   Ka: 9.28 ± 0.34   180 ± 7–200 ± 7
Pinus   C3: Maize–wheat divergence (50–60)   Ka: 1.97 ± 0.23   0.164–0.197   Ka: 9.35 ± 0.29   237 ± 5–285 ± 9
Origin of core eudicots
Pinus   C3: Maize–wheat divergence (50–60)   Ka: 1.97 ± 0.23   0.164–0.197   Ka: 6.08 ± 0.26   154 ± 7–185 ± 8
Monocots   C3: Maize–wheat divergence (50–60)   Ks: 12.10 ± 0.11   1.008–1.210   Ks: 36.06 ± 0.51   149 ± 2–181 ± 3

^a K denotes the number of substitutions per 100 synonymous (Ks) or nonsynonymous (Ka) sites between a pair of taxa or groups.
^b Rate (r) is defined as the number of substitutions per site per year, r = K/(2T) (Li and Graur 1991).

Nucleotide Substitution Rates

Before applying molecular calibration, we assessed the assumption of rate constancy. Fig. 1B shows that the branches from the calibration point C1 leading to the Psilotum (fern) lineage and the Pinus lineage are not equal in length. The NJ trees in Figs. 3B, C, and

tutions per nonsynonymous site per year, respectively. Clearly, these three calibrated Ka rates are unequal, differing from one another by 8% [(0.232 − 0.215)/0.215] to nearly 42% [(0.232 − 0.164)/0.164], and the conifer–angiosperm Ka rate is the highest.

Dates of the Monocot–Dicot Divergence and the Origin of Core Eudicots

Molecular Clock or Rate Constancy Method. The date of the monocot–dicot divergence was estimated by applying the equation T = K/(2r). As indicated in Table 4, based on the entire dataset and the calibration points C1, C2, and C3, three time estimates for the monocot–dicot divergence were obtained: 206 ± 5–217 ± 6, 180 ± 7–200 ± 7, and 237 ± 5–285 ± 9 Myr. These estimates suggest that the monocot–dicot divergence took place 220 ± 40 Myr ago.

Using either the Ka or the Ks rates of the maize–
D, using Marchantia, Pinus, and Psilotum as the wheat divergence and the mean Ka values (see node B
outgroup, respectively, also indicate that the Ka rates in Fig. 1B and Fig. 4) of all six eudicots or the Ks
in the monocot and the dicot lineages are unequal. values between the rosid clade and the asterid +
The monocot lineage has evolved faster than the di- Caryophyllales clade (see node B in Fig. 3A), the
cots, by 39.6, 37.3, and 32.3%, respectively, for the divergence for core eudicots was estimated to be 154
three outgroups. In Fig. 1B the branches from node ± 7–185 ± 8 and 149 ± 2–181 ± 3 Myr ago (Table
A leading to Arabidopsis, Spinacia, and Nicotiana are 4), respectively. These two estimates are close to each
strikingly shorter than those leading to the other other, and their average is 170 Myr ago.
three dicots and the monocots. Tajima’s relative rate
test using rice, Marchantia, Psilotum, and Pinus as Li–Tanimura Method. Figure 4 was simplified
outgroups, respectively, confirmed this observation from the phylogenetic tree Fig. 1B with all branch
(all p’s < 0.001). However, exclusion of the above lengths indicated. The branch length of core eudicots
three slower dicot lineages (gray lines in Figs. 3B–D) was calculated as the mean length of the branch
led to even higher estimates of divergence dates (data leading from their emergence point (node B) to the six
not shown). We therefore used the entire dataset. core eudicots. We then used the Li–Tanimura meth-
By applying the equation, r = K/(2T), where K is od, which uses lineages in which the molecular clock
the distance and T is the divergence time between the holds better than the others, to estimate the diver-
two taxa compared, nonsynonymous rates were cal- gence time at points A and B. For example, we know
ibrated and are shown in Table 4. Based on the three that the branching date for Pinus (node C2) is 280–
divergence events, C1, C2, and C3, and the dataset 310 Myrs ago and want to estimate the branching
with all six dicots, the Ka rates are 0.215–0.225 · 10)9, dates between the monocot and the dicot lineages.
0.232–0.257 · 10)9, and 0.164–0.197 · 10)9 substi- The distances from node C2 to Pinus, monocots, and
dicots are 6.05, 9.02 (= 3.91 + 4.22 + 0.89), and 7.66 (= 3.91 + 0.90 + 2.85), respectively. Because the monocot lineage has a longer branch than the Pinus and dicot lineages, it was not used. Based on the branch length of the dicot lineage, the monocot–dicot divergence (node A in Fig. 1B and Fig. 4) was estimated to be 137–152 Myr ago, derived from (280 or 310) × (0.90 + 2.85)/7.66; the origin of core eudicots (node B in Fig. 1B and Fig. 4) was estimated as 104–115 Myr ago, calculated from (280 or 310) × 2.85/7.66. As shown in Fig. 4, the distance from node C1 to the dicots is 10.4 (= 2.74 + 3.91 + 3.75), and we assume that the molecular clock along the dicot lineage is approximately constant. Similarly, using the branching date of Psilotum, 400–420 Myr ago (node C1 in Fig. 1 and Fig. 4), the monocot–dicot divergence (node A in Fig. 1 and Fig. 4) was estimated to be (400 or 420) × (0.90 + 2.85)/10.4 = 144–151 Myr ago, and the origin of core eudicots was estimated to be (400 or 420) × 2.85/10.4 = 110–115 Myr ago. Table 5 shows that these dates are very close to those estimated from C2. Combining the estimates calibrated from both C1 and C2, we estimated that monocots and dicots diverged 140–150 Myr ago and that the core eudicot lineages originated 100–115 Myr ago.

Table 5. Ages of nodes (Myr) inferred from the phylogenetic tree in Fig. 3 using the Li–Tanimura method (1987)

Node   Calibrated with C1 (400–420 Myr)   Calibrated with C2 (280–310 Myr)
C1     —ᵃ                                 380–421
C2     295–309                            —
A      144–151                            137–152
B      110–115                            104–115

ᵃ Not applicable.

Fig. 4. A phylogenetic tree simplified from Fig. 1B. Nodes and lineages correspond to those in Fig. 1B. C1 was used to estimate the divergence dates of C2, the monocot–dicot divergence (node A), and the origin of core eudicots (node B) (see Table 5 and the text for details). The number on each branch is the Ka value per 100 sites.

Discussion

Rate Variation Among Tracheophyte Lineages

Our phylogenetic analyses (Figs. 1B, 3, and 4) indicate that nonsynonymous substitution rates of cp genomes are unequal among the six eudicot lineages, between the two angiosperm lineages (i.e., monocots and dicots), and among the tracheophyte lineages (i.e., all sampled seed plants and a fern, Psilotum). These observations were confirmed by Tajima’s relative rate test using Marchantia as the outgroup and the first two codon positions (data not shown).

Using 40 cp proteins, Goremykin et al. (1997) found that the average substitution rates (equivalent to the Ka rate) along the branches from the common node of seed plants (equivalent to node C1 in our Fig. 1B) to Nicotiana and to Pinus were quite similar. However, our cp genomic data (Fig. 1B) suggest that the former branch is significantly longer than the latter when Psilotum is used as the outgroup (Tajima’s relative rate test: p < 0.001).

As revealed in Fig. 1B and Figs. 3B–D, the grass lineage has evolved much faster than the dicots at nonsynonymous sites. In addition, Fig. 1B (Ka rate) and Fig. 3A (Ks rate) indicate that among the six annual dicots sampled, Nicotiana and Spinacia evolved more slowly than the rest. Extensive rate variation among annual plants has also been observed at nonsynonymous sites in other cp genes (Wolfe et al. 1987, 1989). More generally, our results echo the observation of Bousquet et al. (1992) on the rbcL gene of seed plant lineages: they found that annual forms evolved more rapidly on average than perennial forms (represented in our study by Psilotum and Pinus) and that the grass family has the fastest rate of evolution. Comparing the rbcL and ndhF loci in the grass family, Gaut et al. (1997) found that rate variation at nonsynonymous sites was not correlated between those two plastid loci. Most recently, examining the whole cp genomes (106 genes) of maize, rice, and wheat, Matsuoka et al. (2002) also found variation in Ka rates. The Ka rate variation seems to correlate well with the evolutionary divergence pattern depicted in the cp genomic NJ tree (Fig. 1B), which shows that the angiosperm lineage has evolved faster than the gymnosperm lineage and that the latter in turn has evolved more rapidly than the fern lineage. Because the number of completely sequenced cp genomes is quickly increasing, this trend may be retested soon.

Significant rate variation in the cp genomes of the tracheophyte lineages is also consistent with the finding of P. S. Soltis et al. (2002), who studied one nuclear and three plastid genes using MP analyses. In summary, the molecular clock hypothesis does not hold for the Ka rates among the cp genomes of tracheophyte plants.

Reference Fossil Dates and the Phylogenetic Tree Obtained

In the Data and Methods section we carefully cross-examined the three fossil dates against updated phylogenies and documented fossil records. Bremer (2000, pp 4709, 4710) suggested that in phylogenetic dating, rate calibration rather than unequal substitution rates is the major source of error and is behind the discrepancies in earlier estimates of monocot and flowering plant evolution. Indeed, the three rates calibrated under the molecular clock in Table 4 differ markedly, and the resulting dates for the monocot–dicot divergence do not agree with one another. To evaluate whether the fossil dates and the cp Ka rates correspond well with each other under the two dating methods, we also used the calibrated rates and the Ka distances (Table 4) of the fern–seed plant and conifer–angiosperm splits to date each other. The rate constancy method gave an estimate of 350–390 Myr ago for the former event and 320–335 Myr ago for the latter. These two estimates differ widely from the fossil records. In contrast, the divergence times of these two events estimated by the Li–Tanimura method (Table 5) are highly compatible with the paleobotanical data.

Sanderson and Doyle (2001) proposed that (1) biases in the data or in the statistical estimation method used, (2) variation in rate across sites, which “causes sequence divergences to be estimated incorrectly,” and (3) incorrect phylogenies are the underlying sources of error in molecular dating. P. S. Soltis et al. (2002) added that “inadequate sampling of taxa...can compound the problem.” The same concern could be raised about the results we present here. However, the effect of these problems is likely to have been considerably reduced by our sampling of 12 land plants representing successive evolutionary lineages (Table 2; including all three living subclasses of eudicots), the use of 61 genes (>39,000 bp long) with different functions from the complete cp genomes, and the highly reliable NJ tree (Fig. 1B), which is consistent with the NJ tree of Goremykin et al. (1997) inferred from 14,295 concatenated amino acids of cp genomes.

Comparison of Estimates from the Molecular Clock and Li–Tanimura Methods

Tables 4 and 5 show that the dates of the monocot–dicot split and the origin of core eudicots estimated by the rate constancy and Li–Tanimura methods differ greatly, with estimates from the former method predating the latter by 50 Myr. Estimates calibrated from nodes C1 and C2 vary more under the molecular clock method than under the Li–Tanimura method.

In the rate constancy method we used the arithmetic mean of all pairwise Ka distances between monocots and dicots to estimate the divergence date (Table 4). As a result, the obtained dates for the monocot–dicot split (220 Myr) and for the origin of the core eudicots (170 Myr) appear to be severely overestimated, because the three high-rate grasses were included in the distance calculation. This was also the case in most previous estimates (Wolfe et al. 1989; Martin et al. 1989, 1993; Brandl et al. 1992; Laroche et al. 1995; Yang et al. 1999), which not only assumed a molecular clock but also included one (maize) or several fast-evolving grass species (or annual Liliales [such as Ramshaw 1972]).

Considering these results together with the preceding age estimates for the monocot–dicot split and the origin of core eudicots, we conclude that the Li–Tanimura method can substantially reduce the effect of rate variation among lineages and provide estimates more in line with known fossil data.

Comparison of Our Estimates with Previous Estimates

Because our estimates based on the rate constancy method seem unreliable, we compare only the estimates obtained from the Li–Tanimura method with those from other methods. Goremykin et al. (1997) used a framework very similar to the Li–Tanimura method (1987) and claimed that their approach is “independent of the rate fluctuation on the grass (high rate) and Marchantia (low rate) branches.” They estimated the divergence time between the Zea–Oryza lineage and the Nicotiana lineage to be 160 Myr ago, which predates ours by about 10–20 Myr (Table 5). Based on our cp genome data (Figs. 1B and 3A), the Nicotiana lineage has the slowest Ka and Ks rates among the sampled dicots, yet it was used as half of the denominator by Goremykin et al. (1997) in estimating the monocot–dicot divergence time. Because our estimates of the monocot–dicot divergence and the origin of core eudicots were based on the mean branch length of six dicots, they should be more reliable than estimates based on a single species.

In order to reduce the effects of unequal rates, Bremer (2000) used the mean branch lengths from a
group of terminal taxa to their common node (which has a known fossil age) to calculate the change rate (distance/age by Bremer’s definition). Using the rbcL gene, the MP tree of monocots, and eight reference nodes with known fossil dates, Bremer (2000) estimated the split between Acorus, presumably the basalmost extant monocot (APG 1998; Chase et al. 2000), and the remaining monocots at 134 Myr ago. According to the integrated and widely used phylogenetic tree for the orders of flowering plants (APG 1998), the separation of the monocot lineage from the other magnoliids predated the branching-off of eudicots. Therefore, Bremer’s estimate is compatible with ours for the monocot–dicot split (140–150 Myr ago) and the core eudicot divergence (100–115 Myr ago).

Using a single gene (rbcL) and the NPRS method, or two genes (plus 18S rRNA) and maximum likelihood analyses, together with the calibration date of Marchantia (450 Myr ago), Sanderson (1997) and Sanderson and Doyle (2001) dated the origin of crown angiosperms to 160 and 140–190 Myr ago, respectively. Combining a three-gene dataset (rbcL, atpB, and 18S rRNA), the NPRS method, and the split between Fagales and Cucurbitales (84 Myr ago), Wikström et al. (2001) dated the origin of the extant angiosperms to 158–179 Myr ago and that of eudicots to 131–147 Myr ago. These estimates are in good agreement with ours, as the dicots we sampled are all eudicots. Recent multigene analyses of angiosperm evolution have revealed that the monocot–dicot divergence was preceded by five living basal dicot lineages, the Amborellaceae, the Nymphaeales, and a group including Illiciaceae, Trimeniaceae, and Austrobaileyaceae (i.e., the so-called ANITA group) (Qiu et al. 1999; P. S. Soltis et al. 1999; D. E. Soltis et al. 2000; see Fig. 1A), and by an extinct basal angiosperm lineage, the Archaefructaceae (Sun et al. 2002). Therefore, previous estimates of the angiosperms’ origin based on the monocot–dicot split have underestimated the age of angiosperms themselves. The above authors’ estimates are consistent with ours if we postulate that approximately 20 (= 160 − 140) to 40 (= 190 − 150) Myr separate the angiosperm origin from the split between the ancestors of the monocot and eudicot lineages.

Our estimated date for the origin of core eudicot lineages is 100–115 Myr ago (Table 5). This is earlier than many documented fossil-based estimates for core eudicots, such as a possible Rhamnaceae/Rosaceae (both rosids, represented here by our samples Lotus, Medicago, Arabidopsis, and Oenothera) from the early Cenomanian (94–97 Myr [Basinger and Dilcher 1984]), 89 Myr for the Capparales (represented by our sample Arabidopsis), 84 Myr for the Myrtales (represented by our sample Oenothera) (Magallón et al. 1999), and 83 Myr for the Caryophyllales clade (represented by our sample Spinacia) (Magallón et al. 1999). In addition, our estimate for the age of core eudicots is, as expected, younger than the fossil age of a basal eudicot, Tetracentraceae, from the Barremian (110–118 Myr ago [Magallón et al. 1999]). Collectively, our cp genomic data indicate that the core eudicots are also older than the known fossil record indicates.

Conclusions

We observed significant Ka rate variation in cp genome data among major tracheophyte lineages. Therefore, the rate constancy method is not appropriate for dating the divergence between monocots and dicots or the age of eudicots, especially if fast-evolving monocots are included. Using cp genome data, we demonstrated that the Li–Tanimura method gives estimates that better reflect the known evolutionary sequence of tracheophyte lineages and correspond well with the fossil records of the calibration points we used.

Combining our estimates calibrated by two known fossil nodes with the Li–Tanimura method, we propose that the monocot lineage branched off from the dicots 140–150 Myr ago, in the late Jurassic to early Cretaceous, and that the core eudicots radiated 100–115 Myr ago, between the Aptian and the Albian of the Cretaceous. These estimates are in accordance with those of Sanderson (1997) and P. S. Soltis et al. (2002), who analyzed one to three genes and used MP and ML branch lengths with the NPRS method.

In summary, methods that accommodate unequal rates give smaller estimates than the rate constancy method and appear to agree well not only with one another, but also with recently documented fossil evidence.

Our results confirm previous conclusions that molecular data indicate a pre-Cretaceous origin for angiosperms, but our estimates for the monocot–dicot divergence postdate previous estimates based on the molecular clock hypothesis by at least 50 Myr (200 − 150 = 50 Myr).

Acknowledgments. We thank Robert Friedman for critical comments on an early version of the manuscript, and Yoshihiro Matsuoka and Shu-Shin Wu for help with the gene group assignment for the three grasses and other taxa. We also thank the two reviewers for their critical and valuable comments and suggestions. This work was supported in part by National Science Council Grant NSC912311B001103 and Academia Sinica Grant IB91 to S.M.C., and by NIH Grant GM30998 to W.H.L.

Appendix
Table A1. Names and sizes of protein-coding genes in the chloroplast genomes of the 12 species (Table 2) analyzed in this study

Taxon

Geneᵃ Marchantia Psilotum Pinus Triticum Oryza Zea Lotus Medicago Arabidopsis Spinacia Nicotiana Oenothera
accD 951ᵇ 933 966 —ᶜ — — 1,506 1,512 1,467 1,569 1,539 1,317
(ORF316)
atpA 1,524 1,527 1,485 1,515 1,524 1,524 1,533 1,536 1,524 1,296 1,524 1,518
atpB 1,479 1,479 1,479 1,497 1,497 1,497 1,497 1,497 1,497 1,497 1,497 1,497
atpE 408 402 414 414 414 414 402 402 399 405 402 402
atpF 555 555 555 552 543 552 555 552 555 555 555 555
atpH 246 246 246 246 246 246 246 246 246 246 246 246
atpI 747 747 747 744 744 744 744 738 750 744 744 744
ccsA 963 933 963 969 966 966 972 972 987 972 942 960
(ORF320) (ORF320) (ycf5) (ORF321) (ORF321) (ycf5) (ycf5) (ycf5) (ycf5) (ycf5) (ycf5)
cemA 1,305 1,350 786 693 693 693 690 690 690 702 690 690
(ORF434) (ycf10) (ORF261) (ORF230) (ycf10) (ORF229) (ycf10)
chlB 1,542 — 1,533 — — — — — — — — —
(ORF513)
chlL 870 — 876 — — — — — — — — —
(frxC)
chlN 1,398 — 1,404 — — — — — — — — —
(108667–110064)
chlP 612 597 591 651 651 651 591 588 591 591 591 750
(ORF203) (ORF216)
cysA 1,113 — — — — — — — — — — —
(mbpX)
cysT 867 — — — — — — — — — — —
(ORF288)
infA 237 243 237 342 324 324 — — — 177 Wᵈ —
matK 1,113 1,512 1,548 1,629 1,629 1,635 1,527 1,521 1,581 1,518 1,530 1,539
(ORF370i) (ORF542)
ndhA 1,107 1,116 — 1,089 1,089 1,089 1,092 1,077 1,083 1,098 1,092 1,092
(ndh1)
ndhB 1,504 1,491 Yᵉ 1,533 1,533 1,533 1,164 1,164 1,164 1,533 1,533 1,533
(ndh2)
ndhC 363 363 Y 363 363 363 363 363 363 363 363 363
(ndh3)
ndhD 1,500 1,545 Y 1,503 1,503 1,503 1,494 1,494 1,503 1,383 1,503 1,503
(ndh4)
ndhE 303 321 Y 306 306 306 306 306 306 306 306 306
(ndh4L)
ndhF 2,079 2,223 — 2,220 2,205 2,217 2,244 2,235 2,241 2,229 2,223 2,211
(ndh5)
ndhG 576 567 — 531 531 531 531 531 531 531 531 531
(ORF191)

ndhH 1,179 1,182 Y 1,182 1,182 1,182 1,182 1,182 1,182 1,182 1,182 1,182
(ORF392) (ORF393)
ndhI 552 498 Y 543 537 543 486 486 519 513 504 498
(frxB) (ORF178)
ndhJ 510 477 — 480 480 480 477 477 477 477 477 477
(ORF169) (ORF480)
ndhK 732 624 Y 738 741 747 693 684 678 846 744 744
(psbG) (psbG) (psbG) (psbG)
petA 963 966 960 963 963 963 963 963 963 963 963 957
petB 648 648 648 648 648 705 648 648 648 648 636 648
petD 483 483 543 483 483 483 483 483 483 483 483 483
petE 114 114 114 114 114 114 114 114 114 114 114 114
(ORF37) (petG) (petG) (petG) (petG) (petG) (petG) (petG) (petG) (petG)
petL 96 96 189 96 96 96 96 96 96 96 96 96
(ORF31) (ORF62b) (ORF31) (ORF31) (ycf7) (ORF31)
petN 90 90 90 90 90 90 90 90 90 90 90 90
(5168–5257) (ORF29) (ycf6) (ORF29) (ORF29) (ycf6) (ycf6) (ycf6) (ycf6) (ycf6) (ycf6)
psaA 2,253 2,253 2,262 2,253 2,253 2,253 2,253 2,277 2,253 2,253 2,253 2,256
psaB 2,205 2,205 2,205 2,205 2,205 2,208 2,205 2,205 2,205 2,205 2,205 2,205
psaC 246 246 246 246 246 246 246 246 246 246 246 246
(frxA)
psaI 111 111 159 111 111 111 105 105 114 102 111 105
(ORF36b) (ORF36)
psaJ 129 129 135 129 135 129 135 135 135 135 135 132
(ORF42b) (ORF44)
psaM 99 99 93 — — — — — — — — —
(ORF32)
psbA 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062
psbB 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527
psbC 1,422 1,386ᵉ 1,422 1,422 1,422 1,422 1,422 1,422 1,422 1,422 1,386 1,422
(1422)
psbD 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062
psbE 252 252 252 252 252 252 252 252 252 252 252 252
psbF 120 120 120 120 120 120 120 120 120 120 120 120
psbH 225 225 228 222 222 222 222 219 222 222 222 222
(ORF74)
psbI 111 111 158 111 111 111 111 111 111 111 111 111
(ORF36a) (8398–8508)
psbJ 123 123 123 123 123 123 123 123 123 123 123 123
(ORF40) (ORF40)
psbK 168 177 180 186 186 186 186 186 180 180 186 180
(ORF55) (ORF98)
psbL 117 117 117 117 117 117 117 117 117 117 117 117
(ORF38)
psbM 105 105 114 105 105 105 105 105 105 105 105 105
(ORF34)
psbN 132 132 132 132 132 132 132 132 132 132 132 132
(ORF43)
psbT 108 99 108 117 108 102 102 108 102 102 105 108
(ORF35) (ORF35) (ORF33)
rbcL 1,428 1,428 1,428 1,434 1,434 1,431 1,428 1,428 1,440 1,428 1,434 1,428
rpl2 612 834 831 822 822 822 825 792 825 819 825 825
(ORF203)
rpl14 369 369 369 372 372 372 369 369 369 366 372 369
rpl16 432 423 405 411 411 411 408 408 408 408 405 408
rpl20 351 345 360 360 360 360 366 360 354 387 387 393
rpl21 351 390 — — — — — — — — — —
rpl22 360 351 429 447 450 447 — — 483 600 468 414
rpl23 276 273 276 282 282 282 282 282 282 W 282 282
(ORF42)
rpl32 210 183 213 192 192 180 153 180 159 174 168 156
(ORF69) (ORF63)
rpl33 198 198 207 201 201 201 201 201 201 201 201 201
rpl36 114 114 114 114 114 114 114 114 114 114 114 114
(secX)
rpoA 1,023 1,023 1,008 1,020 1,014 1,020 1,002 1,002 990 1,008 1,014 1,104
rpoB 3,198 3,201 3,228 3,321 3,228 3,228 3,213 3,213 3,219 3,213 3,213 3,219
rpoC1 2,055 2,025 2,091 2,052 2,049 2,052 2,049 2,061 2,043 2,034 2,067 2,040
rpoC2 4,161 4,227 3,675 4,440 4,542 4,584 3,999 4,145 4,031 4,086 4,179 4,161
rps2 708 720 705 711 711 711 711 711 711 711 711 711
rps3 654 663 654 720 720 675 657 636 657 657 657 657
rps4 609 600 597 606 606 606 606 606 606 606 606 612
rps7 468 468 468 471 471 471 468 474 468 468 468 468
rps8 399 399 399 411 411 411 405 405 405 405 405 417
rps11 393 393 393 432 432 432 417 417 417 417 426 435
rps12 372 372 372 369 375 375 372 372 372 372 372 372
rps14 303 303 300 312 312 312 303 303 303 303 303 303
rps15 267 261 267 273 273 237 273 276 267 273 264 264
rps16 — — — 258 189 258 243 — 240 267 258 267
rps18 228 228 303 513 492 513 315 297 306 306 306 306
rps19 279 279 279 282 282 282 279 279 279 279 279 279
ycf1ᶠ 3,207 5,112 5,271 — — — Y — 1,032 5,502 1,053 7,305
(ORF1068) (ORF1756) (ORF350)
ycf2ᵍ 6,411 6,942 6,165 — — — 6,897 5,658 6,885 6,396 6,843 6,843
(ORF2136) (ORF2054)
ycf3 513ᵉ 516 510 513 510 513 381 381 381 498 507 477
(ORF167) (ORF169) (ORF170) (ORF170)
(378)
ycf4 555 555 555 558 558 558 603 576 555 555 555 558
(ORF184) (ORF184) (ORF185) (ORF185)

ycf9 189 189 189 189 189 189 189 189 189 189 189 189
(ORF62) (ORF62) (ORF62) (ORF62)
ycf12 102 102 102 — — — — — — — — —
(ORF33) (ORF33)
ycf15 — — — — — 300 204 — 234 192 264 Y
(ORF99) (140818–141021) (ORF77) (90848–91039)
ycf66 408 — — — — — — — — — — —
(ORF135)
ycf68 — — 228 435 402 405 — — — — — —
(ORF75a) (92991–93425) (ORF133) (ORF133)
ycf69 — — — 177 216 177 — — — — 396 —
(124696–124872) (ORF72) (ORF58) (ORF131)
ycf70 — — — 129 270 210 — — — — — —
(14538–14666) (ORF91) (ORF69)
ycf71 — — — 153 249 225 — — — — — —
(80773–80925) (ORF82) (ORF75)
ycf72 — — — 414 414 414 — — — — — —
(81048–81461) (ORF137) (ORF137)
ycf73 — — — 750 750 522 — — — — — —
(83758–84507) (ORF249) (ORF173)
ycf74 — — — 150 330 150 — — — — — —
(94467–94616) (ORF109) (ORF49)
ycf75 — — — — 192 192 — — — — — —
(ORF63) (ORF63)
ycf76 — — — 255 258 258 — — — — — —
(124382–124636) (ORF85) (ORF85)
Total No. genes 87 81 73 84 85 86 77 74 79 79 80 78
Total length 71,509 68,355 60,470 58,095 58,677 58,581 61,908 60,296 63,543 67,839 64,551 70,110
Total No. genes 98 Average length, 63,661 ± 4,764
ᵃ Gene names follow those of Martin et al. (2002), the Swiss-Prot Protein Knowledgebase (2003), and each NCBI accession of a given taxon (see Table 2).
ᵇ Gene length (bp) is given after each gene name under each species. Values in parentheses are, respectively, position ranges (where no annotation was available but a putative homologue was detected using BLASTX at NCBI), original gene names, or ORF names in a given taxon.
ᶜ Absence of the gene in a given chloroplast genome.
ᵈ Pseudogene.
ᵉ The gene length we used differs from the NCBI annotation for that species because an earlier stop codon or a longer reading frame was detected.
ᶠ Martin et al. (2002) considered that this gene is not related to prokaryotic genes and designated it ycf78.
ᵍ An FtsH-like protein gene designated ycf77 by Martin et al. (2002).
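As a worked check on the dating arithmetic described in the Results (Tables 4 and 5), the calculations can be reproduced in a few lines. The following Python sketch is an illustration, not the authors’ code: the function names are hypothetical, K values are taken as substitutions per 100 sites (Table 4, footnote a), and the branch lengths are those quoted from Fig. 4 in the text. The constant-rate step calibrates r = K/(2T) from a fossil-dated pair and then applies T = K/(2r); the Li–Tanimura step scales a calibration date by the fraction of the calibration-node-to-dicot path that lies below the node being dated, assuming an approximately constant clock along the dicot lineage only.

```python
"""Illustrative check of the dating arithmetic in Tables 4 and 5.

Hypothetical helper names; distances are Ka values per 100 sites taken
from Table 4 and from the Fig. 4 branch lengths quoted in the text.
"""


def calibrated_rate(k_per_100: float, t_myr: float) -> float:
    """Constant-rate calibration r = K/(2T), in substitutions per site per year."""
    return (k_per_100 / 100.0) / (2.0 * t_myr * 1e6)


def constant_rate_time(k_cal: float, t_cal: float, k_target: float) -> float:
    """Divergence time T = K/(2r) in Myr, with r calibrated from (k_cal, t_cal)."""
    r = calibrated_rate(k_cal, t_cal)
    return (k_target / 100.0) / (2.0 * r) / 1e6


def li_tanimura_age(t_cal: float, below_node: float, path_to_tips: float) -> float:
    """Age of a node in Myr: the calibration date scaled by the share of the
    calibration-node-to-dicot path length that lies below the dated node
    (assumes an approximately constant clock along the dicot lineage)."""
    return t_cal * below_node / path_to_tips


if __name__ == "__main__":
    # Constant-rate method, calibration C1 (fern-seed plant split, 400-420 Myr).
    for t in (400, 420):
        rate = calibrated_rate(18.03, t)
        time = constant_rate_time(18.03, t, 9.30)
        print(f"C1 = {t} Myr: r = {rate * 1e9:.3f} x 10^-9, monocot-dicot T = {time:.0f} Myr")

    # Li-Tanimura method using the dicot path lengths from Fig. 4.
    path_from_c2 = 3.91 + 0.90 + 2.85   # node C2 -> dicots (7.66)
    path_from_c1 = 2.74 + path_from_c2  # node C1 -> dicots (10.4)
    below_a = 0.90 + 2.85               # node A -> dicots (3.75)
    below_b = 2.85                      # node B -> dicots (mean of six eudicots)
    for label, dates, path in (("C2", (280, 310), path_from_c2),
                               ("C1", (400, 420), path_from_c1)):
        ages_a = [li_tanimura_age(t, below_a, path) for t in dates]
        ages_b = [li_tanimura_age(t, below_b, path) for t in dates]
        print(f"{label}: node A {ages_a[0]:.0f}-{ages_a[1]:.0f} Myr, "
              f"node B {ages_b[0]:.0f}-{ages_b[1]:.0f} Myr")
```

Run as a script, it prints calibrated rates of about 0.225 and 0.215 × 10⁻⁹ with monocot–dicot times of 206 and 217 Myr for C1, and node A ranges of 137–152 Myr (C2) and 144–151 Myr (C1), in line with Tables 4 and 5. The contrast between the two methods lies in which distances enter the calculation: the constant-rate estimate uses the mean monocot–dicot Ka distance, which includes the fast-evolving grasses, whereas the Li–Tanimura estimate uses only the dicot path.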
439

References the chloroplast and involved the gain of an intron. EMBO


J 10:3073–4078
APG (Angiosperm Phylogeny Group) (1998) An ordinal classifi- Gaut BS, Muse SV, Clark WD, Clegg MT (1992) Relative rates of
cation for the families of flowering plants. Annal Mo Bot Gard nucleotide substitution at the rbcL locus of monocotyledonous
85:531–533 plants. J Mol Evol 35:292–303
Basinger JF, Dilcher DL (1984) Ancient bisexual flowers. Science Gaut BS, Muse SV, Clegg MT (1993) Relative rates of nucleotide
224:511–513 substitution in the chloroplast genome. Mol Phylogenet Evol
Bousquet J, Strauss SH, Doerksen AH, Price RA (1992) Extensive 2:89–96
variation in evolutionary rate of rbcL gene sequences among Gaut BS, Clark LG, Wendel JF, Clegg MT, Muse SV (1997)
seed plants. Proc Natl Acad Sci USA 89:7844–7848 Comparisons of the molecular evolutionary process at rbcL and
Bowe LM, Coat G, dePamphilis CW (2000) Phylogeny of seed ndhF in the grass family (Poaceae). Mol Biol Evol 14:769–777
plants based on all three genomic compartments: Extant gym- Goremykin VV, Hansmann S, Martin WF (1997) Evolutionary
nosperms are monophyletic and Gnetales’ closest relatives are analysis of 58 proteins encoded in six completely sequenced
conifers. Proc Natl Acad Sci USA 97:4092–4097 chloroplast genomes: Revised molecular estimates of two seed
Brandl R, Mann W, Sprintzl M (1992) Estimation of the monocot– plant divergence times. Pl Syst Evol 206: 337–351
dicot age through t-RNA sequences from the chloroplast. Proc Goremykin VV, Hirsch-Ernst KI, Wolfl S, Hellwig FH (2003)
R Soc Lond B 249:13–17 Analysis of the Amborella trichopoda chloroplast genome se-
Bremer K (2000) Early Cretaceous lineages of monocots flowering quence suggests that Amborella is not a basal angiosperm. Mol
plants. Proc Natl Acad Sci USA 97:4707–4711 Biol Evol 20:1499–1505
Chase MW, et al. (1993) Phylogenetics of seed plants: An analysis GPWG (Grass Phylogeny Working Group) (2001) Phylogeny and
of nucleotide sequences from the plastid gene rbcL. Annal Ma subfamilial classification of the grasses (Poaceae). Ann Mo Bot
Bot Gard 80:528–580 Gard 88:373–373
Chase MW, et al. (2000) Higher-level systematics of the mono- Gu Z, Cavalcanti ARO, Chen FC, Bouman P, Li WH (2002) Ex-
cotyledons: An assessment of current knowledge and a new tent of gene duplication in the genomes of Drosophila, nema-
classification. In: Wilson KL, Morrison DA (eds) Monocots: tode, and yeast. Mol Biol Evol 19:256–262
Systematics and evolution. Commonwealth Scientific and In- Hallick RB, Bairoch A (1994) Proposals for the naming of chlo-
dustrial Research Organization, Collingwood, Australia, pp 3– roplast genes. III. Nomenclature for open reading frames
16 encoded in chloroplast genomes. Plant Mol Biol Rep 12:S29–
Chaw SM, Zharkikh HA, Sung HM, Lau TC, Li WH (1997) S30
Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol 14:56–68
Chaw SM, Parkinson CL, Cheng Y, Vincent TM, Palmer JD (2000) Seed plant phylogeny inferred from all three plant genomes: Monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc Natl Acad Sci USA 97:4086–4091
Clegg MT, Gaut BS, Learn GH Jr, Morton BR (1994) Rates and patterns of chloroplast DNA evolution. Proc Natl Acad Sci USA 91:6795–6801
Crane PR (1988) Major clades and relationships in "higher" gymnosperms. In: Beck CB (ed) Origin and evolution of gymnosperms. Columbia University Press, New York, pp 218–272
Crepet WL, Feldman GD (1991) The earliest remains of grasses in the fossil record. Am J Bot 78:1010–1014
Cronquist A (1988) The evolution and classification of flowering plants, 2nd ed. New York Botanical Garden, Bronx, NY
Darwin C, Darwin F, Seward AC (eds) (1903) More letters from Charles Darwin. D. Appleton, New York
Delevoryas T, Hope RC (1973) Fertile coniferophyte remains from the Late Triassic Deep River Basin, North Carolina. Am J Bot 60:810–818
Doyle JA (1992) Revised palynological correlations of the lower Potomac Group (USA) and the Cocobeach sequence of Gabon (Barremian-Aptian). Cretaceous Res 13:337–349
Doyle JA (1998) Molecules, morphology, fossils, and the relationship of angiosperms and Gnetales. Mol Phylogenet Evol 9:448–462
Doyle JA, Donoghue MJ (1987) The origin of angiosperms: a cladistic approach. In: Friis EM, Chaloner WG, Crane PR (eds) The origins of angiosperms and their biological consequences. Cambridge University Press, Cambridge, pp 17–49
Gallois JL, Achard P, Green G, Mache R (2001) The Arabidopsis chloroplast ribosomal protein L21 is encoded by a nuclear gene of mitochondrial origin. Gene 274:179–185
Gantt JS, Baldauf SL, Calie PJ, Weeden NF, Palmer JD (1991) Transfer of rpl22 to the nucleus greatly preceded its loss from
Hart JA (1987) A cladistic analysis of conifers: Preliminary results. J Arnold Arbor 68:296–307
Herendeen PS, Crane PR (1995) The fossil history of the monocotyledons. In: Rudall PJ, Cribb PJ, Cutler DF, Humphries CJ (eds) Monocotyledons: Systematics and evolution. Royal Botanic Gardens, Kew, pp 1–21
Hiratsuka J, et al. (1989) The complete sequence of the rice (Oryza sativa) chloroplast genome: Intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol Gen Genet 217:185–194
Hughes NF (1994) The enigma of angiosperm origins. Cambridge University Press, Cambridge
Hupfer H, Swiatek M, Hornung S, Herrmann RG, Maier RM, Chiu WL, Sears B (2000) Complete nucleotide sequence of the Oenothera elata plastid chromosome, representing plastome I of the five distinguishable euoenothera plastomes. Mol Gen Genet 263:581–585
Ikeo K, Ogihara Y (2000) Triticum aestivum chloroplast, complete genome (unpublished)
Katayama H, Ogihara Y (1996) Phylogenetic affinities of the grasses to other monocots as revealed by molecular analysis of chloroplast DNA. Curr Genet 29:572–581
Kato T, Kaneko T, Sato S, Nakamura Y, Tabata S (2000) Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res 7:323–330
Kenrick P, Crane PR (1997) The origin and early evolution of land plants. Nature 389:33–39
Kumar S, Tamura K, Jakobsen IB, Nei M (2001) MEGA 2: Molecular evolutionary genetics analysis software. Arizona State University, Tempe
Laroche J, Li P, Bousquet J (1995) Mitochondrial DNA and monocot–dicot divergence time. Mol Biol Evol 12:1151–1156
Li WH, Graur D (1991) Fundamentals of molecular evolution. Sinauer Associates, Sunderland, MA
Li WH, Tanimura M (1987) The molecular clock runs more slowly in man than in apes and monkeys. Nature 326:93–96
Lin S, Wu H, Jia H, Zhang P, Dixon R, May G, Gonzales R, Roe BA (2000) Medicago truncatula variety Jemalong A-17 chloroplast, complete sequence (unpublished)
Lockhart PJ, Howe CJ, Barbrook AC, Larkum AWD, Penny D (1999) Spectral analysis, systematic bias, and the evolution of chloroplasts. Mol Biol Evol 16:573–576
Magallón S, Sanderson MJ (2001) Absolute diversification rates in angiosperm clades. Int J Org Evol 55:1762–1780
Magallón S, Crane PR, Herendeen PS (1999) Phylogenetic pattern, diversity, and diversification of eudicots. Ann Mo Bot Gard 86:297–372
Maier RM, Neckermann K, Igloi GL, Kössel H (1995) Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol 251:614–628
Martin W, Gierl A, Saedler H (1989) Molecular evidence for pre-Cretaceous angiosperm origin. Nature 339:46–48
Martin W, Lagrange T, Li YF, Bisanz-Seyer C, Mache R (1990) Hypothesis for the evolutionary origin of the chloroplast ribosomal protein L21 of spinach. Curr Genet 18:553–556
Martin W, Lydiate D, Brinkmann H, Forkmann G, Saedler H, Cerff R (1993) Molecular phylogenies in angiosperm evolution. Mol Biol Evol 10:140–162
Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV (1998) Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:162–165
Martin W, et al. (2002) Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA 99:12246–12251
Mathews S, Donoghue MJ (1999) The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286:947–950
Matsuoka Y, Yamazaki Y, Ogihara Y, Tsunewaki K (2002) Whole chloroplast genome comparison of rice, maize, and wheat: implications for chloroplast gene diversification and phylogeny of cereals. Mol Biol Evol 19:2084–2091
Millen RS, Olmstead RG, Adams KL, Palmer JD, Lao NT, Heggie L, Kavanagh TA, Hibberd JM, Gray JC, Morden CW, Calie PJ, Jermiin LS, Wolfe KH (2001) Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658
Miller CN Jr (1977) Mesozoic conifers. Bot Rev 43:217–280
Miller CN Jr (1988) The origin of modern conifer families. In: Beck CB (ed) Origin and evolution of gymnosperms. Columbia University Press, New York, pp 448–486
Muse SV, Gaut BS (1997) Interlocus comparisons of the nucleotide substitution process in the chloroplast genome. Genetics 146:393–399
Nicholas KB, Nicholas HB Jr (1997) GeneDoc: Analysis and visualization of genetic variation. http://www.cris.com/Ketchup/genedoc.shtml
Nicholas KJ, Tiffney BH, Knoll AH (1983) Patterns in vascular land plant diversification. Nature 303:614–616
Nickrent DL, Parkinson CL, Palmer JD, Duff RJ (2000) Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol Biol Evol 17:1885–1895
Ogihara Y, Isono K, Kojima T, Endo A, Hanaoka M, Shiina T, Terachi T, Utsugi S, Murata M, Mori N, Takumi S, Ikeo K, Gojobori T, Murai R, Murai K, Matsuoka Y, Ohnishi Y, Tajiri H, Tsunewaki K (2002) Structural features of a wheat plastome as revealed by complete sequencing of chloroplast DNA. Mol Gen Genomics 266:740–746
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986) Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572–574
Palmer JD (1985a) Comparative organization of chloroplast genomes. Annu Rev Genet 19:325–354
Palmer JD (1985b) Evolution of chloroplast and mitochondrial DNA in plants and algae. In: MacIntyre RJ (ed) Molecular evolutionary genetics. Plenum Press, New York, pp 131–240
Parkinson CL, Adams KL, Palmer JD (1999) Multigene analyses identify the three earliest lineages of extant flowering plants. Curr Biol 9:1485–1488
Price RA, Thomas J, Strauss SH, Gadek PA, Quinn CJ, Palmer JD (1993) Familial relationships of the conifers from rbcL sequence data. Am J Bot 80:172
Pryer KM, Schneider H, Smith AR, Cranfill R, Wolf PG, Hunt JS, Sipes SD (2001) Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. Nature 409:618–622
Qiu YL, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Chen Z, Savolainen V, Chase MW (1999) The earliest angiosperms: Evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404–407
Rai HS, O'Brien HE, Reeves PA, Olmstead RG, Graham SW (2003) Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol Phylogenet Evol 29:350–359
Ramshaw JAM, Richardson DL, Meatyard BT, Brown RH, Richardson M, Thompson EW, Boulter D (1972) The time of origin of the flowering plants determined by using amino acid sequence data of cytochrome c. New Phytol 71:773–779
Renzaglia KS, Johnson TH, Gates HD, Whittier DP (2001) Architecture of the sperm cell of Psilotum. Am J Bot 88:1151–1163
Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12:85–94
Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Sanderson MJ (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218–1231
Sanderson MJ, Doyle JA (2001) Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data. Am J Bot 88:1499–1516
Sato S, Nakamura Y, Kaneko T, Asamizu E, Tabata S (1999) Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res 6:283–290
Schmitz-Linneweber C, Maier RM, Alcaraz JP, Cottet A, Herrmann RG, Mache R (2001) The plastid chromosome of spinach (Spinacia oleracea): Complete nucleotide sequence and gene organization. Plant Mol Biol 45:307–315
Shinozaki K, et al. (1986) The complete nucleotide sequence of the tobacco chloroplast genome: Its gene organization and expression. EMBO J 5:2043–2049
Soltis DE, et al. (2000) Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot J Linn Soc 133:381–461
Soltis PS, Soltis DE, Chase MW (1999) Angiosperm phylogeny inferred from multiple genes: A research tool for comparative biology. Nature 402:402–404
Soltis PS, Soltis DE, Savolainen V, Crane PR, Barraclough TG (2002) Rate heterogeneity among lineages of tracheophytes: Integration of molecular and fossil data and evidence for molecular living fossils. Proc Natl Acad Sci USA 99:4430–4435
Stebbins GL (1981) Coevolution of grasses and herbivores. Ann Mo Bot Gard 68:75–76
Stebbins GL (1987) Grass systematics and evolution: Past, present and future. In: Soderstrom TR, Hilu KW, Campbell CS, Barkworth ME (eds) Grass systematics and evolution. Smithsonian Institution Press, Washington, DC, pp 359–367
Stewart WN, Rothwell GW (1993) Paleobotany and the evolution of plants, 2nd ed. Cambridge University Press, Cambridge
Stoebe B, Martin W, Kowallik KV (1998) Distribution and nomenclature of protein-coding genes in 12 chloroplast genomes. Plant Mol Biol Rep 16:243–255
Sun G, Ji Q, Dilcher DL, Zheng S, Nixon KC, Wang X (2002) Archaefructaceae, a new basal angiosperm family. Science 296:899–904
Swiss-Prot Protein Knowledgebase (2003) List of chloroplast and cyanelle encoded proteins. http://bioinformatics.weizmann.ac.il/databases/swiss-prot/release/plastid.txt, released 28 Feb
Swofford DL (1998) PAUP 4.0 b1: Phylogenetic analysis using parsimony (and other methods). Sinauer Associates, Sunderland, MA
Tajima F (1993) Unbiased estimate of evolutionary distance between nucleotide sequences. Mol Biol Evol 10:677–688
Takezaki N, Rzhetsky A, Nei M (1995) Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol 12:823–833
Taylor TN, Taylor EL (1993) The biology and evolution of fossil plants, 1st ed. Prentice Hall, Englewood Cliffs, NJ
Thomas BA, Spicer RA (1987) The evolution and paleobiology of land plants. Croom Helm, London
Thomasson JR (1987) Fossil grasses. In: Soderstrom TR, Hilu KW, Campbell CS, Barkworth ME (eds) Grass systematics and evolution. Smithsonian Institution Press, Washington, DC, pp 1820–1986
Wakasugi T, Tsudzuki J, Ito S, Nakashima K, Tsudzuki T, Sugiura M (1994) Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc Natl Acad Sci USA 91:9794–9798
Wakasugi T, Nishikawa A, Yamada K, Sugiura M (2002) Complete nucleotide sequence of the chloroplast genome from a fern, Psilotum nudum (unpublished; available from NCBI, accession No. AP004638)
Whitfeld PR, Bottomley W (1983) Organization and structure of chloroplast genes. Annu Rev Plant Physiol 34:279–310
Wikström N, Savolainen V, Chase M (2001) Evolution of the angiosperms: Calibrating the family tree. Proc R Soc Lond B 268:2211–2220
Willis KJ, McElwain JC (2002) The evolution of plants. Oxford University Press, New York
Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci USA 84:9054–9058
Wolfe KH, Gouy M, Yang YW, Sharp PM, Li WH (1989) Date of the monocot–dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci USA 86:6201–6205
Yang YW, Lai KN, Tai PY, Li WH (1999) Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and the other angiosperm lineages. J Mol Evol 48:597–560
Zhang W (2000) Phylogeny of the grass family (Poaceae) from rpl16 intron sequence data. Mol Phylogenet Evol 15:135–146
