Professional Documents
Culture Documents
14, 8605–8613 63 Karita, S., Sakka, K. and Ohmiya, K. (1997) in Rumen Microbes
57 Hayashi, H. et al. (1997) J. Bacteriol. 179, 4246–4253 and Digestive Physiology in Ruminants (Onodera, R. et al., eds),
58 Cornet, P. et al. (1983) Biotechnology 1, 589–594 pp. 47–57, Japan Sci. Soc. Press
59 Lemaire, M. and Béguin, P. (1993) J. Bacteriol. 175, 3353–3360 64 Tamaru, Y. et al. (1997) J. Ferment. Bioeng. 83, 201–205
60 Zverlov, V.V. et al. (1994) Biotechnol. Lett. 16, 29–34 65 Fanutti, C. et al. (1995) J. Biol. Chem. 270, 29314–29322
61 Kirby, J. et al. (1997) FEMS Microbiol. Lett. 149, 213–219 66 Li, X., Chen, H. and Ljungdahl, L. (1997) Appl. Environ.
62 Ohmiya, K. et al. (1997) Biotechnol. Genet. Eng. Rev. 14, 365–414 Microbiol. 63, 4721–4728
C
ompletely sequenced The presence of genes encoding enzymes display of the reaction steps
genomes have provided involved in the citric-acid cycle has been for which genes can be found
a new way of analysing studied in 19 completely sequenced in the selected genomes is
the biochemical pathways in a genomes. In the majority of species, the given in Fig. 1. The first strik-
species: using the presence of cycle appears to be incomplete or absent. ing feature in most of the
genes encoding the enzymes Several distinct, incomplete cycles reflect genomes is the incompleteness
that catalyse its reactions1,2. By adaptations to different environments. of the CAC. Only the four
studying the variation in meta- Their distribution over the phylogenetic largest genomes, those of
bolic pathways and the way tree hints at precursors in the evolution of Escherichia coli, Bacillus sub-
that they are encoded in a the citric-acid cycle. tilis, Mycobacterium tubercu-
rapidly growing set of se- losis and Saccharomyces cere-
quenced genomes, we can elu- M.A. Huynen*, T. Dandekar and P. Bork are in the visiae, and the small genome
European Molecular Biology Laboratory,
cidate their evolution. Here, Meyerhofstrasse of Rickettsia prowazekii, en-
1, 69117 Heidelberg, Germany, and
we present an investigation of in the Max-Delbrück-Centre for Molecular Medicine, code the genes for a complete
the presence and absence of 13122 Berlin-Buch, Germany; CAC. In the other genomes,
genes, in prokaryotes and yeast, M.A. Huynen is also in the Bioinformatics Group, the cycle has gaps or is com-
that code for the enzymes in- Utrecht University, Padualaan 8, 3584 CH Utrecht, pletely absent. In these incom-
The Netherlands.
volved in the citric-acid cycle *tel: 149 6221 387372, plete cycles, the genes that are
(CAC), including variations fax: 149 6221 387517, present generally code for
such as the reductive CAC and e-mail: huynen@embl-heidelberg.de reactions that are connected to
the branched citric-acid path- each other, suggesting there
way, the glyoxylate shunt, and in the reactions con- are functional connections between the genes. In in-
necting the CAC to pyruvate and phosphoenolpyruvate. complete cycles, the last part of the oxidative cycle
Our analysis has combined a thorough exami- (steps 6–8 in Fig. 1a), leading from succinate to
nation of sequence data, which included improving oxaloactetate, is the most highly conserved, whereas
the annotation of genes in the GenBank genome data- the initial steps (steps 1–3), from acetyl CoA to
base, with an analysis of the biochemical data on the 2-ketoglutarate, show the least conservation.
compared species. We examined the genomes of uni- When interpreting the role of incomplete CACs, it
cellular organisms published to date, including those is important to realize that, as well as the oxidation of
of four Archaea, 14 Bacteria and one Eukaryote. For acetyl CoA, the CAC also plays a role in the gener-
an overview of the published genomes, including ation of intermediates for anabolic pathways. Specifi-
references, see http://www.tigr.org/tdb/mdb/ cally, 2-ketoglutarate (between steps 3 and 4), ox-
mdb.html. aloacetate (between steps 8 and 1) and succinyl CoA
(between steps 5 and 6) are starting points for the
Variability of the pathway synthesis of glutamate, aspartate and porphyrin, re-
The genes involved in the CAC and its connections to spectively. The autotrophic species that are missing
pyruvate and phosphoenolpyruvate in the various a small part of the CAC are still able to generate 2-
genomes are indicated in Table 1 and a graphical ketoglutarate, oxaloacetate and succinyl CoA from
0966-842X/99/$ - see front matter © 1999 Elsevier Science. All rights reserved. PII: S0966-842X(99)01539-5
korC – – HP0591 – – – –
Succinyl-CoA–acetoacetate-
CoA-transferase
EC 2.8.3.5 scoA – – HP0691 – yxjD – TB2504
scoB – – HP0692 – yxjE – TB2503
– – – – – – – – YLR304C
AQ1784
– – slr0665 – – – – – –
CT054 – – – – – – – YIL125W
CT055 – – – – – – – YDR148C
CT400
– – – – – – – – –
– – – – – – – – –
– – frdA – – – – – YJL045W
– YKL148C
– – sdhB – – – – – YLL041C
sdhB –
– – – – – – – – YMR118C
– – – – – – – – –
continued
Isocitrate lyase
(glyoxylate pathway) (9)
EC 4.1.3.1 aceA EC4015 – – – – – TB0467
Malate synthase
(glyoxylate pathway) (10)
EC 4.1.3.2 aceB EC4014 – – – – – –
Phosphoenol pyruvate glcB EC2976 – – – – – TB1837
carboxykinase (ATP) (11)
EC 4.1.1.49 pckA EC3403 HI0809 – – pckA – –
Phosphoenol pyruvate
carboxykinase (GTP) (11)
EC 4.1.1.32 pck – – – – – – TB0211
Phosphoenol pyruvate
carboxylase (11)
EC 4.1.1.31 ppc EC3956 HI1636 – – – – –
Malic enzyme (12)
EC 1.1.1.40 mez EC2463 HI1245 – RP373 ytsJ – –
yqkJ
EC 1.1.1.38 sfcA EC1479 – – – malS – TB2332
Pyruvate carboxylase/
oxaloacetate decarboxylaseg
(13)
EC 4.1.1.3 oadB – – – – – – –
EC 6.3.1.4 pycA/ – – – – pycA – TB2967 –
accC
Pyruvate
ferredoxin oxidoreductase
EC 1.2.7.1 porA EC1378 – HP1110 – – – –
a
If a gene numbering system is available, the genes are indicated with their gene numbers, and the initials of the species (TB for M. tuberculosis); otherwise, gene
names are used. Multiple copies of a gene that are a result of recent gene duplications or that do not have separate orthologs in multiple species are in one column.
Genes that align only with part of a larger protein are indicated with a (c) for a carboxy-terminal alignment and an (n) for an amino-terminal alignment.
b
Paralogous gene displacements in the citric acid cycle are shown by a lightly shaded background. Non-homologous gene displacements are indicated with a darker
background.
c
The numbers after the enzyme names correspond to those in Fig. 1.
d
Indicates tentative identifications.
– – – – – – – – YER065C
– – – – – – – – YNL117W
– – – – – – – –
– – – – – – – – YKR097W
– – – – – – – – –
– – – – – – – – –
(a) Fig. 1. (a) A graphical representation of the reactions of the citric-acid cycle (CAC), includ-
12 ing the connections with pyruvate and phosphoenolpyruvate, and the glyoxylate shunt.
11 14 When there are two enzymes that are not homologous to each other but that catalyse the
same reaction (non-homologous gene displacement), one is marked with a solid line and
13 the other with a dashed line. The oxidative direction is clockwise. The enzymes with their
1 EC numbers are as follows: 1, citrate synthase (4.1.3.7); 2, aconitase (4.2.1.3); 3, isoci-
8 trate dehydrogenase (1.1.1.42); 4, 2-ketoglutarate dehydrogenase (solid line; 1.2.4.2 and
2 2.3.1.61) and 2-ketoglutarate ferredoxin oxidoreductase (dashed line; 1.2.7.3); 5, suc-
cinyl-CoA synthetase (solid line; 6.2.1.5) or succinyl-CoA–acetoacetate-CoA transferase
(dashed line; 2.8.3.5); 6, succinate dehydrogenase or fumarate reductase (1.3.99.1); 7,
10 fumarase (4.2.1.2) class I (dashed line) and class II (solid line); 8, bacterial-type malate
dehydrogenase (solid line) or archaeal-type malate dehydrogenase (dashed line)
7 3 (1.1.1.37); 9, isocitrate lyase (4.1.3.1); 10, malate synthase (4.1.3.2); 11, phospho-
9
enolpyruvate carboxykinase (4.1.1.49) or phosphoenolpyruvate carboxylase (4.1.1.32);
12, malic enzyme (1.1.1.40 or 1.1.1.38); 13, pyruvate carboxylase or oxaloacetate decar-
boxylase (6.4.1.1); 14, pyruvate dehydrogenase (solid line; 1.2.4.1 and 2.3.1.12) and
pyruvate ferredoxin oxidoreductase (dashed line; 1.2.7.1). (b) Individual species might
6 4 not have a complete CAC. This diagram shows the genes for the CAC for each unicellular
species for which a genome sequence has been published, together with the phylogeny of
5 the species. The distance-based phylogeny was constructed using the fraction of genes
shared between genomes as a similarity criterion29. The major kingdoms of life are indi-
cated in red (Archaea), blue (Bacteria) and yellow (Eukarya). Question marks represent
reactions for which there is biochemical evidence in the species itself or in a related
species but for which no genes could be found. Genes that lie in a single operon are
shown in the same color. Genes were assumed to be located in a single operon when
they were transcribed in the same direction and the stretches of non-coding DNA separating
them were less than 50 nucleotides in length.
(b)
Methanobacterium
thermoautotrophicum
Rickettsia
prowazekii
Archaeoglobus fulgidus
Haemophilus
influenzae
Escherichia
Borrelia coli
Mycoplasma
burgdorferi genitalium
Pyrococcus Bacillus
Treponema subtilis
Trends in Microbiology
horikoshii
pallidum
Synechocystis icd
Archaeoglobus
Aquifex aeolicus 1512
fulgidus icd
Helicobacter 100
pylori 27
Escherichia coli icd Escherichia coli leuB
Haemophilus
100 influenzae 987
Methanococcus
jannaschii 1596
Fig. 2. A neighbour-joining clustering of isopropylmalate-dehydrogenase genes (leuB) and isocitrate-dehydrogenase genes (icd) in
completely sequenced genomes. The genes with experimentally characterized functions are shown in bold, for the others the
gene number has been indicated. A set of genes from the Archaea and from M. tuberculosis (black) is nearly equidistant from the
bacterial isopropylmalate dehydrogenases (green) and the bacterial isocitrate dehydrogenases (red). Two of the archaeal genes
(MT1388 and AF0628) are located in leucine operons, identifying them and the other archaeal genes in the lower two-thirds of
the tree as isopropylmalate dehydrogenases. Note that the A. fulgidus icd gene, a ‘true’ isocitrate dehydrogenase in the Archaea,
was probably horizontally transferred from the Bacteria to A. fulgidus30. Bootstrap values larger than 90 are indicated.
pyruvate; however, the path from pyruvate to 2-keto- and in the methanogens Methanococcus jannaschii
glutarate varies among the species. The autotrophic and Methanobacterium thermoautotrophicum3. The
Bacteria generate 2-ketoglutarate from pyruvate via pathway in the two Archaea is a particularly interesting
the right branch of the CAC, activating it in the candidate for an intermediate stage in CAC evolution,
oxidative direction (clockwise in Fig. 1a), whereas the as two of the missing enzymes, isocitrate dehydrogen-
methanogenic Archaea3 and Archaeoglobus fulgidus ase and aconitase, have homologues in the leucine-
generate it via the left branch of the pathway, activat- biosynthesis pathway (Fig. 2). Thus, the duplication of
ing it in the reductive direction (counterclockwise in two genes from a single pathway could have completed
Fig. 1a). two gaps in the CAC simultaneously.
The methanogenic lifestyle is presumably very old,
Precursors of the CAC as reflected by its presence throughout the eury-
The variation in the CAC provides clues to its evo- archaeal phylogeny6. The only archaeal isocitrate de-
lutionary origins, as it shows which incomplete cycles hydrogenase found, the one from A. fulgidus, prob-
are feasible in which environments. It has been argued ably arose via a horizontal gene transfer from the
that the CAC evolved originally as two separate path- Bacteria (Fig. 2). This supports the hypothesis that the
ways stemming from pyruvate, with an oxidative last common ancestor of the Euryarchaeota did not
branch leading to 2-ketoglutarate and a reductive have aconitase nor isocitrate dehydrogenase, and that
branch leading to succinyl CoA (Ref. 4). Alternative its CAC was incomplete.
potential intermediate stages are present in Synecho- In the Bacteria, the CAC appears to have been
cystis, which lacks 2-ketoglutarate dehydrogenase5, complete by the time the proteobacteria, the low
G1C and the high G1C Gram-positive organisms, into operons throughout the phylogenetic tree sug-
diverged from each other, as it is complete in at least gests that the lack of conservation of which particular
one species in each of these taxa. The incomplete cycles genes are located in the same operon is rather a result of
in the bacterial pathogens therefore appear to be the the high rate of evolution of the (co)regulation of genes.
result of a secondary loss of genes. Biochemical results
from Aquifex pyrophilus7 and other species from the Gene displacement
root of the bacterial phylogeny suggest that the The analysis of a biochemical pathway based on its
complete CAC in the Bacteria was originally oper- genes assumes that we know at least one example se-
ating in the reductive direction as a pathway for carbon quence for every gene involved in that pathway, to
assimilation8. provide a search image. Non-homologous gene dis-
placement (Box 1) cautions us in this assumption. For
Operon structure example, the fact that Helicobacter pylori does not
Operon organization provides information about the have homologues of the 2-ketoglutarate dehydrogen-
functional relatedness of genes and thus suggests ase of E. coli does not mean that the transformation
pathways. For example, in M. thermoautotrophicum, between 2-ketoglutarate dehydrogenase and succinyl
pyruvate ferredoxin oxidoreductase (step 14 in CoA (step 4 in Fig. 1a) is absent. H. pylori has an
Fig. 1a) and fumarase (step 7) are located in a single alternative enzyme complex for this reaction: 2-keto-
operon, indicating that they are part of the same glutarate ferredoxin oxidoreductase12. Non-homolo-
pathway of acetate assimilation that starts with acetyl gous gene displacement, apart from being interesting
CoA and operates part of the CAC in the reductive from an evolutionary perspective, is also of interest in
direction. Such a metabolic flux has indeed been ob- finding species-specific drug targets. Indeed, 2-keto-
served in methanogens3. The operon organization of glutarate ferredoxin oxidoreductase is thought to be
genes also suggests that Aquifex aeolicus has a com- essential for the viability of H. pylori12.
plete CAC, as there are two operons connecting genes In the CAC of the published genomes, non-homol-
in the first half of the reductive cycle (the ‘left’ part in ogous gene displacement can also be observed for
Fig. 1) to the second half (the ‘right’ part in Fig. 1). malate dehydrogenase, fumarase, pyruvate dehydrog-
However, we cannot confirm the presence of a enase and succinyl-CoA synthetase (Table 1). As is
2-ketoglutarate ferredoxin oxidoreductase (step 4) in the case for 2-ketoglutarate dehydrogenase, the func-
A. aeolicus. tion of pyruvate dehydrogenase is taken over by a
It has been argued that, as there is little conser- ferredoxin oxidoreductase, in this case pyruvate ferre-
vation of operon structure in prokaryotic evolution doxin oxidoreductase (POR). Of the two types of
beyond genes of proteins that interact physically9,10 malate dehydrogenase, the second was first observed
(Table 1), its role in the (co)regulation of genes is in Archaea, but it does, however, have orthologues in
rather limited11. The organization of genes of the CAC E. coli, Haemophilus influenzae and B. subtilis
(Table 1). Two non-homologous classes of fumarase
have been distinguished. In the genomes analyzed
Box 1. Glossary here, Class I is present in E. coli and in the Archaea,
whereas Class II are only observed in the Bacteria and
Gene relationships
Homologous: Genes are homologous if they evolved from a common S. cerevisiae. A succinyl-CoA–acetoacetate-CoA
ancestor. transferase has been shown to be active in H. pylori 13,
Orthologous: Genes are orthologous if their divergence reflects a where it takes over the role of succinyl-CoA synthetase
speciation event31. For example, a globin in humans is orthologous in the CAC.
to a globin in chimpanzees. Orthology is used for function predic- Non-homologous gene displacement of subunits of
tion as orthologous genes are, in comparison to paralogous enzyme complexes also occurs. There are two non-
genes, relatively likely to perform the same function. An orthologous homologous E1 subunits of the pyruvate dehydro-
relationship between two genes does not guarantee, however, that genase complex. One type is found in Gram-positive
they have the same function. bacteria and eukaryotes, and the second is present in
Paralogous: Genes are paralogous if their divergence reflects a
E. coli, H. influenzae and M. tuberculosis. A fre-
gene duplication event within a species31. For example, a-globin in
humans is paralogous to b-globin in humans. quently occurring type of gene displacement takes
place in non-catalytic subunits in, for example,
Gene movement fumarate reductase. Depending on the species, the fu-
Gene displacement: Occurs when different genes code for proteins marate reductase protein complex is anchored to the
that perform the same function. membrane by either two subunits (e.g. in E. coli) or
Non-homologous gene displacement: Occurs if the genes displac- one subunit (e.g. in B. subtilis), or can even be cyto-
ing each other are not homologous. plasmatic and fused to a heterodisulphide reductase, as
Non-orthologous gene displacement: Includes both paralogous in M. thermoautotrophicum14. Furthermore, mem-
and non-homologous gene displacement32. brane-binding subunits are not necessarily ortholo-
Orthologous gene displacement: Occurs if orthologous genes dis-
gous across the species (e.g. compare E. coli with
place each other by, for example, horizontal gene transfer from
one genome to another. A. fulgidus). The fusion of orthologous catalytic
Paralogous gene displacement: Occurs if the genes displacing domains with proteins that are non-homologous to
each other are paralogous. each other points to the embedding of the reactions of
the CAC in different cellular contexts.
For example, the genes for pyruvate carboxylase have not been identified in the H. pylori genome17.
(pycB) and oxaloacetate decarboxylase (oadA) are In Mycoplasma genitalium and Mycoplasma pneu-
orthologues of each other. Their occurrence in the moniae, by contrast, only malate-dehydrogenase ac-
genome in conjunction with either oxaloacetate tivity has been detected18. The gene that could be
decarboxylase b chain (encoded by oadB) or biotin responsible for this activity, MG460, was classified
carboxylase (encoded by accC) indicates the flux of originally as a lactate dehydrogenase but might have
the reaction they catalyse. The genomes of the hetero- a dual substrate specificity19.
trophic species Treponema pallidum and Pyrococcus In A. pyrophilus, a close relative of A. aoelicus, all
horikoshii contain oadB but not accC, indicating a enzyme activities of a complete, reductive TCA have
catabolic flux (i.e. decarboxylating), whereas the been reported7. By contrast, in A. aoelicus, no suit-
autotrophic species in Table 1 contain accC, indi- able candidate for 2-ketoglutarate ferredoxin oxi-
cating an anabolic flux. In A. fulgidus, which carries doreductase has been found. The enzymes that have
both the oadB and the accC genes, the reaction been proposed as 2-ketoglutarate ferredoxin oxidore-
appears to occur in both directions. ductases20 are significantly more similar to pyruvate
What is considered a gene displacement depends, oxidoreductases and ketoisovalerate oxidoreduc-
of course, on the level of functional resolution15. For tases, although they do not cluster within any of the
example, is merely replacing a catalyst in a pathway known groups of ferredoxin oxidoreductases.
sufficient to qualify as gene displacement or must the No biochemical data are available on the CAC of
cofactors used also be the same? There are several Chlamydia trachomatis. Interestingly, it has been
non-homologous enzymes that catalyse the transfor- reported that infection by Chlamydia psittaci in-
mation between oxaloacetate and phosphoenolpyru- creases glutamate production in the host cell21. Gluta-
vate but that use different sources of phosphate (ATP, mate could be oxidized via 2-ketoglutarate and the
GTP or pyrophosphate; Table 1). incomplete CAC of C. trachomatis. The heterotrophic
A less drastic form of gene displacement is paral- species P. horikoshii requires peptides for growth22.
ogous displacement (Box 1). There are two types of This could explain its unusual, fragmented cycle, with
aconitase synthase, encoded by the genes acnA and the isolated ketoglutarate oxidoreductase playing a
acnB, whose divergence pre-dates that of the radia- role in the fermentation of glutamate.
tion in the Bacteria (see Table 1 for more examples).
Even though several non-orthologous gene displace- Technical aspects of finding orthologous genes
ments have been identified in the CAC, we cannot In determining whether genes coding for the enzymes
claim to know them all. Thus, genome data should of specific reactions are present, several techniques
always be checked for consistency with experimental can be used that go beyond simple homology and
data on the pathways in a species, especially when orthology searches. They reveal the extraordinary va-
reaction steps are hypothesized to be absent. riety in the evolution of proteins and, consequently, a
number of technical pitfalls in the analysis of genome
Comparing biochemical data with genomic data sequences.
The predictions of the presence of biochemical path-
ways based on the genome analysis shown in Fig. 1 Discrimination between orthology and paralogy
are consistent with the biochemical data on the Determining whether homologous proteins perform
species analyzed, with a few exceptions, which are the same function, or merely a similar function is one
discussed below. (An overview of the literature sup- of the major bottlenecks in genome annotation. In de-
porting our predictions of the presence and absence termining functional equivalence, orthology (Box 1)
of CAC genes is available from http://dove. is an important tool. The principal sources of infor-
EMBL-Heidleberg.DE/Genome/Citric_Acid_ mation for identifying orthologous genes are the rela-
Cycle.) Caution is needed when comparing gen- tive levels of sequence identity. Orthologous genes
omic data from one species with biochemical data are expected to have higher levels of sequence identity
from a different species within the same genus, as to each other than to paralogous genes9. Using multiple
gene content evolves at a high rate9. Even within a genomes, orthology predictions can be tested for
species, the gene content varies: a phosphofructoki- consistency9,23. However, patterns of sequence simi-
nase gene isolated from E. coli is absent from the larity alone are often not sufficient to make a reliable
published E. coli genome sequence (T. Dandekar decision.
et al., submitted). An example of this is given by the isopropylmalate
In H. pylori, all of the CAC reactions have been dehydrogenases of the Archaea (Fig. 2). These genes,
observed16 except that of succinyl-CoA synthetase. none of which has been experimentally characterized,
No sequence homologous to a known malate dehy- have similar levels of identity to bacterial isopropyl-
drogenase has, however, been detected in either of the malate dehydrogenases (part of the leucine-synthesis
sequenced H. pylori genomes. The malate-dehy- pathway) and bacterial isocitrate dehydratases. Two
drogenase activity measured16 is one order of magni- of the archaeal isopropylmalate dehydrogenase
tude lower than the activity of the CAC enzymes for genes, MT1388 and AF0628, are located in an
which genes could be found. Also, for the glycolytic operon together with other leucine-synthesis genes.
enzymes pyruvate kinase and phosphofructokinase This identifies them as isopropylmalate dehydrogenases,
low activities have been observed16 and their genes whose function can subsequently be transferred to a