Introduction

Lactic acid bacteria (LAB) represent a functional group of microorganisms comprising Gram-positive,
acid-tolerant, non-sporulating rods or cocci that produce lactic acid as the major metabolic end-product of
carbohydrate fermentation [43]. As such, they play an important role in many industrial fermentation processes and
human nutrition, due to their presence in the gastro-intestinal tract, and some members have emerged as probiotics,
since they are of human origin and confer benefits on human health [42,56]. Despite the functional definition
characterizing LAB members, they are very heterogeneous from a taxonomic viewpoint [30]. A number of LAB
genomes have been sequenced and the subsequent explosion of genomic information has allowed a better
comprehension of LAB features, such as their physiology, metabolic capabilities, key gene features and niche
adaptation. Moreover, the availability of genome sequences has provided a good opportunity to understand LAB
evolutionary history [42,67]. In particular, several approaches have been applied, including phylogenomics based on
a variable set (ranging from 141 to 383) of core proteins [11,16,54,57,95], whole genome alignments and evaluation
of colinearity distances [4,11], comparison of the gene content across the group, the definition of genes shared by all
LAB (namely LaCOG) and, finally, reconstruction of ancestral gene sets [56]. A further step toward a better
comprehension of LAB evolution is represented by the comparison of crucial metabolic pathways, which can also
provide important information for biotechnological applications [3,91]. As already mentioned, the primary product
of LAB metabolism is lactic acid. It is derived from the degradation of glucose through glycolysis (Embden–
Meyerhof–Parnas pathway; EMPP) and the pentose phosphate pathway (PPP) [30], which constitute the central
metabolism of the cell together with the citric acid cycle and the Entner–Doudoroff pathway. Starting from different
carbon sources, LAB can produce diverse end-products of fermentation (homo- and heterolactic) and these
metabolic peculiarities have also been used for taxonomic differentiation. Species performing homolactic
fermentation (namely, homofermentative metabolism) produce more than 85% lactic acid from hexoses, using the
EMPP almost exclusively, while heterofermenters utilize the 6-phosphogluconate pathway (PP pathway), which
allows the fermentation of hexoses and pentoses that yield other side products, such as ethanol, acetic acid and
carbon dioxide [32]. Heterofermentative bacteria can be further divided into facultative or obligate metabolizers: in
facultatively heterofermentative metabolism, hexoses and pentoses are fermented by the EMP and PP pathways,
respectively, while in obligately heterofermentative metabolism, pentoses and hexoses are degraded by the
phosphogluconate pathway (the first phase of the PPP) with the concurrent production of CO2 [30].
Considering the importance of EMPP and PPP in LAB metabolism, the aim of this study was to investigate LAB
evolution through the comparative analysis of the genetic background related to the two pathways in 42 LAB
genomes in terms of their gene distribution, structure and organization.

Materials and methods

Genome data
Available LAB genomes were searched both in the Genomes OnLine Database (GOLD,
http://www.genomesonline.org) and in the Genome section of the NCBI database
(http://www.ncbi.nlm.nih.gov/genomes/MICROBES/microbial_taxtree.html).

The LAB genomes used for comparative analysis in this study are shown in Table 1. The genome sequence of
Bacillus subtilis 168 was chosen as an outgroup.

Phylogenetic analysis

To infer reference phylogenetic trees, 16S, 23S and 5S rRNA gene sequences were downloaded from each
genome page in the NCBI database (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/microbial_taxtree.html).
Moreover, 42 ribosomal proteins were retrieved from the genome of each LAB according to BLAST searches [2],
adopting the bidirectional best hit (BBH) criterion. In detail, given gene x in genome A and gene y in genome B, if
x is the best hit of query y against all the genes in A and vice versa, then the relation between x and y is called a
bidirectional best hit [7,29]. Lactococcus lactis subsp. lactis IL1403 ribosomal protein sequences were used as seeds.
Each dataset was aligned using MUSCLE [21] and manually adjusted with Jalview 2.6.1 [89]. 16S rRNA and
concatenated rRNA operon (16S, 23S, 5S rRNA) gene sequence alignments were based on 1299 and 3634
nucleotides, respectively, whereas the multiple alignment obtained from the retrieved ribosomal proteins formed a
concatenated alignment of 5447 sites. Phylogenetic analyses were performed with the MEGA v5.0 [83] software package
using Jukes and Cantor as the distance model. The neighbor-joining [76] and minimum evolution [62] methods were
used for tree reconstruction, as suggested by Kämpfer et al. [37] for 16S and rRNA operon trees, while the Poisson
model [62] and neighbor-joining were used to construct ribosomal protein trees. The statistical reliability of the
phylogenetic tree topology was evaluated using bootstrapping with 1000 replicates [25]. The same alignments were
also used for maximum likelihood reconstruction using PhyML 3.0 with a WAG model of amino acid substitution,
including a gamma function with six categories to take into account differences in evolutionary rates at the sites.
Statistical support at the nodes was obtained by non-parametric bootstrapping on 1000 resampled datasets by using
PhyML. This approach was also used to infer the dendrogram from the concatenation of amino acid sequences of
seven enzymes universally distributed in the EMPP and PPP (2007 aa), while single phylogenies of the pgi (1179 nt),
pyk (1355 nt), tpiA (702 nt), pgm (644 nt) and rpe (539 nt), as well as ywlF (420 nt) and prsI (929 nt) genes were
constructed with the same procedure used for 16S rRNA-based phylogeny. SplitsTree v4 [35] was applied to
aligned gene sequences of the above-mentioned genes to detect conflicting signals along the sequences which were
then displayed as networks instead of bifurcating trees. The analysis of gene sequence similarity for affiliation of
Lb. acidophilus 30SC and Lb. casei strains was conducted in the same way as nucleotide sequences for phylogenetic
analysis. Analysis of Lb. acidophilus 30SC and related species was based on 16S rRNA (1469 nt), while for Lb. casei
group species, 16S rRNA (1519 nt), hsp60 (552 nt), recA (322 nt) and tuf (761 nt) gene sequences were used.
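
The bidirectional best hit criterion described above can be illustrated with a minimal Python sketch. It assumes that the BLAST results have already been reduced to bit scores per (query, subject) pair; the gene names and scores below are hypothetical and are not taken from the study.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` is a dict {(query, subject): bit_score}; on ties the first
    pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (x, y) where y is the best hit of x in B and x is the best hit of y in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((x, y) for x, y in best_ab.items() if best_ba.get(y) == x)


# Hypothetical bit scores: genes of genome A searched against genome B, and the reverse.
a_vs_b = {
    ("rplA_A", "rplA_B"): 410.0,
    ("rplA_A", "rplB_B"): 35.0,
    ("rplB_A", "rplB_B"): 520.0,
    ("rpsX_A", "rplB_B"): 60.0,
}
b_vs_a = {
    ("rplA_B", "rplA_A"): 405.0,
    ("rplB_B", "rplB_A"): 515.0,
    ("rplB_B", "rpsX_A"): 58.0,
}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy example the hypothetical gene rpsX_A has a best hit in genome B, but that hit prefers a different gene in genome A, so the pair is not reported.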

Metabolic pathways analysis

The EMPP and PPP were analyzed based on their representation in the Kyoto Encyclopedia of Genes and
Genomes database (KEGG; http://www.genome.jp/kegg/pathway.html#metabolism) using the genome sequence of
B. subtilis 168 as a reference. A total of 42 genes were selected that were found in at least one genome.
Presence/absence of genes was validated through the BLASTN option of the BLAST [2] program in order to include
occasionally present non-annotated ORFs in the analysis.
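
As an illustration of how such presence/absence data might be organized, the following Python sketch builds a binary matrix and extracts the genes found in every genome. The strain labels and the gene distribution are hypothetical placeholders, not results from Table 2.

```python
def core_genes(presence):
    """Return the genes present in every genome of a {genome: set_of_genes} mapping."""
    gene_sets = list(presence.values())
    return set.intersection(*gene_sets) if gene_sets else set()


def presence_matrix(presence):
    """Return (genomes, genes, matrix) with matrix[i][j] = 1 if gene j is in genome i."""
    genomes = sorted(presence)
    genes = sorted(set().union(*presence.values()))
    matrix = [[1 if gene in presence[genome] else 0 for gene in genes]
              for genome in genomes]
    return genomes, genes, matrix


# Hypothetical distribution of four glycolytic genes in three placeholder strains.
presence = {
    "strain_1": {"pgi", "pyk", "pfkA", "pdhA"},
    "strain_2": {"pgi", "pyk", "pdhA"},
    "strain_3": {"pgi", "pyk", "pfkA"},
}

core = core_genes(presence)
genomes, genes, matrix = presence_matrix(presence)
print(sorted(core))
for genome, row in zip(genomes, matrix):
    print(genome, row)
```

A matrix of this shape (genomes in rows, genes in columns) is the kind of input that a heatmap and clustering step would take.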

Hierarchical analysis

Data analysis was performed within the programming and visualization environment R v. 2.13.0 (R
Development Core Team, 2011). The heatmap.2 function of the “gplots” package
(http://cran.r-project.org/web/packages/gplots/index.html) was used to draw heatmaps based on a
presence/absence matrix. The “RColorBrewer” package
(http://cran.r-project.org/web/packages/RColorBrewer/index.html) was used to map gene presence and absence values to colors
(green and red, respectively). The function supplied through the “distfun” argument was used to compute the distance (dissimilarity) between both
rows and columns, while the “hclust” function was employed to compute the hierarchical clustering. The
dendrogram was constructed by means of an agglomerative clustering algorithm in order to cluster both the rows and
columns of tabular data separately. Reordering of the rows and columns was carried out according to the set of
values (row or column means) within the restrictions imposed by the dendrogram.
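
The agglomerative step can be sketched in a few lines of Python. The text does not state which distance measure or linkage rule was used, so this sketch assumes Euclidean distance and complete linkage; the four placeholder strains and their presence/absence profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(rows):
    """Agglomerative clustering of the named rows of a matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row names. Assumes complete linkage:
    the distance between clusters is the largest pairwise row distance.
    """
    def linkage(pair):
        a, b = pair
        return max(euclidean(rows[x], rows[y]) for x in a for y in b)

    clusters = [(name,) for name in rows]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical presence (1) / absence (0) profiles over four genes.
rows = {
    "strain_1": [1, 1, 1, 1],
    "strain_2": [1, 1, 1, 0],
    "strain_3": [0, 0, 1, 1],
    "strain_4": [0, 0, 1, 0],
}

merges = complete_linkage(rows)
for a, b, dist in merges:
    print(a, b, round(dist, 3))
```

Applying the same function to the transposed matrix would cluster the genes instead of the strains, which corresponds to clustering rows and columns separately.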

Analysis of genetic modules

Gene organization was analyzed based on gene annotation in both the NCBI and KEGG databases. The
Comprehensive Microbial Resource database of the J. Craig Venter Institute
(http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi) was searched in order to confirm putative operon
organization.

Results and discussion

Phylogenetic analysis

In this study, 42 genome sequences representative of 27 species belonging to different families and genera of the
order Lactobacillales, and originating from very different ecological sources, were analyzed (Table 1). The genomes in the dataset
were highly heterogeneous with respect to their GC content, which ranged from 34% for Lc. lactis subsp. lactis
KF147 and Lb. acidophilus NCFM to 51% for Lb. fermentum IFO3956, and genome size, which ranged from 1.8 Mb
for Lb. johnsonii strains to 3.2 Mb for Lb. plantarum strains. Phylogeny at the intraspecific level was investigated
and the genome sequences of representatives of three subspecies (Lb. delbrueckii subsp. bulgaricus, Lc. lactis
subsp. lactis and cremoris), as well as multiple strains belonging to the same species (i.e. Lb. johnsonii), were also
included in the dataset. The Lb. reuteri strains were both derived from the same original isolate, Lb. reuteri F725
[59].

Three phylogenetic trees were constructed based on: (i) concatenation of 42 ribosomal protein sequences (Fig.
1), (ii) 16S rRNA genes (Fig. S1a), and (iii) concatenation of 16S, 23S and 5S rRNA gene sequences (rRNA operon)
(Fig. S1b). Trees inferred with different methods and models displayed similar, but not identical, topologies. The
main differences concerned the Lactobacillaceae family, which was split into two clades in the rRNA operon and
ribosomal protein trees, and one of the clades was supported as a sister group of the clade embedding
Leuconostocaceae representatives in both trees. The monophyly of the Streptococcaceae and Leuconostocaceae
families was supported in each tree. Among Lactobacillaceae members, the clade including Lactobacillus
delbrueckii (i.e. the Lactobacillus delbrueckii group [77] named after the type species of the genus) was always
highly supported statistically. In particular, in the rRNA operon tree, it formed a single lineage of descent,
associated with Leuconostocaceae and Streptococcaceae species, while in the tree inferred by concatenated
ribosomal proteins it was related to Lb. casei, Lb. paracasei, Lb. rhamnosus and Lb. sakei. Other supported
relationships were detected within Lactobacillaceae: between Lb. casei/Lb. paracasei strains and Lb. rhamnosus (Lb.
casei group), and Lb. reuteri strains with Lb. fermentum (Lb. reuteri group) [77]. Concerning other families,
Leuconostocaceae displayed a conflicting branching order, since its representatives were embedded in the same
cluster of Streptococcaceae members in the 16S rRNA tree, whereas they clustered with the Lb. delbrueckii group
in the rRNA operon tree and with other Lactobacillaceae members in the ribosomal protein tree.

The diverse topologies displayed by phylogenetic trees can be explained by differences in evolutionary rates
and/or occurrence of horizontal gene transfer (HGT) [70]. The monophyly of Lactobacillaceae in the 16S rRNA
tree was in agreement with previous 16S rRNA analyses [95], but Claesson et al. [16] observed that this molecular
marker was unstable for Lactobacillaceae species, since diverse branching patterns were obtained in different 16S
rRNA trees based on the number of species included. As such, the utilization of multiple markers, such as those
coding for the ribosomal proteins, provides a buffer against the distorting effects of recombination at a single locus
(observed also for 16S rRNA) [27]. The paraphyly of the Lactobacillaceae family (in particular the Lactobacillus
genus) was in agreement with previous studies on whole-genome DNA, DNA-RNA hybridization [17,32], and
phylogenetic analysis based on concatenated alignments of four subunits of the DNA-dependent RNA polymerase
[11,16,50,56], 32 ribosomal proteins [11], 141, 383 and 232 core proteins and 243 core gene families
[16,39,54,95]. Highly supported clades were observed within Lactobacillaceae with the largest and most coherent
being the Lb. delbrueckii group, which defined a single line of descent in each tree with respect to other lactobacilli.
Based on this observation and the results obtained, the phylogenetic tree inferred by the concatenation of ribosomal
proteins was chosen as the reference for further analyses [72,84,92].

Analysis of the EMPP and PPP pathways

The phylogenetic distribution and organization of genes involved in EMPP and PPP biosynthetic pathways
were analyzed. The two pathways were studied based on their representation in the KEGG database
(http://www.genome.jp/kegg/) in which both pathways were denoted by their peripheral enzymes, which are not
usually included in standard textbooks. Genes that were annotated in at least one genome were selected by taking
B. subtilis 168 as a reference. Potentially non-annotated orthologs were identified and retrieved by searching each
genome sequence in order to have the correct representation of the distribution of EMPP and PPP genes.
Overall, 42 metabolic steps and their corresponding coding genes were considered: 25 of them
belonging to EMPP and 17 to PPP (Table 2). As for glycolysis, the analysis of the 25 genes revealed a complex
pattern of phylogenetic distribution (Table 2a). In fact, only four EMPP genes were shared by the 42 strains: pgi,
tpiA, pgm, and pyk (coding for EC 5.3.1.9, EC 5.3.1.1, EC 5.4.2.1, EC 2.7.1.40, respectively), suggesting a key
role of these enzymes for cell life that could be an indication of their ancestral origin, since Pgi and Pyk are
involved in the regulatory mechanism of central metabolism, and disruption of both these enzymes has been shown
to lead to a switch from EMPP to PPP [34,79]. TpiA is one of the most abundant enzymes in bacteria [63] and has
been described as the near-perfect catalyst [45]. On the other hand, Pgm is essential for cell growth since no strain
with zero expression of this gene has been obtained [81]. As expected, genes belonging to the central part were
more conserved than those of the peripheral section (Table 2a). The other genes showed a more complex pattern
of presence/absence throughout the strains in the dataset. The Lb. delbrueckii group was the LAB cluster lacking
the highest number of central EMPP genes and all 11 strains were missing pdhA, pdhB, pdhC, and pdhD (EC
1.2.4.1, EC 1.2.4.1, EC 2.3.1.12, and EC 1.8.1.4), which are organized as an operon. Regarding PPP genes (Table
2b), only three of them were conserved, rpe and ywlF (coding for EC 5.1.3.1 and EC 5.3.1.6, respectively)
belonging to the central PPP pathway, and the peripheral gene prsI (EC 2.7.6.1). YwlF allows the cell to utilize
single pentose sugars, while Rpi mutants are ribose auxotrophs [33,55]. Finally, PrsI links the PPP to other central
pathways, such as the pyrimidine and purine nucleotide pathway or the biosynthesis of histidine and tryptophan,
and its products are required constantly by the cell [22]. The conserved phylogenetic distribution of these genes
suggests their essential role for cell life. The PPP peripheral genes showed a less conserved and more complex
distribution. Focusing on gene copy number, for EMPP, up to ten copies were annotated in KEGG for pgm and
bglC (EC 5.4.2.1 and EC 3.2.1.86) in Lb. plantarum strains. Similarly, pgm displayed a high number of putative
paralogous copies in all the strains considered in this study. On the other hand, the maximum copy number for PPP
pathway genes was three (data not shown). Genes shared by distinct sets of bacteria reflect recent gene exchanges
between distantly related species within the order Lactobacillales [57]. As for the gene copy number annotated in
KEGG, the presence of several putative paralogous genes could represent duplications of widely conserved genes
in the common ancestor of Bacilli or at a later stage of evolution. Furthermore, it supports the occurrence of
duplication or HGT of genes involved in sugar metabolism, as estimated by previous evolutionary investigations
[57]. However, it is clear that the number of paralogs has to be carefully evaluated for each species.

Hierarchical analysis of the pathways and gene organization

A hierarchical clustering analysis was performed in order to
identify significant patterns of EMPP and PPP gene distribution. This analysis was visualized through the
heatmaps shown in Figs. 2a and 3a, where the hierarchical agglomerative clustering of genes was combined with
that of species, and then displayed as dendrograms at the top and left-hand side of the graph. Concerning the
EMPP heatmap (Fig. 2a), the differential distribution of several genes (the pdh cluster and the pfkA gene) showed
clustering in three groups, namely, AE, BE and CE. Cluster AE grouped Streptococcaceae members, some
Lactobacillus strains (i.e. those belonging to the Lb. casei group and Lb. plantarum species), as well as Bacillus,
Enterococcus and Pediococcus; cluster BE included members of Leuconostocaceae and the obligately
heterofermentative Lactobacillus strains (members of the Lb. reuteri group and Lb. brevis); whereas cluster CE
encompassed all the species of the Lb. delbrueckii group. Looking at the degree of gene co-occurrence, the EMPP
heatmap showed the presence of three groups of genes (Fig. 2a): group aE included the above mentioned pdh
operon genes, whereas clusters bE and cE contained genes distributed in all the strains. The analysis of gene
distribution and association revealed the presence of another operon in EMPP, namely the gap operon in B.
subtilis, which is constituted by gapA, pgk, tpiA, pgm and eno genes. The distribution of pdh and gap operons and
their orientation in the chromosome were mapped on the ribosomal protein phylogenetic tree (Fig. 2b). These
operons had the same orientation in the chromosome of strains belonging to the same species or subspecies, but
they differed at the genus and family level. The pdh operon was harbored by all the genomes except the Lb.
delbrueckii group species. As for the gap operon, B. subtilis displayed a pentacistronic operon, which was not
detected in any of the 42 LAB strains, where the five genes were partially clustered. Lactobacillaceae members
harbored the operon without the pgm gene, which was located in another locus of the genome: the majority of
strains in the Lb. delbrueckii group revealed a tricistronic operon consisting of gapA, pgk and tpiA. The gap
operon gene composition and orientation agreed with phylogenetic clustering for the Lb. casei group species and
Lb. sakei 23K, and for Lb. plantarum/P. pentosaceus strains. These genes were not organized in an operon in
Streptococcaceae and Leuconostocaceae members, although there were some exceptions (S. gallolyticus UCN34
and Le. mesenteroides ATCC 8293T). The gap operon involved the major part of the genes in the payoff phase of
glycolysis (Table 2a) and these genes play a central role in glycolysis, since their encoded proteins are responsible
for the conversion of triose phosphate into phosphoenolpyruvate, the precursor of pyruvate, and the production of
one ATP molecule. Even if the whole operon was not maintained in every genome, the genes involved were
universally distributed in the dataset with the only exception of Lb. reuteri DSM 20016T that lacked a 40 Kb
region containing three genes out of the five (gap, pgk and eno) compared to Lb. reuteri JCM 1112T [59]. The eno
gene is nearly ubiquitous and more than one copy is annotated in KEGG, which seems to be a particular
characteristic of Lactobacillales. It has been shown in previous phylogenetic analysis that one of the enolases in
Lactobacillales is the ancestral version of that in Gram-positive bacteria, while the other copy has been acquired by
the ancestor of Lactobacillales from a different bacterial lineage, most likely Actinobacteria [56,57]. The pdh
operon codes the four enzymes constituting the pyruvate dehydrogenase complex (PDHc), which is a multimeric
structure that catalyzes the conversion of pyruvate to acetyl-CoA at the end of EMPP with the reduction of the
oxidizing agent NAD+ to NADH and H+ [31]. The PDHc constitutes one of the alternative pyruvate metabolic
pathways besides lactate dehydrogenase, and yields the production of side-products such as acetate and ethanol
[61]. The PDHc could be the key that connects the genotype of lactobacilli to their phenotype: the finding that
PDHc is absent in the Lb. delbrueckii group species is in full agreement with the fact that they are
homofermentative [30] and, as such, glucose is converted exclusively to lactic acid via Ldh. Furthermore, pfkA,
which was missing in Lb. brevis, Lb. reuteri, Lb. fermentum, Lb. buchneri and Leuconostocaceae, encodes the enzyme catalyzing the
phosphoryl transfer to the 1-position of the substrate fructose-6-phosphate, a critical priming ATP-dependent
reaction at the top of the EMPP [73]. Similarly to PDHc, its absence in obligate heterofermentative species can be
linked strictly to their metabolism [59] and, since EMPP is incomplete, glucose fermentation is driven through the
PPP instead, leading to the production of CO2, which differentiates obligately from facultatively
heterofermentative metabolism. These hypotheses are supported by transcriptional studies and previous in silico
predictions, which showed that genes coding for glycolytic enzymes were among the most highly expressed in
homofermentative and facultatively heterofermentative lactobacilli [28]. Other interesting associations were found
within genes throughout the genomes (e.g. the pyk gene was always associated with pfkA). Moreover, Lc. lactis
strains displayed the synteny of pyk, pfkA and ldh genes, named the las operon, which was a distinctive feature of
this genus (data not shown). The activity of Pyk is of critical importance and it is increased together with that of
Ldh by fructose 1,6-bisphosphate, the product of PfkA, indicating that these enzymes constitute a very efficient
tool for regulating energy production and lactic acid synthesis [46,53]. The overall distribution of EMPP genes in
distantly related species agreed with the phylogenetic analyses and enzyme distribution studies performed on
Archaea and hyperthermophilic bacteria, which revealed that the enzymes of the EMPP preparatory phase were all
derived from multiple gene families, whereas single sequence families were detected for the enzymes of the payoff
phase (Table 2a) [74]. This finding has suggested that this latter portion is under strict evolutionary maintenance
leading to the hypothesis of a gluconeogenic origin for the pathway (from the bottom up) [26,74]. As for the PPP
heatmap (Fig. 3a), three groups of species were recognized according to the distribution of the gntK, rbsK, gntZ,
ykgB, zwf and drm genes. Cluster AP grouped Lactobacillus delbrueckii species, Leuconostocaceae strains and a
few other lactobacilli, while cluster BP comprised the remaining Lactobacillus strains together with Enterococcus,
Bacillus and members of Lc. lactis. Streptococcus species were included in cluster CP. Regarding gene
co-occurrence, two groups were observed: cluster aP embedded genes which were present in each
genome; and cluster bP grouped the rbsK, gntZ, ykgB, and zwf genes, which were only missing in streptococci.
Again, the genomic organization of the loci was investigated (Fig. 3b) and led to the detection of two different
associations: the first embedded zwf, gntZ and gntK, whereas the second comprised rbsK, rpe and ywlF. zwf, gntZ
and gntK are part of the PPP non-oxidative branch, while rbsK, rpe and ywlF are involved in the metabolism of
ribulose-5-phosphate, a crucial intermediate of the PPP. The enzymes coded for by zwf, gntZ and gntK are
responsible, together with YkgB, for the conversion of β-D-glucose-6P and D-gluconate to D-ribulose-5P with
production of CO2, whereas enzymes coded by rbsK, rpe and ywlF convert D-ribose and D-xylulose-5P into
D-ribulose-5P at the cost of one molecule of ATP, which is converted to ADP. Mapping zwf, gntZ and gntK genes on the
ribosomal proteins phylogenetic tree (Fig. 3b) revealed alternative combinations of gene organization,
heterogeneously distributed at the species and genus level. The synteny of zwf, gntZ and gntK was not detected in
any LAB genome, whereas zwf-gntZ or gntZ-gntK were alternatively observed and no homogeneity was found at
the family level. At the genus level, Lactobacillus showed the most divergent pattern due to the absence of
association in Lb. delbrueckii group species and in Lb. salivarius UCC118, and the presence of zwf-gntZ or
gntZ-gntK in the remaining lactobacilli. A bicistronic operon gntZ-gntK was observed also in Leuconostoc members
but with a different orientation. Regarding the rbsK-rpe-ywlF cluster, its presence was observed in Lactobacillaceae,
Leuconostocaceae and Streptococcaceae although with different synteny. Interestingly, rpe-ywlF and rbsK-ywlF
were observed in plasmids pMP118 and pLBUC01 harbored by Lb. salivarius UCC118 and Lb. buchneri NRRL
B-30929, respectively (data not shown). As already observed for EMPP genes, despite the absence of wide genetic
modules, other associations involving PPP genes were found but without a clear link to the taxonomic distribution.
The PPP data seem to argue for the relatively recent evolution of PPP compared to EMPP [18], since the
development of the PPP pathway allows the cell to produce pyruvate from pentoses, thus by-passing the EMPP
pathway between glucose-6P and fructose-6P and, as such, increasing cell metabolism plasticity [19,87]. The
general absence of the almost complete non-oxidative branch (gntK, rbsK, gntZ, ykgB, zwf) in the Streptococcus
species under study and in other Streptococcus strains (data not shown) suggested the recent loss of these genes
during Streptococcus evolution. The limited number of pyruvate branches and the absence of a complete PPP might
have important consequences for NADPH generation, as well as ribonucleotide and aromatic amino acid synthesis in
these species, which, conversely, have shown a complex amino acid metabolism via redox constraints [68]. The lack
of these PPP genes in Streptococcus species and the presence of the EMPP las operon only in Lc. lactis strains
denote a high degree of diversity among members of the Streptococcaceae family.

Phylogenetic analysis of the carbohydrate metabolism

Single phylogenies for genes found in all the strains (pgi, pyk, tpiA, pgm, rpe, ywlF and prsI) were constructed
(Figs. S2a–g) together with a novel phylogenetic tree inferred from the concatenation of the corresponding amino
acid sequences (Fig. 4). The branching order depicted in the reference phylogeny was recognized in almost every
nucleotide tree except for the position of O. oeni and obligately heterofermentative lactobacilli in the pgi, tpiA and
ywlF trees. Interestingly, in the concatenated tree, the obligately heterofermentative Lb. reuteri group species, Lb.
brevis and Lb. buchneri branched with Leuconostocaceae members, which provided evidence of a phylogenetic
relationship between these groups of strains besides their congruent metabolic properties, adding a novel element
concerning the origin and evolution of heterofermentative/homofermentative metabolism.
In addition, the application of the split decomposition method revealed the presence of interconnecting networks in
each phylogeny (Figs. S3a–g), indicating data conflict along the nucleotide sequences. This may indicate a complex
evolutionary history at the level of these loci, characterized by events more complicated than speciation, such
as recombination or HGT, which would explain the different position of the branching points that did not correspond
directly to the reference topology.

The proposal of an evolutionary model

Data obtained from the presence/absence matrices and heatmaps were mapped together on the phylogenetic tree
inferred by the concatenation of ribosomal proteins in order to have a comprehensive vision of gene/operon
distribution and, as such, to propose an evolutionary model for Lactobacillales EMPP and PPP genes (Fig. 5).
Assuming that the most parsimonious model was the most reliable, these data suggested the existence of a defined
set of EMPP and PPP genes in the common ancestor of LAB. Some of them might have undergone single or multiple
loss events in different phylogenetic lineages, although genes harbored by a restricted number of closely related
bacteria might have conversely been acquired. According to this idea, the data obtained predicted the existence in the
genome of the common ancestor of (at least): (i) all genes belonging to the central phase of EMPP and three
peripheral genes (galM, bglC and adhB), and (ii) six central PPP genes (zwf, ykgB, gntZ, rpe, ywlF and xpkA) and
three peripheral genes (gntK, rbsK and deoC). Some of the events of gene loss have been reported in Fig. 5. These
observations agreed with previous analysis of frequent phyletic patterns in LaCOG (Lactobacillales-specific
clusters of orthologous protein coding genes) [57]. All the members of the same phylogenetic group were found to
belong to the corresponding cluster in the EMPP and PPP heatmaps (Fig. 5), indicating a comparable genotypic
background (referring to the metabolic pathways investigated) and supporting the hypothesis that they evolved from
the same common ancestor. This hypothesis was further reinforced in particular groups whose members were
characterized by the absence of specific genes, such as the Lb. delbrueckii or Streptococcus groups, which can be
referred to the principal events of gene loss in the common ancestor. Besides gene loss events, it was also possible to
detect gene gain episodes. In EMPP, gapN was found in Lb. buchneri, Lb. delbrueckii subsp. bulgaricus and
Streptococcus members. BLAST searches revealed that Lb. delbrueckii subsp. bulgaricus gene sequences shared 70%
gene sequence similarity with strains belonging to Streptococcus species. These data led to the hypothesis of a gene
transfer between Streptococcus and Lb. delbrueckii. Events such as this have been detected previously for other
genes between S. thermophilus and Lb. delbrueckii subsp. bulgaricus, due to their stable co-existence in a milk
environment with positive cooperation [51]. Other potential gene gain events regarded ytcI in Lc. lactis strains (72%
gene similarity with Lc. garvieae strains) and ycbD in Lb. plantarum strains (80% gene similarity with
Staphylococcus aureus strains). As for PPP genes, the only feasible gene gain event involved xpkA in Lc. lactis
subsp. lactis, which showed 81% gene sequence similarity to Le. kimchii IMSNU 11154. Even if Lc. lactis subsp.
lactis strains included in this analysis were derived from a dairy environment, the hypothesis of HGT between these
two species is supported by the isolation of Lc. lactis and Le. kimchii from kimchi [49].
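
The parsimony reasoning used above can be illustrated with Fitch's small-parsimony algorithm, which finds the minimum number of state changes needed to explain a presence/absence pattern on a given tree. This is a generic textbook method chosen for illustration, not necessarily the procedure followed in the study; the tree shape and taxon labels are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony for one binary character.

    `tree` is a leaf name (str) or a 2-tuple of subtrees; `states` maps
    leaf names to 0 (gene absent) or 1 (gene present).
    Returns (candidate_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical rooted tree (((A, B), C), D) and a gene absent only in A and B.
tree = ((("A", "B"), "C"), "D")
gene_present = {"A": 0, "B": 0, "C": 1, "D": 1}

root_states, changes = fitch(tree, gene_present)
print(root_states, changes)
```

In this toy case the most parsimonious reconstruction places the gene in the root and needs a single change, consistent with one loss in the lineage leading to A and B. The algorithm counts changes only; deciding whether a change is a loss or a gain requires inspecting the reconstructed ancestral states.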

Taxonomic implications

The taxonomic implications of the evolutionary analyses of LAB presented in this study are broad. Although only
six type strains were included in the analysis, some considerations regarding the taxonomy can be emphasized.
The analysis of the phylogenetic trees (Fig. 1) revealed the existence of some nomenclature ambiguities within the
Lactobacillus genus regarding: (i) Lb. acidophilus 30SC, and (ii) Lb. casei/Lb. paracasei strains. As for the former,
16S rRNA gene sequences of Lb. acidophilus 30SC, Lb. amylovorus GRL1112, Lb. acidophilus NCFM, Lb.
amylovorus DSM 20531T and Lb. acidophilus DSM 20079T, were compared. Similarity values between strain 30SC
and Lb. amylovorus GRL1112 and DSM 20531T (99.9 and 99.7%, respectively) were higher than those observed
with Lb. acidophilus strains (98.3%) (Table S1), indicating that strain 30SC belonged to Lb. amylovorus species
rather than to Lb. acidophilus (Fig. 1) [8]. As for the Lb. casei clade, a more comprehensive analysis was based on
16S rRNA, hsp60, recA and tuf gene sequences (1519, 552, 322, 761 nt, respectively), which have already been used
as markers for the phylogeny of the Lb. casei group [9,13,20,23]. Gene sequence comparison revealed that strains
Zhang and BL23 were more closely related to Lb. paracasei JCM 8130T and ATCC 334 than to Lb. casei ATCC 393T
(Table S2), thereby designating these strains as Lb. paracasei species.
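
A pairwise similarity value of this kind can be computed from two aligned sequences as the percentage of identical positions. The sketch below is one common definition (gapped columns are ignored); the exact formula used for Tables S1 and S2 is not stated in the text, and the sequences shown are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Columns where either sequence carries a gap are excluded from the
    comparison (an assumption; other conventions count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments differing at one of ten positions.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity)
```

With this definition, two 16S rRNA sequences that differ at 1 of 1000 ungapped aligned positions would score 99.9%.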

Based on phylogenetic, genotypic and evolutionary data, this study added further elements which confirmed and
emphasized the high heterogeneity and inconsistency of the Lactobacillus genus and, therefore, of the
Lactobacillaceae family. The taxonomy of the genus Lactobacillus has been considered unsatisfactory and far from
being well established. This is due to inconsistency between the traditional phenotypic classification, based on the
type of carbohydrate metabolism, and 16S rRNA-based phylogeny [17,85], together with a GC content range which is
approximately twice as large as that normally accepted for well-defined genera [15,78]. The diversity highlighted in
this study may be sufficient to justify reclassifying the genus into new genera, although evolutionary data would
have to be validated for each species and integrated with other taxonomically relevant information [24].

Note

The genome sequences of other Lactobacillales strains are now available. These sequences, and others that will
be available shortly, will be helpful for the progress of the evolutionary model following the same approach
proposed in this analysis.

Competing interests

The authors declare they have no competing interests.

Acknowledgements

The authors are grateful to Yakult B.V. for financial support of this research and Prof. Franco
Dellaglio for fruitful discussions concerning the project. Marco Fondi was financially supported by a FEMS
Advanced Fellowship (FAF2012).
