
Species Delimitation Plugin Manual


Authors
Brad Masters, Vicky Fan and Howard Ross
Bioinformatics Institute
University of Auckland
Contact: h.ross@auckland.ac.nz

Purpose
The Species Delimitation plugin for the Geneious bioinformatics software
(www.biomatters.com) is an exploratory tool that allows users to assess putative species in
phylogenetic trees. The plugin summarises measures of phylogenetic support and
diagnosability of species defined as user-selected collections of taxa on user-supplied trees,
but it does not provide definitive support for species groups (Figure 1). The phylogenetic
trees may be estimated using modules in Geneious, or with external applications.
Distinct clades in a tree are often interpreted as species. However, monophyly of a set of taxa
can occur by chance within a larger panmictic group as a result of the coalescent process.
The plugin implements the method of Rosenberg (2007) for calculating the probability of
reciprocal monophyly under the null model of random coalescence.
Species boundaries are sometimes identified by deep divergences in phylogenetic trees
estimated from single gene sequence alignments. However, when a sample of sequences is
collected from a panmictic population, one can sometimes observe a marked cladistic
structure, arising solely from the stochastic process of gene coalescence. In these situations,
if the distance from a species-defining node to the tips is much smaller than the distance
from that species-defining node to its ancestral node, then we might mistakenly infer the
presence of a cryptic species. The plugin implements a method developed by Rodrigo and
colleagues (2008) to estimate the probability of observing such a divergence under the null
hypothesis of coalescence acting on a single, panmictic population of constant size.
Diagnosability is an important criterion in species delimitation. Hebert and colleagues (2003)
proposed that the barcode gap provided evidence that single-gene sequences could
provide reliable identification of most species. The relationship between genetic
differentiation and the reliability of species identification was assessed in several different
evolutionary scenarios in a simulation study by Ross and colleagues (2008). The plugin
implements the findings of that study, to present the probability that a member of a
putative species could be identified correctly given the current alignment as the reference
dataset.
You can use the plugin to investigate different hypotheses of species boundaries. To do so,
make several copies of a tree and then assign taxa to species differently in each.

Figure 1. A tree document and its associated Species Delimitation option panel.

Download
The Species Delimitation plugin can be installed directly from within Geneious (see Installation below).
The documentation and an example tree can be downloaded from
www.fos.auckland.ac.nz/~howardross/Software/SpDelim/

Installation
Requires Geneious. Install it from www.geneious.com.
To install the plugin:
1. Start the Geneious program.
2. Open the Tools : Plugins menu.
3. Select the Species Delimitation plugin from the list.
4. Click the Install button associated with it.
5. If Geneious prompts for a restart, please do so before continuing.

Using the Plugin


Getting Started: Once the Species Delimitation plugin is installed, you can use it by selecting
a tree document, selecting Tree View, and in the right-hand sidebar selecting the Species
Delimitation option panel (Figure 1). This panel displays all the information the Species
Delimitation tool can access from the current tree document. Using the Species Delimitation
tool will not alter the topology or branch lengths of your tree but it will alter the node
colourings.
We suggest that you first copy and paste the tree document and work on the copy, so that the
original is left unchanged while the copy has its nodes coloured and statistics calculated.
Adding Species: Taxa (i.e., sequences, leaves, tips) belonging to the same species are
indicated or specified by a shared colour. You can assign taxa to species in two different
ways:
1. Select a group of nodes, and in the Species Delimitation option panel enter a name and
click Add Selection to define a species. The nodes will be given a randomly chosen colour.
2. The species may be defined in advance by using the Geneious Color Nodes function (Pro
Version only). Select a set of nodes and assign the same colour to them. Alternatively, as you
work on the tree, select a set of nodes, use the Color Nodes tool to give them a unique colour,
deselect the nodes, and click Reload Tree in the Species Delimitation option panel. All nodes of
the same colour will be added to the same species.
Rename Selection: To change the name assigned to a species, or to assign a name to an
unnamed species, select the entry from the list of species and click Rename Selection.
Redefining Species: Currently the plugin cannot add members to, or remove members from, an
existing species set. To redefine a species, select its entry from the list of species and click
Remove, then reassign the taxa using either of the methods for adding species.
Reload Tree: This clears all of the species sets and redefines them based on the identically
coloured groups in the tree. This is useful if species require specific colour settings, because
colouring nodes with the Geneious colouring tool does not by itself register the coloured group
as a species in the Species Delimitation tool.
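The regrouping performed by Reload Tree can be pictured as collecting tips by colour. The sketch below only illustrates that idea and is not the plugin's code: the function name species_from_colours and the representation of the tree as a dictionary mapping each tip name to a colour string (or None for an uncoloured tip) are hypothetical, and leaving uncoloured tips unassigned is an assumption.

```python
from typing import Dict, List, Optional


def species_from_colours(tip_colours: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group tips that share a colour into one putative species.

    Hypothetical sketch: tips whose colour is None are assumed to stay
    unassigned rather than forming a species of their own.
    """
    species: Dict[str, List[str]] = {}
    for tip, colour in tip_colours.items():
        if colour is None:
            continue  # assumption: no colour means no species assignment
        species.setdefault(colour, []).append(tip)
    # Sort members so the result does not depend on input order.
    return {colour: sorted(tips) for colour, tips in species.items()}


tips = {"A": "#ff0000", "B": "#ff0000", "C": "#0000ff", "D": "#0000ff", "E": None}
print(species_from_colours(tips))
# {'#ff0000': ['A', 'B'], '#0000ff': ['C', 'D']}
```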
Save SpDelim Results: When you click Save SpDelim Results in the Species Delimitation
option panel, the species sets, with their colourings, and the associated statistics are saved.
A table summarizing the statistics is created in the Species Delimitation tab in the tree
document. Note that only one set of results can be stored with each tree document. If you
want to investigate several scenarios, then copy and paste several copies of the tree
document and develop a different scenario on each copy. To save the statistics to file, go to
the Species Delimitation tab and click the Save to Text File button.
The Save SpDelim Results button will overwrite previous species delimitation results. It is
possible to save the tree in its current state without updating the results by using the
Geneious Save command (Ctrl+S, Cmd+S, or the Save icon); however, this will cause the Species
Delimitation tab to become out of sync with the species groups on the tree. Clicking the
Save SpDelim Results button again will resynchronize the tree and results.
Reset: This clears all of the colouring and species sets from the tree. Warning: This will also
remove any colouring added to the trees prior to use of the plugin.

Example Tree
The tree shown in Figure 1 may be downloaded and imported into Geneious. Use the File :
Import : From file command. You can then select the clades, define them as species, and
check that you get the same results as illustrated. The tree was estimated using sequences
deposited in GenBank by Milá and colleagues (2007). It is intended only to illustrate the use
of this plugin and should not be considered an analysis of the delimitation of species within
this group of organisms.

Interpreting the Results


The plugin reports several statistics that can be useful when considering where species
might be delimited. The statistics presented are largely based on group-to-group
comparisons. Consequently, at least two species must be defined in order to obtain results.
The results are calculated dynamically with each addition or removal of a species group.
Results can be viewed by selecting a species using the drop-down menu or by selecting all
the nodes of a species on the tree itself. Selecting a monophyletic species group is easily
accomplished by selecting its most recent common ancestor.
Monophyletic?: Whether the species is monophyletic or not. This property is important in
determining whether it is possible to calculate P(Randomly Distinct), Clade Support or
Rosenberg's PAB, as these methods are only applicable to monophyletic groups. Note that
species containing a single member are monophyletic by definition.
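A set of taxa is monophyletic when the tips descending from its most recent common ancestor (MRCA) are exactly that set. The sketch below checks this on a rooted tree that, as an assumption for illustration, is stored as a dictionary mapping each node to its parent; the helper names are hypothetical. A single taxon is its own MRCA, so it comes out monophyletic, matching the note above.

```python
from typing import Dict, Iterable, List, Set


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return node followed by its ancestors, ordered from tip to root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Most recent common ancestor: the first node shared by every tip-to-root path."""
    taxa = list(taxa)
    common = ancestors(taxa[0], parent)
    for taxon in taxa[1:]:
        on_path = set(ancestors(taxon, parent))
        common = [node for node in common if node in on_path]
    return common[0]  # paths run tip-to-root, so the first shared node is the MRCA


def tips_below(node: str, parent: Dict[str, str]) -> Set[str]:
    """All tips (nodes without children) descending from node."""
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    stack, tips = [node], set()
    while stack:
        current = stack.pop()
        if current in children:
            stack.extend(children[current])
        else:
            tips.add(current)
    return tips


def is_monophyletic(taxa: Iterable[str], parent: Dict[str, str]) -> bool:
    taxa = set(taxa)
    return tips_below(mrca(taxa, parent), parent) == taxa


# Example tree: ((A,B)n1,(C,D)n2)root
parent = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}
print(is_monophyletic({"A", "B"}, parent))  # True
print(is_monophyletic({"A", "C"}, parent))  # False
print(is_monophyletic({"A"}, parent))       # True
```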
Intra Dist: The average pairwise tree distance among members of the focal species. Larger
values indicate that the members of the species are more diverse.
Inter Dist: The average pairwise tree distance between the members of the focal species and
members of the next closest species. Larger values indicate that the species groups are more
distinct.
Intra/Inter: The ratio of Intra Dist to Inter Dist. This provides a measure of genetic
differentiation between the focal species and its nearest neighbouring species. Small values
indicate that genetic differences within the focal species are small relative to differences
between members of the focal species and members of the closest species.
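Intra Dist, Inter Dist and their ratio can be sketched from patristic (path-length) distances on the tree. In this illustration the tree is a hypothetical dictionary mapping each node to (parent, branch length), and the next closest species is assumed to be the one with the smallest mean distance to the focal species; the manual does not say how the plugin chooses it. Intra Dist is left undefined (NaN) for a single-member species, which is also an assumption.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

Tree = Dict[str, Tuple[str, float]]  # node -> (parent, branch length to parent)


def distances_up(node: str, tree: Tree) -> Dict[str, float]:
    """Distance from node to itself and to each of its ancestors."""
    dists = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        node = parent
        dists[node] = total
    return dists


def tree_distance(x: str, y: str, tree: Tree) -> float:
    """Patristic distance: up from x to the MRCA of x and y, then down to y."""
    up_x, up_y = distances_up(x, tree), distances_up(y, tree)
    # With non-negative branch lengths, the shared ancestor giving the
    # shortest path is the MRCA.
    return min(up_x[n] + up_y[n] for n in up_x if n in up_y)


def intra_dist(members: List[str], tree: Tree) -> float:
    if len(members) < 2:
        return math.nan  # assumption: undefined for a single-member species
    return mean(tree_distance(x, y, tree) for x, y in combinations(members, 2))


def inter_dist(focal: List[str], other: List[str], tree: Tree) -> float:
    return mean(tree_distance(x, y, tree) for x in focal for y in other)


def distance_summary(focal_name: str, species: Dict[str, List[str]], tree: Tree) -> dict:
    focal = species[focal_name]
    inter_by_species = {name: inter_dist(focal, members, tree)
                        for name, members in species.items() if name != focal_name}
    nearest = min(inter_by_species, key=inter_by_species.get)  # assumed "next closest"
    intra, inter = intra_dist(focal, tree), inter_by_species[nearest]
    return {"nearest": nearest, "intra": intra, "inter": inter, "ratio": intra / inter}


tree = {"A": ("n1", 0.1), "B": ("n1", 0.1), "C": ("n2", 0.1), "D": ("n2", 0.1),
        "n1": ("root", 0.5), "n2": ("root", 0.5), "E": ("root", 2.0)}
species = {"sp1": ["A", "B"], "sp2": ["C", "D"], "sp3": ["E"]}
s = distance_summary("sp1", species, tree)
print(s["nearest"], round(s["intra"], 3), round(s["inter"], 3), round(s["ratio"], 3))
# sp2 0.2 1.2 0.167
```

The small ratio here reflects a tight focal species sitting well apart from its neighbour, which is the situation the Intra/Inter description above associates with good differentiation.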
P ID(Strict): The mean probability, with the 95% confidence interval (CI) for the prediction,
of making a correct identification of an unknown specimen of the focal species using
placement on a tree and the criterion that it must fall within, but not sister to, the species
clade.
P ID(Liberal): The mean probability, with the 95% confidence interval (CI) for the prediction,
of making a correct identification of an unknown specimen of the focal species using BLAST
(best sequence alignment), DNA Barcoding (closest genetic distance) or placement on a tree,
with the criterion that it falls sister to or within a monophyletic species clade.
P ID(Strict) and P ID(Liberal) are based on the premise that the putative species is
represented in the reference sequence alignment by the current member taxa and that any
query or unknown sequence is drawn from a species having the same coalescent model.
They are derived from the simulation results of Ross et al. (2008). Regression analysis was
applied to these results to model the response variable, the probability of a correct
identification, by the explanatory variable, the Intra/Inter genetic distance ratio. Separate
regression analyses were performed for cases with specific numbers of reference sequences
for a putative species. The cases were 1, 2, 3, 4, 5-6, 7-8, 9-11, 12-15, and 16-19 references
per species. Separate models were developed for the Strict and Liberal criteria and these are
applied to the taxa in the user's tree.
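Choosing which regression model applies starts by placing the number of reference sequences for a putative species into one of the bins listed above. The sketch below shows only that lookup: the function name is hypothetical, the fitted regression coefficients are not given in this manual and are not reproduced, and returning None for counts outside 1 to 19 is an assumption about how such cases might be handled.

```python
from typing import Optional

# Bins of reference sequences per species, as listed in the manual.
REFERENCE_BINS = [
    (1, 1), (2, 2), (3, 3), (4, 4), (5, 6),
    (7, 8), (9, 11), (12, 15), (16, 19),
]


def reference_bin(n_refs: int) -> Optional[str]:
    """Return the label of the bin containing n_refs, or None if no bin does."""
    for low, high in REFERENCE_BINS:
        if low <= n_refs <= high:
            return str(low) if low == high else f"{low}-{high}"
    return None  # assumption: counts outside 1-19 have no fitted model here


for n in (1, 6, 10, 19, 25):
    print(n, reference_bin(n))
```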
Av(MRCA): The mean distance between the most recent common ancestor of a species and
its members.
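Av(MRCA) can be sketched as the mean path length from the species' MRCA down to each member. As in the other sketches, the tree representation (node to (parent, branch length)) and the function names are hypothetical; the MRCA is taken as the shared ancestor closest to the tips.

```python
from statistics import mean
from typing import Dict, List, Tuple

Tree = Dict[str, Tuple[str, float]]  # node -> (parent, branch length to parent)


def distances_to_ancestors(node: str, tree: Tree) -> Dict[str, float]:
    """Distance from node to itself and to each of its ancestors."""
    dists = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        node = parent
        dists[node] = total
    return dists


def av_mrca(members: List[str], tree: Tree) -> float:
    """Mean distance from the members' MRCA to each member."""
    paths = [distances_to_ancestors(m, tree) for m in members]
    shared = set(paths[0]).intersection(*paths[1:])
    # Every shared ancestor lies on the first member's path to the root;
    # the one nearest that member is the MRCA.
    mrca = min(shared, key=lambda node: paths[0][node])
    return mean(p[mrca] for p in paths)


tree = {"A": ("n1", 0.1), "B": ("n1", 0.3), "n1": ("root", 0.5), "C": ("root", 0.7)}
print(round(av_mrca(["A", "B"], tree), 3))       # 0.2
print(round(av_mrca(["A", "B", "C"], tree), 3))  # 0.7
```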
P(Randomly Distinct): The probability that a clade has the observed degree of
distinctiveness, i.e., such a long subtending branch, as a result of random coalescent processes
(Rodrigo et al., 2008). Values between 0.05 and 1 indicate focal groups whose branching events
are consistent with those expected under the coalescent model in a Wright-Fisher population
with a strict molecular clock. We can conclude that the focal group's branching differs
significantly from what we would expect under the coalescent process only if the value is less
than 0.05. A value this low indicates that a cryptic species may be present, as the lineage does
not conform to a Wright-Fisher model.
Notes:
1. This calculation is only relevant if the tree is estimated under a strict molecular clock.
2. The formula used to calculate this statistic becomes numerically unstable, because of
limited computational precision, when the number of individuals in the tree exceeds 40. It
remains useful for species groups with a deep lineage; however, shallow groups whose
ancestor lies beyond the 40th coalescent point will always return the value 1. This high value
indicates that we can neither confirm nor reject the assumption that the population behaves
according to the Wright-Fisher model.
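The idea behind P(Randomly Distinct) can be illustrated by simulation. The sketch below is not the analytical formula of Rodrigo et al. (2008), which this manual does not reproduce; it is a Monte Carlo stand-in under the standard Kingman coalescent, in which the waiting time while k lineages remain is exponential with rate k(k-1)/2. Given the coalescent event at which the clade's MRCA forms (lineages drop from i to i-1) and the event at which its ancestral node forms (from j to j-1, with j < i), it estimates how often the stem (ancestor to MRCA) is at least a given multiple of the crown depth (MRCA to tips). Both lengths scale with population size, so the ratio needs no estimate of it, but it does assume node heights proportional to time, i.e., a strict clock as in note 1. The function names and parameters are hypothetical.

```python
import random
from typing import Dict


def coalescent_waiting_times(n: int, rng: random.Random) -> Dict[int, float]:
    """Kingman coalescent: T[k] ~ Exponential(rate k(k-1)/2) while k lineages remain."""
    return {k: rng.expovariate(k * (k - 1) / 2) for k in range(2, n + 1)}


def p_stem_at_least(n: int, i: int, j: int, observed_ratio: float,
                    n_sims: int = 10000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(stem / crown >= observed_ratio) under the coalescent.

    n: number of tips in the tree
    i: the clade's MRCA forms when the lineage count drops from i to i - 1
    j: its ancestral node forms when the count drops from j to j - 1
    """
    if not 2 <= j < i <= n:
        raise ValueError("require 2 <= j < i <= n")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        t = coalescent_waiting_times(n, rng)
        crown = sum(t[k] for k in range(i, n + 1))  # tips back to the MRCA
        stem = sum(t[k] for k in range(j, i))       # MRCA back to the ancestral node
        if stem / crown >= observed_ratio:
            hits += 1
    return hits / n_sims


# Hypothetical example: 20 tips, clade MRCA at the 10 -> 9 event, ancestor at the 3 -> 2 event.
for ratio in (1.0, 5.0, 20.0):
    print(ratio, p_stem_at_least(20, 10, 3, ratio))
```

A small estimated probability plays the same role as a P(Randomly Distinct) value below 0.05: a stem that long relative to the crown would rarely arise from coalescence alone.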
Clade Support: Bootstrap support or Bayesian posterior probability will be available for
monophyletic species when the tree estimation method has estimated support for the
corresponding clade. Larger values indicate stronger support.
Note that when bootstrap support is computed using Geneious, it is automatically given the
field name Consensus support(%) or bootstrap proportion. Similarly, Geneious automatically
names the Bayesian posterior probability Clade support.
When these support values are computed by other applications, such as MrBayes, PAUP*,
PhyML or PHYLIP, they have no associated field name, and when the trees are imported into
Geneious the field is given the generic name label. However, if you first import such a tree
into FigTree, assign a different name to the support value field (e.g. PostProb or BPP) and
then export the tree, the support values will carry the new name. If the tree is then
imported into Geneious, the support value will have the field name that you assigned. This
plugin only supports the default field names used by Geneious (Consensus support(%),
bootstrap proportion, Clade support and label) and cannot summarize support values stored
under other names. Nevertheless, you can still display such values on your tree by selecting
the Show Node Labels option and choosing the appropriate field from the dropdown list.
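The field-name rule above amounts to a lookup over a fixed list of names. In this sketch a node's annotations are represented as a plain dictionary, the function name is hypothetical, and the order in which the names are tried is an assumption (the manual does not say which name wins if a node carries more than one).

```python
from typing import Dict, Optional, Tuple

# Default support field names used by Geneious, as listed in the manual.
SUPPORTED_FIELDS = ("Consensus support(%)", "bootstrap proportion", "Clade support", "label")


def clade_support(node_fields: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (field name, value) for the first recognised support field, else None."""
    for name in SUPPORTED_FIELDS:
        if name in node_fields:
            return name, node_fields[name]
    return None  # e.g. a field renamed to PostProb in FigTree is not summarised


print(clade_support({"label": 0.98}))     # ('label', 0.98)
print(clade_support({"PostProb": 0.98}))  # None
```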
Rosenberg's PAB: The probability that species A, represented by a sequences in a clade of
a + b sequences, will be reciprocally monophyletic with the remaining b sequences under the
null model of random coalescence.
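Under a random-coalescence null in which all labelled histories of the a + b lineages are equally likely, the two groups are reciprocally monophyletic exactly when the first split of the clade separates A from B. Counting such histories gives PAB = 2 / [(a + b - 1) C(a + b, a)], where C is the binomial coefficient. We believe this matches the expression in Rosenberg (2007), but it should be checked against that paper before being relied on; the function name is hypothetical.

```python
from math import comb


def rosenberg_p_ab(a: int, b: int) -> float:
    """Probability that a lineages of A and b lineages of B are reciprocally
    monophyletic when all labelled histories of the a + b lineages are equally likely."""
    if a < 1 or b < 1:
        raise ValueError("a and b must both be at least 1")
    n = a + b
    return 2 / ((n - 1) * comb(n, a))


print(rosenberg_p_ab(2, 1))  # 1/3
print(rosenberg_p_ab(2, 2))  # 1/9
print(rosenberg_p_ab(5, 5))  # about 0.00088
```

The probability falls quickly as a and b grow, so observing reciprocal monophyly among many sequences is unlikely under random coalescence alone.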

Technical Issues
Speed of operation is an issue with this plugin, especially for larger trees containing
hundreds of taxa. The primary reason is that the plugin uses the tree as its data, not any
underlying sequence alignment. In performing its calculations, the plugin must compute all
pairwise distances on the tree. As the number of taxa (n) increases, the number of pairwise
distances increases in proportion to n². This version introduces some changes to improve the
performance of the plugin, and the effects should be noticed with larger trees.
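For a tree with n taxa, the number of distinct pairs, and hence of pairwise tree distances if each pair is computed once, is n(n-1)/2:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{100}{2} = 4\,950, \qquad
\binom{500}{2} = 124\,750, \qquad
\binom{1000}{2} = 499\,500
```

Going from 500 to 1000 taxa therefore roughly quadruples the number of distances to compute.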

References
Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA
barcodes. Proceedings of the Royal Society of London. Series B, Biological Sciences
270, 313-321.
Milá B, Smith TB, Wayne RK (2007) Speciation and rapid phenotypic differentiation in the
yellow-rumped warbler Dendroica coronata complex. Molecular Ecology 16, 159-173.
Rodrigo AG, Bertels F, Heled J, Noder R, Shearman H, Tsai P (2008) The perils of plenty: what
are we going to do with all these genes? Philosophical Transactions of the Royal
Society of London. Series B, Biological Sciences 363, 3893-3902.
Rosenberg NA (2007) Statistical tests for taxonomic distinctiveness from observations of
monophyly. Evolution 61, 317-323.
Ross HA, Murugan S, Li WLS (2008) Testing the reliability of genetic methods of species
identification via simulation. Systematic Biology 57, 216-230.

v 1.03
30 July 2010
