Professional Documents
Culture Documents
doi:10.1093/jhered/esh074
Computer Note
GENECLASS2: A Software
for Genetic Assignment and
First-Generation Migrant
Detection
S. PIRY, A. ALAPETITE, J.-M. CORNUET,
D. PAETKAU, L. BAUDOUIN, AND A. ESTOUP
From the Centre de Biologie et de Gestion des Populations,
INRA, Campus International de Baillarguet CS 30 016,
F34988, Monferrier-sur-Lez CEDEX, France (Piry, Alapetite,
Cornuet, and Estoup); Wildlife Genetics International, Box
274, Nelson, BC V1L 5P9, Canada (Paetkau); and CIRAD,
Departement Cultures Perennes, Avenue Agropolis, F34098
Montpellier CEDEX 5, France (Baudouin).
Address correspondence to Sylvain Piry at the address
above, or e-mail: piry@ensam.inra.fr.
536
Statistical Criteria
Three types of criteria used for likelihood estimation have
been implemented in GENECLASS2: genetic distancebased criteria, a criterion directly based on allele frequencies,
Computer Notes
k
X
mi Logpi ;
i1
k
i1
k
X
Logmi 1
i1
k
X
i1
a
Log mi
k
Logm n a;
Probability Computation
The program computes the probability of the multilocus
genotype of each individual to be encountered in a given
population. Monte Carlo methods allow computing a random
sample of multilocus genotypes for a large number of
537
538
Program Features
Input Data Files
For the simple computation of criteria, a data file containing
a mixture of diploid or haploid data is accepted, and the level
of ploidy is taken into account during computations.
However, probabilities based on Monte Carlo resampling
methods are computable only for data files containing diploid
or haploid data and not a mixture of both. The file formats
accepted by GENECLASS2 are those used by the following population genetics software programs: GENEPOP
(Raymond and Rousset 1995), GENETIX (Belkhir et al.
19962001), and FSTAT (Goudet 1995). GENECLASS2
also converts input data files into any of the three above file
formats.
Datasets of virtually any size can be treated, provided the
computer has enough memory to load the data and make the
computations. Two files are needed for the assignment of
individuals not included in the reference population sample.
One file contains the reference dataset, the other the
individuals or groups of individuals to be assigned. Only
a single file is needed when the to-be-assigned individuals are
included in the reference population sample (cf. selfassignment or migrants detection).
Output Data Files
The criterion log10(L) or the genetic distance for the
distance-based method, as well as probabilities, are displayed
in a grid with one row per individual and one column per
population. When the probability computation option is not
used, an adjustable number of best-matching populations are
sorted for each individual and a score is given for each
population. In a reference file with P populations, the score
of an individual i in a population T is computed as follows:
Li;T
;
4
Scorei;T PP
j1 Lj;T
with Li,T the likelihood value of the individual i in the
population T. In the case of self-assignment, a quality
index is computed as the mean value of the scores of each
individual in the population it belongs to.
Computer Notes
Running Environment
GENECLASS2 was developed in the Pascal object programming language and compiled with Borland Delphi6 and
Kylix2. Therefore the software can be run on a Microsoft
Windows or a Linux platform. An easy-to-use graphics
interface has been designed to guide the user in the
assignment process: choice of task (assign/exclude population as origin of individuals or detection of migrants), choice
of statistical criterion for likelihood estimation, computation
of probabilities, etc. The package includes a user-friendly
help file with graphical interfaces that explain how to run the
program to perform the two above tasks.
Program Availability
GENECLASS2 is freely available in English or French at
http://www.montpellier.inra.fr/CBGP/softwares. A selfextracting setup executable leads the user through installation of the software on Windows-based machines. An
RPM file containing the program file and the libraries allows
the package to be installed on Mandrake and Red Hat Linux
platforms. A .tar.Z archive allows manual installation of the
binaries on other kinds of Linux platforms. A registration
form allows users to be kept informed of new releases.
Acknowledgments
References
Baudouin L and Lebrun P, 2000. An operational bayesian approach for the
identification of sexually reproduced cross-fertilized populations using
molecular markers. In: Proceedings of the International Symposium on
Molecular Markers for Characterizing Genotypes and Identifying Cultivars
in Horticulture, Montpellier, France, March 69, 2000 (Dore C, Dosba F
539