Professional Documents
Culture Documents
The Caucasus, inhabited by modern humans since the Early Upper Paleolithic and known for its linguistic diversity, is
considered to be important for understanding human dispersals and genetic diversity in Eurasia. We report a synthesis of
autosomal, Y chromosome, and mitochondrial DNA (mtDNA) variation in populations from all major subregions and
linguistic phyla of the area. Autosomal genome variation in the Caucasus reveals significant genetic uniformity among its
ethnically and linguistically diverse populations and is consistent with predominantly Near/Middle Eastern origin of the
Caucasians, with minor external impacts. In contrast to autosomal and mtDNA variation, signals of regional Y
chromosome founder effects distinguish the eastern from western North Caucasians. Genetic discontinuity between the
North Caucasus and the East European Plain contrasts with continuity through Anatolia and the Balkans, suggesting major
routes of ancient gene flows and admixture.
Key words: Caucasus, population genetics, human migration, autosomal markers, Y chromosome, mtDNA.
Introduction
The earliest evidence of the dispersal of the genus Homo
outside Africa comes from the Caucasus, a mountainous
region between the Black and Caspian Seas, linking the
Near/Middle East and the East European Plain (Lieberman
2007; Lordkipanidze et al. 2007). Anatomically modern
humans appeared there at least 42,000 years ago (Adler
et al. 2008). The early Upper Paleolithic (UP) sites in the
Caucasus (Bar-Yosef et al. 2006; Pinhasi et al. 2008) and
the adjacent East European Plain (Krause et al. 2010) are
nearly contemporary, whereas the lower Danube Basin
UP temporal estimates (Mellars 2006) predate the northern
Pontocaspian UP by several thousands of years. This makes
the two pathsacross the Caucasus and via Anatolia and
the Balkansequally plausible for the pioneer phase of
peopling of East Europe. Likewise, people from both the
Caucasus and the East European Last Glacial Maximum
(LGM) refugia (Adams and Faure 1997; Tarasov et al.
2000) may have been the source population for the repopulation of East Europe after the LGM, whereas the route
The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: journals.permissions@oup.com
359
Abstract
MBE
FIG. 1. (A) geographical map of the populations of the Caucasus included in this study. The language family affiliation of each population is
given. Adapted from Wikipedia. (B) Pairwise FST distances between the Caucasus and neighboring populations, ranging from red (low) to blue
(high), based on autosomal data. The populations (data from this study and the literature; Li et al. 2008; Behar et al. 2010) are divided into
regional groups. (C) Plot of the first and second components of the PCA of the Caucasus and neighboring populations based on autosomal
360
Autosomal Analyses
Africa (except Egyptians), East Asia, and Siberia and also Hazaras, Kurds, Uygurs, and Altaians, resulting in a final data set
of 212,398 markers and 848 individuals. PCA was carried out in
the smartpca program (Patterson et al. 2006) using outlier
removal procedure (5 outliers were removed, leaving 843 individuals). Pairwise genetic differentiation indices (FST values)
were also estimated using smartpca software based on the
same thinned marker set. The Gplots R package (http://
cran.r-project.org/web/packages/gplots/index.html) was used
to graphically represent genetic similarities between populations by color-coding pairwise FST values on a heatmap.
Y Chromosome Analyses
A total of 1952 samples from 24 populations from the Caucasus were analyzed for Y chromosome markers. The samples
data, with the clustering of populations approximating geography. The thick lines denote probable directions of movements of people,
interrupted between the North Caucasus and Eastern Europe. Despite the genetic discontinuity between the Caucasus and the East European
Plain, some admixture (denoted by the thin two-pointed arrow) has occurred, apparent from the several samples occupying the gap between
the regions. For population abbreviations see supplementary table S1 (Supplementary Material online). (D) Population structure inferred by
ADMIXTURE analysis of the autosomal data at K 5 7. Each individual is represented by a vertical (100%) stacked column of ancestry
probabilities in the seven constructed ancestral populations. Populations introduced in this study and analyzed together with data from Li et al.
(2008), Behar et al. (2010) and Rasmussen et al. (2010) are labeled in red. Language families of the Caucasus populations are also denoted: AA,
Abkhazian-Adyghe; ND, Nakh-Dagestanian; KV, Kartvelian; IE, Indo-European; TU, Turkic.
361
Two hundred and four samples from this study were genotyped with the Illumina 610 K single nucleotide polymorphism (SNP) array and used for whole-genome analysis
together with 929 samples from the literature (Li et al.
2008; Behar et al. 2010; Rasmussen et al. 2010) (supplementary tableS1, Supplementary Material online).
MBE
latter to Syrians, Lebanese, Jordanians and further southward contrasts with the sharp border between the Caucasians and the Slavic, Finno-Ugric and Turkic-speaking
populations of the East European Plain. However, the same
heatmap plot reveals a range of low genetic distances starting from the populations of the Levant and Syria, extending
to the East European Plain. Importantly, this series proceeds
around the western flank of the Black Sea: from Turks to
Bulgarians to Romanians to Ukrainians to Russians, ending
with Mordvins, but does not extend through the Caucasus.
We suggest that this similarity gradient reflects most likely
a combination of several ancient human migrations, perhaps
also including elements of the spread of the Neolithic. A
previous study based on eight Alu insertion loci found the
Caucasus populations to exhibit high levels of betweenpopulation differentiation, with an average FST of 0.113,
a value which is almost as large as the FST of 0.157 for
worldwide populations (Nasidze et al. 2001). Our results,
based on genome-wide data, reveal instead that the populations of the Caucasus region show between population
differentiation (average FST 5 0.004) that is slightly lower
than that for the Near East (0.006) and Europe (0.006) and
are thus more consistent with the results of Bulayeva et al.
(2003).
Similarly to our FST analysis, PCA of the autosomal data
shows the within-region clustering of the Caucasian populations and reveals a noticeable gap between the Caucasus
and the East European Plain, contrasting the continuous
transition from the Near/Middle East to the Caucasus
(fig. 1C; supplementary fig. S5 and S6, Supplementary Material online). Geography rather than language-based clustering can also be observed in the PC analysis of Y
chromosome data (supplementary fig. S7, Supplementary
Material online). Indeed, we see a clustering of Georgians,
who belong to the Kartvelian language family, and Indo-Europeanspeaking Armenians from the South Caucasus, together with the peoples of the NW Caucasus, of which the
Karachays and Balkars speak Turkic, the Ossetians Indo-Iranian, and the remainder Abkhazian-Adyghe languages
(supplementary fig. S7, Supplementary Material online).
The Nakh and Dagestanian language speakers, however,
differ considerably both from each other and the rest of
the Caucasian populations due to the unique structure
of Y chromosome haplogroup frequencies in the NE Caucasus (supplementary note 2, Supplementary Material online), but this distinction is not supported by autosomal
or mtDNA data, consistent with the results of an earlier
study where highland Dagestanian populations were
found to have reduced Y chromosomal but not mitochondrial or autosomal genetic diversity (Marchani
et al. 2008). The Caucasian populations do not differ in
common mtDNA haplogroup frequencies to an extent
that would allow the discrimination of geographic subregions or language groups (supplementary fig. S8, Supplementary Material online).
We also analyzed the autosomal data of the Caucasus
and reference populations (Li et al. 2008; Behar et al.
2010; Rasmussen et al. 2010) using a structure-like
mtDNA Analyses
MBE
MBE
MBE
Supplementary Material
Supplementary figures S1S9, tables S1S6, and notes 1
and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We thank the individuals who provided DNA samples for
this study, and Mari Nelis, Georgi Hudjashov, and Viljo
Soo for conducting the autosomal genotyping. This work
was supported by the European Commission, DirectorateGeneral for Research (FP7 Ecogene grant number 205419
to R.V. and D.M.B.); the European Union Regional Development Fund through the Centre of Excellence in Genomics to
364
References
Adams JM, Faure H. 1997. Preliminary vegetation maps of the world
since the last glacial maximum: an aid to archaeological
understanding. J Archaeol Sci. 24:623647.
Adler DS, Bar-Yosef O, Belfer-Cohen A, Tushabramishvili N,
Boaretto E, Mercier N, Valladas H, Rink WJ. 2008. Dating the
demise: neandertal extinction and the establishment of modern
humans in the southern Caucasus. J Hum Evol. 55:817833.
Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation
of ancestry in unrelated individuals. Genome Res. 19:16551664.
Badyukova EN. 2007. Age of Khvalynian transgressions in the
Caspian Sea region. Oceanology 47:400405.
Balanovsky O, Dibirova K, Dybo A, et al. (23 co-authors). 2011.
Parallel evolution of genes and languages in the Caucasus region.
Mol Biol Evol. 28:29052920.
Balanovsky O, Rootsi S, Pshenichnov A, et al. (11 co-authors). 2008.
Two sources of the Russian patrilineal heritage in their Eurasian
context. Am J Hum Genet. 82:236250.
Banerjee S. 2005. On geodetic distance computations in spatial
modeling. Biometrics 61:617625.
Bar-Yosef O, Belfer-Cohen A, Adler DS. 2006. The implications of the
Middle-Upper Paleolithic chronological boundary in the Caucasus to Eurasian prehistory. Anthropologie XLIV:4960.
Behar DM, Yunusbayev B, Metspalu M, et al. (21 co-authors). 2010. The
genome-wide structure of the Jewish people. Nature 466:238242.
Bermisheva MA, Kutuev I, Korshunova T, Dubova NA, Villems R,
Khusnutdinova EK. 2004. Phylogeografic analysis of mitochondrial
DNA in the Nogays: the high level of mixture of maternal lineages
from Eastern and Western Eurasia. Mol Biol (Mosk). 38:617624.
Bulayeva K, Jorde LB, Ostler C, Watkins S, Bulayev O, Harpending H.
2003. Genetics and population history of Caucasus populations.
Hum Biol. 75:837853.
Cann HM, de Toma C, Cazes L, et al. (41 co-authors). 2002. A
human genome diversity cell line panel. Science 296:261262.
Comrie B. 2008. Linguistic diversity in the Caucasus. Annu Rev
Anthropol. 37:131143.
Goslee SC, Urban DL. 2007. The ecodist package for dissimilaritybased analysis of ecological data. J Stat Softw. 22:119.
Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL,
Hammer MF. 2008. New binary polymorphisms reshape and
increase resolution of the human Y chromosomal haplogroup
tree. Genome Res. 18:830838.
Kolga M, Tonurist I, Vaba L, Viikberg J. 2001. The red book of the
peoples of the Russian Empire. Tallinn (Estonia): NGO Red Book.
Krause J, Briggs AW, Kircher M, Maricic T, Zwyns N, Derevianko A,
Paabo S. 2010. A complete mtDNA genome of an early modern
human from Kostenki, Russia. Curr Biol. 20:231236.
Conclusions
365
MBE