Parallel studies of genetic and linguistic diversities: reconstructing the biological and cultural histories of human populations

old_uid16374
titleParallel studies of genetic and linguistic diversities: reconstructing the biological and cultural histories of human populations
start_date2018/11/19
schedule14h30-16h30
onlineno
summaryReconstructing human evolutionary history in its full complexity can benefit greatly from comparative analyses of genetic and linguistic variation. Genes and languages are both transmitted from one generation to the next through processes of descent with modification: genes are transmitted only vertically, from parents to offspring, according to Mendel’s laws, while languages are transmitted through much more complex processes. These include, among others, vertical transmission from parents to offspring, horizontal transmission among peers, and oblique transmission from adults to children who are not their own offspring, including via various media. Parallel analyses of genetic and linguistic data can enable inferences about the evolution of language and culture, and about the history of human migrations, that go beyond what either source of data can provide alone. In seminal work, Cavalli-Sforza et al. (1988; 1992) compared trees based on genetic and linguistic distances. These authors found, as Darwin foresaw (The Origin of Species, 1859), a striking concordance between such trees in several instances. Since then, numerous studies have statistically described and compared patterns of genetic and linguistic variation among human populations, and have considerably improved our understanding of the convergent or divergent biological and cultural evolution at the root of human diversity. Nevertheless, these studies exhibit two major asymmetries in the nature of the genetic and linguistic data considered. First, while population genetics studies commonly use migration, admixture, and gene-flow events to infer the historical processes that led to currently observed genetic diversity, existing computational methods for linguistic inference mostly consider scenarios of linguistic evolution without borrowing or replacement.
This is mainly due to the inherent complexity of linguistic models incorporating horizontal or oblique transmission, which makes them mathematically intractable with exact likelihood approaches. However, such non-vertical transmission mechanisms are well documented and occur often in the history of languages. Incorporating such events is therefore crucial to furthering our understanding of linguistic change over time. Second, whereas studies of genetic data regularly incorporate variability among individuals within a population, computational linguistic analyses compare entire languages or dialects, focusing on differences among groups of individuals rather than on variability among individual speakers of the same mutually intelligible language. Yet speakers of a given language evidently vary in their individual speech patterns, and this variation occurs on a scale analogous to inter-individual genetic variation within a population. In this presentation, we describe two separate studies that attempt to overcome these issues. First, we present a new, flexible linguistic data simulator that incorporates possible borrowing events among language varieties. We coupled this simulator with existing genetic data simulators, allowing us to investigate population histories for which both genetic and linguistic data are available. Using these simulation tools with Approximate Bayesian Computation procedures, we then reconstruct, in parallel, the histories of genetic and linguistic divergence and borrowing among Turkic- and Indo-Iranian-speaking populations of Central Asia. Second, we present a novel parallel analysis of genetic and linguistic diversity within the Cape Verdean Kriolu-speaking population. We used word frequency counts (N-grams) in semi-spontaneous speech collected from speakers of the same language (Cape Verdean Kriolu), for whom genome-wide genetic data were also generated.
Using joint statistical descriptions of these genetic and linguistic data, we show that processes of genetic and linguistic admixture in Cape Verde likely followed parallel historical trajectories. This work prompts future research aimed at developing mechanistic models of linguistic transmission within a single language, which would enable us to test historical hypotheses and perform historical inference.
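To illustrate the kind of simulation-based inference mentioned in the abstract, here is a minimal sketch of Approximate Bayesian Computation by rejection sampling applied to a toy model of borrowing between two language varieties. It is not the simulator presented in the talk. The model (a list of lexical items that can diverge or be re-shared through borrowing), the summary statistic (fraction of shared items), and all names and default values (`simulate_shared_fraction`, `abc_rejection`, `change_rate`, `prior_max`, and so on) are assumptions chosen only for illustration.

```python
import random


def simulate_shared_fraction(borrow_rate, change_rate=0.05, n_items=200,
                             n_steps=20, rng=None):
    """Toy simulator (hypothetical): two varieties start with identical items.

    At each step an item may change in one variety (the match is lost) and may
    then be borrowed across varieties (the match is restored).
    Returns the fraction of items still shared after n_steps.
    """
    if rng is None:
        rng = random.Random()
    shared = [True] * n_items
    for _ in range(n_steps):
        for i in range(n_items):
            if rng.random() < change_rate:
                shared[i] = False  # independent innovation breaks the match
            if rng.random() < borrow_rate:
                shared[i] = True   # borrowing copies the form across varieties
    return sum(shared) / n_items


def abc_rejection(observed, n_draws=500, tolerance=0.05, prior_max=0.1, seed=0):
    """ABC rejection sampling for the borrowing rate.

    Draws candidate rates from a uniform prior on [0, prior_max], simulates a
    summary statistic for each, and keeps the candidates whose simulated
    statistic falls within `tolerance` of the observed one.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        candidate = rng.uniform(0.0, prior_max)
        simulated = simulate_shared_fraction(candidate, rng=rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    # 0.46 is an invented "observed" shared fraction, used only as a demo value.
    posterior = abc_rejection(observed=0.46)
    if posterior:
        mean_rate = sum(posterior) / len(posterior)
        print(f"accepted {len(posterior)} draws, mean borrowing rate {mean_rate:.3f}")
```

The accepted draws approximate the posterior distribution of the borrowing rate under this toy model. The appeal of the approach is that it needs only a simulator rather than an explicit likelihood, which is why it suits models with horizontal or oblique transmission.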
responsiblesLazcano