Blanc SVSE 7 - Biodiversity, evolution, ecology and agronomy

Inference of demographic history from large DNA polymorphism data. – demochips

Submission summary

Population genetics methods allow researchers to infer historical events in human and non-human populations at time scales for which historical records provide no information. Coalescent-based methods developed for this purpose have been successfully applied to many populations using classical population genetics markers (e.g. microsatellites, DNA sequences). For instance, they have made it possible to determine whether populations have undergone growth or decline, whether they have exchanged migrants with surrounding populations, and whether some populations result from admixture between two or more populations. The parameters of these demographic phenomena (e.g. growth rates, migration rates, ancestral population sizes, admixture rates) could be estimated to some extent. The amount of available DNA polymorphism data is increasing by several orders of magnitude with the recent development of new kinds of polymorphism datasets: DNA chip datasets with several hundred thousand or even a few million single nucleotide polymorphisms (SNPs), and full genome sequences. Some of these SNPs lie in coding or regulatory regions and may thus be subject to selection, but others lie outside these regions and can thus be used to infer demographic processes. This strong increase in the amount of available data suggests that demographic events could be inferred much more precisely from these new datasets. Building on existing methods, the main challenge is to develop new algorithms adapted to such data, which differ from classical data both in the amount of available polymorphism and in the presence of many linked loci, which makes it possible to use the level of linkage disequilibrium in the estimation process.
The aim of this study is to develop new coalescent-based approaches, based on approximate Bayesian computation (ABC) and Markov chain Monte Carlo (MCMC), for these new datasets and to apply them to human and Drosophila melanogaster polymorphism datasets. In the first step, we will develop a simulation program able to generate such large datasets. In the second step, this simulation program will be used directly to develop ABC methods, and also as a means to test the validity of the different methods. For the MCMC methods, we will focus on how to optimize them for large datasets and on whether an optimal sub-sampling strategy can be designed to keep computing time reasonable. In the third step, we will apply these methods to real data on human and Drosophila populations. Regarding humans, the first question will be whether we can infer different demographic histories for populations with different lifestyles, namely agriculturalists, herders and hunter-gatherers. In particular, do these differences in lifestyle influence their expansion rates? The second question will be whether we can infer the history of migration and admixture of populations in Central Asia. Do these populations result from admixture events between neighbouring European and Asian populations, or, conversely, was Central Asia one of the first areas colonised after the emergence of modern humans out of Africa, from which other Eurasian areas were subsequently colonised? Finally, we will also investigate the possibility of inferring a recombination map along the genome in the different populations, taking into account their demographic history. Regarding D. melanogaster, we will investigate its demographic history in Africa using the data produced by the DPGP project. Two main issues will be tackled: the timing and mode of expansion in Africa (particularly the proposed division between East and West African populations) and the timing of the out-of-Africa event.
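As an illustration of the general ABC principle mentioned above (simulate datasets under the coalescent, then keep the parameter values whose simulated summary statistics fall close to the observed ones), here is a minimal Python sketch. It is not the project's program: it assumes a single constant-size population, unlinked non-recombining loci, infinite-sites mutations, a uniform prior, and the total number of segregating sites as the only summary statistic. All function names and parameter values are hypothetical.

```python
import random


def draw_poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(theta, n_samples, n_loci, rng):
    """Total number of segregating sites over independent loci.

    Assumes the standard constant-size coalescent with time in units of
    2N generations and theta = 4 * N * mu per locus (illustrative model).
    """
    total = 0
    for _ in range(n_loci):
        tree_length = 0.0
        for k in range(n_samples, 1, -1):
            # With k lineages, the time to the next coalescence is
            # exponential with rate k(k-1)/2; it adds k branches of that length.
            tree_length += k * rng.expovariate(k * (k - 1) / 2.0)
        # Mutations fall on the tree at rate theta/2 per unit of branch length.
        total += draw_poisson(theta * tree_length / 2.0, rng)
    return total


def abc_rejection(observed, n_samples, n_loci, theta_max, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulation is close to the data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, theta_max)  # uniform prior (assumption)
        simulated = simulate_segregating_sites(theta, n_samples, n_loci, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    # Pseudo-observed data generated with a known theta, for a sanity check.
    observed = simulate_segregating_sites(5.0, 20, 20, rng)
    posterior = abc_rejection(observed, 20, 20, theta_max=20.0,
                              n_draws=2000, tolerance=0.1 * observed, rng=rng)
    if posterior:
        print("accepted draws:", len(posterior))
        print("posterior mean theta:", sum(posterior) / len(posterior))
```

Running the same procedure on pseudo-observed data simulated with known parameters, as in the last lines, is one simple way to check whether an inference method recovers the values used for simulation. Datasets with many linked loci would require a simulator that models recombination, which this sketch does not.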

Project coordinator

Mr. Frederic Austerlitz (Laboratoire Eco-Anthropologie et Ethnobiologie) – austerlitz@mnhn.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.

Partners

SAE - UMR 7138 Systematique, Adaptation, Evolution
INSERM
GEH - URA 3012 Institut Pasteur / CNRS, Unité de Génétique Evolutive Humaine
LBIP - UMR 7205 Laboratoire de Biologie Intégrative des Populations
EAE - UMR 7206 Laboratoire Eco-Anthropologie et Ethnobiologie

ANR funding: 259,437 euros
Start date and duration of the scientific project: December 2012, 36 months
