Improved statistical approaches for the analysis of biodiversity using genetic and spatial data – GenoSpace
It is stating the obvious that we live on a planet made up of continuous landscapes. Yet there
exist genuine barriers to developing sound statistical models that accommodate continuous
spatial information and genetic data in a satisfactory way. In fact, the dominant spatial models in
population genetics rely on the crude assumption that populations are divided into discrete
demes. Other approaches make predictions about the spatial distribution of individuals that are
generally not supported by biological evidence. Current limitations in the models and the inference
techniques available hamper our understanding of biodiversity in space and time. They are thus the
main focus of our project.
Recent advances in theoretical population genetics have produced a new model, the spatial
Lambda-Fleming-Viot model, that alleviates the limitations of current methods. This model considers
the habitat as a truly continuous area and allows for a stationary distribution of individuals in
time and space. A straightforward probabilistic description of the ancestral locations and
genealogical relationships between sampled individuals is also available, thereby defining a simple
way to calculate the likelihood of this model (the probability of the data given the model
parameters). Yet this likelihood involves a large number of latent variables, i.e., quantities that are
not of direct biological interest but that must be accounted for in order to evaluate the
likelihood. It is therefore not clear whether the spatial Lambda-Fleming-Viot model is
amenable to parameter inference.
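To make the idea of a continuous habitat with a stationary distribution of individuals more concrete, here is a minimal forward-in-time sketch of a process in the spirit of the spatial Lambda-Fleming-Viot model. It is an individual-based caricature under stated assumptions, not the project's software: the disc-shaped events, the unit-square habitat with wrap-around edges, the function name `slfv_forward` and all parameter names and default values are illustrative choices that do not come from this summary.

```python
import numpy as np

def slfv_forward(n_events, density=200.0, radius=0.1, impact=0.3, seed=0):
    """Simulate reproduction/extinction events on a unit torus (assumed habitat).

    All names and defaults are hypothetical. Requires radius < 0.5.
    Returns final positions and, for each individual, its founder's label.
    """
    rng = np.random.default_rng(seed)
    n0 = rng.poisson(density)               # initial population size
    xy = rng.random((n0, 2))                # positions, uniform over the habitat
    founder = np.arange(n0)                 # label of each individual's founder
    for _ in range(n_events):
        centre = rng.random(2)              # events fall uniformly in space
        d = np.abs(xy - centre)
        d = np.minimum(d, 1.0 - d)          # shortest distance on the torus
        inside = np.flatnonzero((d ** 2).sum(axis=1) <= radius ** 2)
        if inside.size == 0:
            continue                        # simplification: no parent, skip event
        parent_label = founder[rng.choice(inside)]
        # Each individual inside the disc dies with probability `impact`.
        dies = inside[rng.random(inside.size) < impact]
        keep = np.ones(len(xy), dtype=bool)
        keep[dies] = False
        # Offspring replace the expected number of deaths, so that density
        # stays roughly constant through time.
        n_new = rng.poisson(impact * density * np.pi * radius ** 2)
        rho = radius * np.sqrt(rng.random(n_new))   # uniform within the disc
        phi = 2.0 * np.pi * rng.random(n_new)
        offsets = np.column_stack((rho * np.cos(phi), rho * np.sin(phi)))
        new_xy = (centre + offsets) % 1.0
        xy = np.vstack((xy[keep], new_xy))
        founder = np.concatenate((founder[keep], np.full(n_new, parent_label)))
    return xy, founder
```

Calling `slfv_forward(2000)` returns the positions of the surviving individuals together with the label of the founder each one descends from. As more events occur, fewer founder labels remain, which is the forward-in-time counterpart of the genealogical relationships mentioned above.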
We have implemented and tested a prototype of a Bayesian sampler that estimates the posterior
distribution of this model's parameters from the analysis of geo-referenced genetic data. Preliminary
results indicate that, when harnessed to state-of-the-art statistical inference techniques, this new
model indeed provides accurate estimates of the population densities and the dispersal range, two
parameters that cannot be estimated separately with most traditional approaches.
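As a generic illustration of what a Bayesian sampler does, the sketch below uses a random-walk Metropolis algorithm to sample the posterior distribution of a single dispersal-scale parameter. It is a toy stand-in, not the sampler developed in this project: the Gaussian displacement likelihood, the prior on the log scale, and the names `log_posterior` and `metropolis` are all assumptions made for illustration, and the latent variables of the spatial Lambda-Fleming-Viot model are left out entirely.

```python
import numpy as np

def log_posterior(log_sigma, displacements):
    """Unnormalised log posterior of log(sigma) under a toy model.

    Assumed model: displacements ~ Normal(0, sigma^2), with a
    Normal(0, 10^2) prior placed on log(sigma).
    """
    sigma = np.exp(log_sigma)
    n = displacements.size
    log_lik = -n * np.log(sigma) - 0.5 * np.sum(displacements ** 2) / sigma ** 2
    log_prior = -0.5 * log_sigma ** 2 / 10.0 ** 2
    return log_lik + log_prior

def metropolis(displacements, n_iter=5000, step=0.2, seed=0):
    """Random-walk Metropolis on log(sigma); returns the sampled sigma values."""
    rng = np.random.default_rng(seed)
    current = 0.0                                    # start at sigma = 1
    current_lp = log_posterior(current, displacements)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        proposal = current + step * rng.normal()     # symmetric proposal
        proposal_lp = log_posterior(proposal, displacements)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return np.exp(chain)
```

Given an array of observed displacements, `metropolis(data)[1000:]` yields draws (after discarding an arbitrary burn-in of 1000 iterations) whose mean and quantiles summarise the posterior distribution of the dispersal scale. A sampler for the full model would additionally have to update the latent variables at each iteration.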
These promising results suggest that the spatial Lambda-Fleming-Viot model can indeed serve as a
sound basis to tackle important biological questions. In particular, this project will assess the
impact of non-homogeneous landscapes on the migration of individuals. We will also investigate
how to detect variation in population density across space and over the course of
evolution. Alongside these extensions of the original model, mathematical simplifications of the
likelihood function will be examined. We have in fact identified analytical "shortcuts" that should
considerably simplify, and therefore speed up, the calculations. Extensions and improvements of the
models and inference techniques developed in this project will be applied to the analysis of large
population genomics datasets from two flagship species of considerable economic importance: the
"harlequin ladybird" and the spotted wing drosophila. We will quantify levels of gene flow and
population densities throughout their respective habitats, thereby gaining some insight into the
biology of these organisms. Software applications will be produced that implement the most relevant
approaches developed in this project. These applications will be thoroughly tested through extensive
simulations and then made available to a wide scientific audience.
Project coordination
Stephane Guindon (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier)
The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.
Partner
CNRS-LIRMM Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
INRA-CBGP INRA
ANR funding: 136,163 euros
Beginning and duration of the scientific project: October 2016, 36 months