CE02 - Environments and biodiversity: Living Earth


Detecting unknown diversity with horizontal gene transfers

To a first approximation, all species are extinct, and most of those that are not are still unknown. This hidden biodiversity is or has been involved in horizontal gene transfers, just like the biodiversity that we know. By detecting genes that have been horizontally acquired in genomes that we can analyze, this project proposes to detect species that are still unknown but that have hosted these genes at some point.

General objective and main issues raised

This ambitious project should give access, for the first time, to the hidden diversity (extinct or unsampled species) of organisms that do not leave fossils. The objective is twofold: to better characterize the real extent of the living world, and to better account for this hidden diversity in a myriad of evolutionary biology studies. Indeed, considering this hidden diversity can drastically change the null hypothesis of many methods.

We use two types of methods to detect unknown lineages: the detection of horizontal gene transfers with a "reconciliation" method, and the detection of introgression using the D statistic (or ABBA-BABA test).
The approach is the same in both cases: we simulate evolutionary scenarios that involve extinct and unsampled lineages, and we explore our capacity to detect traces of these lineages.
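For readers unfamiliar with the D statistic, the following is a minimal sketch of how it is commonly computed from four aligned sequences, assuming the usual ((P1, P2), P3), outgroup topology. The function name `d_statistic` and the toy sequences are hypothetical and chosen for illustration; this is not the project's code, and significance testing (for example by block jackknife) is left out.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Return (n_abba, n_baba, D) for four aligned sequences of equal length.

    Assumed topology: ((P1, P2), P3), outgroup. The outgroup allele is
    treated as ancestral ("A") and the other allele as derived ("B").
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    n_abba = n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Keep only biallelic sites where P3 carries the derived allele.
        if c == o or len({a, b, c, o}) != 2:
            continue
        if a == o and b == c:      # pattern ABBA: P2 and P3 share the derived allele
            n_abba += 1
        elif b == o and a == c:    # pattern BABA: P1 and P3 share the derived allele
            n_baba += 1

    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else float("nan")
    return n_abba, n_baba, d


# Toy alignment (hypothetical data): two ABBA sites and one BABA site.
print(d_statistic("AACACA", "CCAACA", "CCCCCA", "AAAAAA"))
```

A D value different from zero is conventionally read as a sign of introgression between P3 and either P1 or P2. That reading assumes all lineages involved are sampled, which is precisely the assumption that hidden lineages call into question.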

We published a tool for simulating the evolution of genomes along a species phylogeny that takes extinct lineages into account.
Using this tool, we showed on simulations that groups of unknown species (or ghost lineages) were detectable after a reconciliation analysis involving only extant species.
Finally, we showed that considering hidden lineages radically changes the interpretation that can be made of ABBA-BABA tests used to detect introgression. This calls for a new null hypothesis for this popular method.
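To give a sense of why extinct lineages matter in such simulations, here is a minimal sketch of a constant-rate birth-death process that counts how many lineages survive and how many go extinct. It is not the published simulator and does not reproduce its interface; the function name `simulate_birth_death` and its parameters are hypothetical, and the model (constant rates, a single starting lineage, no tree or genome output) is a deliberate simplification.

```python
import random


def simulate_birth_death(birth, death, t_max, seed=None, max_lineages=10000):
    """Simulate lineage counts under a constant-rate birth-death process.

    Starts from one lineage at time 0 and runs until t_max, until every
    lineage is extinct, or until max_lineages is reached.
    Returns (number of extant lineages, number of extinct lineages).
    """
    if birth < 0 or death < 0 or birth + death <= 0:
        raise ValueError("rates must be non-negative and not both zero")

    rng = random.Random(seed)
    alive, extinct, t = 1, 0, 0.0
    while 0 < alive < max_lineages:
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(alive * (birth + death))
        if t >= t_max:
            break
        if rng.random() < birth / (birth + death):
            alive += 1          # speciation: one lineage splits in two
        else:
            alive -= 1          # extinction: one lineage is lost
            extinct += 1
    return alive, extinct


extant, extinct = simulate_birth_death(birth=1.0, death=0.8, t_max=5.0, seed=42)
print(f"extant lineages: {extant}, extinct lineages: {extinct}")
```

In such a process, when the extinction rate approaches the speciation rate, lineages that went extinct can greatly outnumber those still alive at the end of the run, and any of them could have donated genes before disappearing.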

We continue to explore all these elements and have started writing publications about them. We also wish to better understand how in silico data and biological data differ in this context. Indeed, until now, the detection of hidden diversity using reconciliation methods has not proved effective on the biological datasets that we built, and we still need to understand why.

Davín, A. A., Tricou, T., Tannier, E., de Vienne, D. M., & Szöllősi, G. J. (2020). Zombi: a phylogenetic simulator of trees, genomes and sequences that accounts for dead linages. Bioinformatics, 36(4), 1286-1288.

“To a first approximation, all species are extinct”, and most of those that are not are still unknown. A global vision of diversity and of the history of species therefore requires detecting them even when no fossil record is available. Horizontal gene transfers are very common, especially in prokaryotes, and many genes detected in extant species are likely to have been acquired by horizontal transfer from extinct or unknown species.
The comparison of genes and species trees, using so-called “reconciliation” methods, allows reconstructing the evolutionary history of genes in terms of duplications, transfers and losses.
We showed with preliminary results based on simulations that reconciliation methods can also be used as a means to detect diversity that is extinct or still unknown, by detecting the transfers originating from these “hidden” groups. This has far-reaching implications. It suggests that we can gain access to extinct clades in species that do not leave fossils (the majority of species on Earth), that we can predict the presence of large clades that are still unknown without relying on massive blind metagenomics approaches, and that we can explore for the first time the effect that mass extinctions, known solely from the observation of eukaryote fossils, had on prokaryotic diversity.

The STHORIZ project (contraction of STory and HORIZontal) proposes to explore this original idea further. More precisely, we will (1) illustrate the importance of considering extinct diversity when working on transfers, through an intensive critical review and reanalysis of previously published results, (2) evaluate the ability and limits of existing HGT detection methods to detect hidden diversity, and (3) use these methods to predict hidden diversity and explore macroevolutionary patterns in prokaryotes. All these tasks will require a feature-rich simulator (one that takes extinctions into account) whose development is already advanced.

The STHORIZ project has the potential to generate new insights about hitherto unknown prokaryote and eukaryote diversity, and also to change mindsets by giving more consideration to extinct lineages. It is expected to open the door to a new conceptual framework for exploring biodiversity in the near future.

Project coordinator


The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.



ANR funding: 139,864 euros
Beginning and duration of the scientific project: September 2018 - 36 Months
