CE45 - Interfaces : mathématiques, sciences du numérique – biologie, santé 2023

Machine learning for improved phylogenomic inference – DEELOGENY

Submission summary

Phylogenomics aims to analyze genomes in an evolutionary framework to reconstruct their history and understand their functions.
This field of research motivates the sequencing of thousands of genomes in all areas of the tree of life.
It relies on a series of estimation steps that are costly to run and are often based on probabilistic models.
Even when using simplistic and unrealistic models, these methods are not efficient enough to handle the data sets currently being generated.
Besides, each step can introduce errors that affect subsequent steps of the pipeline, notably because the methods do not propagate the uncertainty associated with their inferences.
Deelogeny aims to rethink several key steps in the pipeline to reduce inferential errors, by implicitly integrating over intermediate objects while inferring parameters of interest.
This will be done with Neural Networks (NNs), which will achieve better efficiency than the standard pipeline.
These NNs will be trained on simulations from sophisticated probabilistic models, whose realism will be validated through comparisons with empirical data sets.
The NNs will be based on recent developments such as attention-based networks (e.g., transformers), to take into account dependencies between nodes of a phylogeny or between sites or sequences, and Graph Neural Networks, to handle phylogenetic tree topologies.
Our new methods will be validated on simulations, and on empirical genomic data sets of large size.
The steps we will target are: the inference of phylogeny from sequences, without inferring alignments; the inference of rates of diversification or of epidemic spread from alignments, without inferring phylogenies; the inference of gene family histories without inferring gene trees; the inference of associations between genotype and phenotype without reconstructing ancestral characters.

Project coordination

Bastien BOUSSAU (LABORATOIRE DE BIOMÉTRIE ET BIOLOGIE EVOLUTIVE)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

CIRB Centre interdisciplinaire de recherche en biologie
IP Hub de bioinformatique et biostatistique
LBBE LABORATOIRE DE BIOMÉTRIE ET BIOLOGIE EVOLUTIVE
LEHNA LABORATOIRE D'ECOLOGIE DES HYDROSYSTEMES NATURELS ANTHROPISES
IBENS Institut de biologie de l'Ecole Normale Supérieure
LCQB Laboratoire de biologie computationnelle et quantitative

ANR funding: 908,954 euros
Beginning and duration of the scientific project: September 2023 - 60 Months
