CE45 - Interfaces: mathematics, digital sciences – biology, health 2025

Pangenome Graphs for AI-driven Microbial genome eXploration – PanGAIMiX

Submission summary

The PanGAIMiX project, dedicated to computational microbiology, is structured around four research axes.
First, PanGAIMiX will develop advanced models for pangenome graphs, extending genome comparisons from the species level to the genus level.
Secondly, PanGAIMiX will exploit methods based on graph neural networks to identify conserved genomic contexts across thousands of pangenomes. This approach should make it possible to delineate shared functional modules by exploiting convolution patterns arising from evolutionary constraints on genes.
The third axis will use large language models to predict biological processes, such as metabolic pathways or defense systems. Each pangenome will be represented as a sequence of sentences, where words are functional units derived from gene families. This approach will learn the complex relationships between families across different species, making it possible to predict missing functions and new biological processes.
Finally, the project will develop PanGBank, an exhaustive pangenome database of over 40,000 microbial species, which will be made publicly accessible via a web API. PanGBank will support case studies demonstrating the relevance of the methodological developments, such as studying the spread of antibiotic resistance genes in ESKAPEE bacteria and discovering novel metabolic pathways by exploring reaction modules conserved in pangenomes.
Applied to a wide diversity of species, the approaches in PanGAIMiX will reveal previously inaccessible evolutionary patterns and provide new insights into the evolution and functioning of microbial communities, fostering advances in health, environmental sciences and biotechnology.

Project coordination

Alexandra Calteau (COMMISSARIAT À L'ÉNERGIE ATOMIQUE ET AUX ÉNERGIES ALTERNATIVES)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

CEA: COMMISSARIAT À L'ÉNERGIE ATOMIQUE ET AUX ÉNERGIES ALTERNATIVES
LaMME: Laboratoire de Mathématiques et Modélisation d'Evry
MaIAGE: Mathématiques et Informatique Appliquées du Génome à l'Environnement

ANR funding: 609,903 euros
Start and duration of the scientific project: November 2025, 48 months
