CE45 - Mathematics and digital sciences for biology and health

Multi-scale and multi-resolution bio-molecular structure determination by geometric approaches – multiBioStruct

Submission summary

This project is set in the context of Structural Biology, where Distance Geometry (DG) has proved to be a valid tool for analyzing and determining biological structures such as proteins. The classical application of DG arises in Nuclear Magnetic Resonance (NMR) experiments, where distances between atom pairs are estimated experimentally and suitable three-dimensional conformations of the corresponding molecule need to be identified. This problem is NP-hard and was historically tackled with heuristic and meta-heuristic methods. For several years, several partners of the present project have been working on a discretization approach for DG that makes it possible to employ a branch-and-prune (BP) algorithm to identify three-dimensional conformations. One strong point of this discretization approach is that the DG solution set can potentially be enumerated exhaustively, thereby providing all three-dimensional protein conformations compatible with the experimental data. The main idea of this project is to make this approach more robust so that it deals efficiently with uncertain data, and to extend its domain of applicability to genomics data and Hi-C data.
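
As an illustration of the branch-and-prune idea described above, here is a minimal sketch in Python. It is not the partners' implementation: the function names (`sphere_intersection`, `branch_and_prune`), the data layout (a dictionary of distances keyed by index pairs) and the tolerances are assumptions made for this example. It also assumes exact distances and an atom ordering in which every atom from the fourth onward has known distances to its three immediate predecessors, so that each new atom has at most two candidate positions (the branching step); any additional known distance is used to discard candidates (the pruning step).

```python
import math

# Small 3D vector helpers (pure Python, tuples of floats).
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sphere_intersection(p1, p2, p3, r1, r2, r3, tol=1e-9):
    """Return the 0, 1 or 2 points at distances r1, r2, r3 from p1, p2, p3.

    Assumes p1, p2, p3 are not collinear.
    """
    d = norm(sub(p2, p1))
    ex = scale(sub(p2, p1), 1.0 / d)      # local x axis
    v = sub(p3, p1)
    i = dot(ex, v)
    ey_raw = sub(v, scale(ex, i))
    j = norm(ey_raw)
    ey = scale(ey_raw, 1.0 / j)           # local y axis
    ez = cross(ex, ey)                    # local z axis
    x = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    y = (r1 * r1 - r3 * r3 + i * i + j * j) / (2 * j) - (i / j) * x
    z2 = r1 * r1 - x * x - y * y
    if z2 < -tol:
        return []                         # the three spheres do not meet
    base = add(p1, add(scale(ex, x), scale(ey, y)))
    if z2 <= tol:
        return [base]                     # single tangent point
    z = math.sqrt(z2)
    return [add(base, scale(ez, z)), sub(base, scale(ez, z))]

def branch_and_prune(n, dists, tol=1e-6):
    """Enumerate all placements of n >= 3 atoms compatible with `dists`.

    `dists` maps (i, j) with i < j to an exact distance. Hypothetical
    requirement of this sketch: (k-3, k), (k-2, k) and (k-1, k) are all
    present for every k >= 3, as well as (0, 1), (0, 2) and (1, 2).
    """
    d01, d02, d12 = dists[(0, 1)], dists[(0, 2)], dists[(1, 2)]
    # Fix the first three atoms in the z = 0 plane to remove rigid motions.
    x2 = (d02 ** 2 - d12 ** 2 + d01 ** 2) / (2 * d01)
    y2 = math.sqrt(max(d02 ** 2 - x2 ** 2, 0.0))
    start = [(0.0, 0.0, 0.0), (d01, 0.0, 0.0), (x2, y2, 0.0)]
    solutions = []

    def place(coords):
        k = len(coords)
        if k == n:
            solutions.append(list(coords))
            return
        # Branching: at most two positions for atom k.
        candidates = sphere_intersection(
            coords[k - 3], coords[k - 2], coords[k - 1],
            dists[(k - 3, k)], dists[(k - 2, k)], dists[(k - 1, k)])
        for c in candidates:
            # Pruning: check every other known distance to an earlier atom.
            feasible = all(
                abs(norm(sub(c, coords[j])) - dists[(j, k)]) <= tol
                for j in range(k - 3) if (j, k) in dists)
            if feasible:
                place(coords + [c])

    place(start)
    return solutions
```

In this sketch the search tree doubles at each level when only the three predecessor distances are available, and every extra distance cuts branches early. Handling interval (uncertain) distances instead of exact ones would require replacing the sphere intersection and the tolerance test, which is outside the scope of this illustration.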

This project is organized into 4 work packages (WPs). WP1 and WP2 focus on methodologies, while WP3 and WP4 are related to applications. In particular, the main goal of WP1 is to define features which, given chemical and NMR information about a protein, make it possible to predict distance information accurate enough to correctly describe the secondary structures of the protein. The main goal of WP2 is to design an error-tolerant BP algorithm that can, in particular, deal with uncertain data. The aim of WP3 is to exploit the results of WP1 and WP2 in order to find the three-dimensional structure of disordered proteins by using NMR chemical shifts only, while WP4 is expected to apply the results of WP1 and WP2 to genomics and Hi-C data.

The principal investigator of the present project has long experience in distance geometry and in several of its applications. His initial work on the topic dates back about 10 years, when he was a postdoctoral researcher at LIX (Ecole Polytechnique) under the supervision of Leo Liberti. At that time, the collaboration with scientists of the Institut Pasteur began, in particular with Thérèse Malliavin. Since then, the main application we have focused on is the determination of protein conformations. The collaboration between Antonio Mucherino and Jung-Hsin Lin is much more recent, but it has lately become more active thanks to a CNRS PRC project covering 2018 and 2019, which allows the two partners to meet regularly and to make fast progress on their initial ideas for a collaboration.

The consortium gathers scientists from different disciplines and with different backgrounds, located in France and in Taiwan. No two teams in the consortium have similar expertise, so every team is indispensable to this project. Each partner will recruit a temporary research fellow to be employed full-time on the different WPs of the present project. Other requested costs are related to the organization of regular meetings among the partners (in France or Taiwan), and to participation in conferences, where we plan to publish our initial results.

The excellent results obtained in the context of DG with NMR data are our strong motivation for proposing the present project. If similar results are obtained by the end of this project for disordered proteins, as well as for genomics and Hi-C data, we will be able to deliver to the scientific community a robust tool that will be of crucial importance in the field of biotechnology.

Project coordinator

Mr Antonio Mucherino (Institut de Recherche en Informatique et Systèmes Aléatoires)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.


IRISA Institut de Recherche en Informatique et Systèmes Aléatoires
LIX Laboratoire d'Informatique de l'Ecole Polytechnique
RCAS Academia Sinica / Research Center for Applied Sciences
GRC Genomics Research Center of Academia Sinica

ANR funding: 361,800 euros
Beginning and duration of the scientific project: December 2019 - 48 Months
