Deciphering Evolutionary Constraints on RNA Sequences: from Physical Models to Design (DECRyPTeD)
Non-coding RNAs play several essential roles in the cell, particularly in catalytic and regulatory processes. A fundamental question is how the sequence of an RNA molecule encodes its structure, that is, its folding into a 3D conformation, and its function, for example how it interacts with other molecules. This project aims to answer this question by developing statistical models inferred from RNA sequence data collected across different organisms. These RNAs have evolved over hundreds of millions of years from a common ancestor while retaining their function and structure. It is therefore crucial to characterize this sequence diversity precisely and to understand how it encodes the structural, functional and evolutionary constraints to which these RNAs are subjected.
As part of DECRyPTeD, we will develop innovative theoretical and computational approaches, inspired by statistical physics and unsupervised machine learning, to model and exploit massive sequence data. The models will be used to (i) predict tertiary structures (native and/or associated with allosteric changes) and transient contacts during refolding; (ii) identify the nucleotide motifs in the sequence that control its functions (activity, specificity, etc.); (iii) synthesize new RNA sequences with a given structure and specificity for a particular ligand; (iv) predict the evolution of a population of sequences that starts with a given specificity and is constrained to acquire a different one.
This modeling approach will be validated both in silico and experimentally. Computationally, we will infer models from artificial sequence distributions with structures of increasing complexity and test the ability of these models to generate new sequences with the same structural properties. Experimentally, we will use (i) chemical probing methods (SHAPE, DMS), which mark the unpaired bases of RNA molecules (natural or generated by the models) and thus reveal their structure; (ii) directed evolution techniques (SELEX), which select, from a large population of molecules, those with sufficient affinity for a given target (ligand), and can thus evolve a family of RNA sequences toward a new specificity.
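The in silico validation above can be illustrated with a toy version of one of its ingredients: build an artificial alignment whose sequences all fold into a known secondary structure, then check whether a statistical score computed from the alignment alone recovers the paired positions. The summary does not specify the models used, so this is only a minimal sketch under stated assumptions: the hairpin `STRUCTURE`, the allowed pairs and all function names are hypothetical, and plain mutual information is swapped in as a much simpler covariation score than the statistical-physics models the project targets.

```python
import math
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy structure (dot-bracket notation): a 12-nt hairpin.
# This is an illustrative assumption, not a structure from the project.
STRUCTURE = "((((....))))"
# Watson-Crick pairs plus G-U wobble pairs.
PAIRS_ALLOWED = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]
NUCLEOTIDES = "ACGU"


def base_pairs(dot_bracket):
    """Return the sorted list of (i, j) paired positions in a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def sample_sequences(dot_bracket, n, rng):
    """Sample n sequences compatible with the structure: each paired position
    gets an allowed base pair, each unpaired position a uniformly random base."""
    pairs = base_pairs(dot_bracket)
    seqs = []
    for _ in range(n):
        seq = [rng.choice(NUCLEOTIDES) for _ in dot_bracket]
        for i, j in pairs:
            seq[i], seq[j] = rng.choice(PAIRS_ALLOWED)
        seqs.append("".join(seq))
    return seqs


def mutual_information(seqs, i, j):
    """Empirical mutual information (in nats) between alignment columns i and j."""
    n = len(seqs)
    fi = Counter(s[i] for s in seqs)
    fj = Counter(s[j] for s in seqs)
    fij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * math.log((c / n) / ((fi[a] / n) * (fj[b] / n)))
        for (a, b), c in fij.items()
    )


def top_pairs(seqs, k):
    """Score every pair of columns by mutual information and return the k best, sorted."""
    length = len(seqs[0])
    scores = {(i, j): mutual_information(seqs, i, j)
              for i, j in combinations(range(length), 2)}
    return sorted(sorted(scores, key=scores.get, reverse=True)[:k])


rng = random.Random(0)
alignment = sample_sequences(STRUCTURE, 2000, rng)
true_pairs = base_pairs(STRUCTURE)
predicted = top_pairs(alignment, k=len(true_pairs))
print("designed pairs: ", true_pairs)
print("predicted pairs:", predicted)
```

Because paired columns co-vary by construction while all other columns are sampled independently, the designed pairs stand out clearly in this toy case. Real alignments are harder: phylogenetic bias and indirect correlations blur such simple scores, which is one motivation for the more elaborate inferred models the project describes. This sketch also covers only structure recovery, not the generation of new sequences.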
This project is both fundamental and applied. On the one hand, it aims to develop and evaluate quantitative modeling approaches, at the intersection of statistical physics and machine learning, that are both controlled and interpretable; on the other hand, it aims to control and design non-coding RNAs for practical applications. Potential long-term medical applications include the development of novel antibacterial drugs, genome editing technologies and cancer immunotherapy techniques.
DECRyPTeD brings together three teams from different communities (statistical physics, computer science and bioinformatics, computational biology, biochemistry and evolutionary biology). The project will also strengthen cooperation between these disciplines through conferences and meetings that promote an interdisciplinary approach to, and vision of, research in France.
Project coordinator: Ms Simona COCCO (Laboratoire de physique de l'ENS)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.
LPENS Laboratoire de physique de l'ENS
LIX Laboratoire d'Informatique de l'Ecole Polytechnique
UPDESCARTES - UMR 8038 Cibles Thérapeutiques et Conception de Médicaments
ANR funding: 400,227 euros
Start date and duration of the scientific project: September 2019, 48 months