Integrated Sequencing and Structural Analysis of RNA Probing Experiments – INSSANE
The structures of RNA molecules and their complexes are crucial for understanding biology. Notable examples of large RNAs include the genomes of RNA viruses (Influenza, HIV, Chikungunya, SARS-CoV-2, ...), whose lengths exceed the current capabilities of both predictive computational methods and high-resolution experimental structural techniques.
In the INSSANE project, we will develop integrated experimental protocols, together with efficient computational methods, for the structural modeling of large RNAs. We will accurately probe and predict the genomic RNA architectures of biomedically relevant viruses. The applicability of our bioinformatics methodologies will extend beyond viruses: they could be used to model the structure of other large RNAs (lncRNAs, introns). Towards that goal, we will introduce a novel protocol, named SHAPE-Cut, to streamline the probing of large RNAs. SHAPE-Cut will measure position-specific solvent accessibility by combining novel chemistry and long-read sequencing. In comparison to existing protocols, we expect SHAPE-Cut to avoid typical biases, be easier to implement, and provide increased accuracy when coupled with specific data analyses and computational methods. We will combine the complementary data of crosslinking and probing experiments: the former reveal long-range interactions, while the latter, through accessibility profiles, have been shown to greatly improve the prediction of local structures. We will implement a recent crosslinking protocol and use its data in an index-based, genome-wide search for thermodynamically stable RNA-RNA interactions. Then, we will devise an integrative structure prediction method that combines SHAPE reactivity, long-range interactions, homology, and thermodynamic stability. Finally, a novel visualization tool will represent genome-scale RNAs and streamline the interdisciplinary dialogue.
We will overcome algorithmic hurdles to improve the processing of sequencing data produced by RNA structure-targeting experiments. All modern RNA probing protocols are based on sequencing technologies and reveal structural information indirectly, through an alteration that is observable at the RNA sequence level (mutations, stops/cuts). However, the crucial mapping of primary sequencing data has received relatively little attention in the context of probing techniques, even though specific challenges (chimeric reads, informative errors/stops) have been identified at the root of biases and technical artifacts. We will tailor mapping to our protocols, and develop data structures and indexing techniques to exploit sequencing data to its fullest extent. We will also inform mapping by predicted accessibility, e.g. to disambiguate the mapping of erroneous (but probably informative) reads. Beyond increasing mappability, we will deconvolute isoforms/subgenomes, which are known to occur in viral genomes. Our final integrative structure modeling method will consider evolutionary information, and will be formulated as a Maximum-Independent-Set (MIS) problem on a conflict graph that includes both alternative local structures and long-range interactions. We will implement a fixed-parameter tractable algorithm, parameterized by the treewidth, to produce a model with maximal support and thermodynamic stability.
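To make the conflict-graph formulation concrete, the following minimal Python sketch selects a maximum-weight set of mutually compatible candidate structural elements. It is an illustration under stated assumptions, not the project's method: the candidate names, weights, and positions are invented; two candidates are assumed to conflict whenever they share a nucleotide position; and the solver is a plain exact branching recursion that is only practical for small instances, standing in for the treewidth-based fixed-parameter tractable algorithm, which is not shown here.

```python
from functools import lru_cache
from itertools import combinations


def build_conflict_graph(candidates):
    """Return adjacency sets: two candidates conflict if they share a position.

    `candidates` maps a (hypothetical) element name to (weight, positions).
    The shared-position rule is an assumption made for this sketch.
    """
    conflicts = {name: set() for name in candidates}
    for a, b in combinations(candidates, 2):
        if candidates[a][1] & candidates[b][1]:
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts


def max_weight_independent_set(weights, conflicts):
    """Exact maximum-weight independent set by branching on one vertex.

    Exponential in the worst case; a stand-in for a treewidth-based
    dynamic program, which would solve the same problem more efficiently
    on graphs of small treewidth.
    """

    @lru_cache(maxsize=None)
    def solve(remaining):
        if not remaining:
            return 0.0, frozenset()
        # Branch on the vertex with the most remaining conflicts.
        v = max(remaining, key=lambda u: len(conflicts[u] & remaining))
        # Option 1: leave v out of the model.
        w_out, s_out = solve(remaining - {v})
        # Option 2: keep v and discard everything that conflicts with it.
        w_in, s_in = solve(remaining - {v} - conflicts[v])
        w_in += weights[v]
        if w_in > w_out:
            return w_in, s_in | {v}
        return w_out, s_out

    return solve(frozenset(weights))


# Invented toy instance: weights stand for combined support and stability.
candidates = {
    "helix_A": (3.0, {1, 2, 3, 20, 21, 22}),
    "helix_B": (2.0, {3, 4, 5, 15, 16, 17}),
    "helix_C": (2.5, {30, 31, 40, 41}),
    "long_range_D": (4.0, {20, 21, 40, 41}),
}
conflicts = build_conflict_graph(candidates)
weights = {name: w for name, (w, _) in candidates.items()}
best_weight, best_set = max_weight_independent_set(weights, conflicts)
print(best_weight, sorted(best_set))
```

In this toy instance the long-range interaction conflicts with two local helices, so the best model keeps it together with the one helix it is compatible with, rather than the two individually compatible local helices.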
By including experts in bioinformatics of RNA structure, sequence analysis, biochemistry, and organic chemistry, our consortium is uniquely positioned to address the timely challenges tackled in the project. Its implementation requires a combination of expertise from traditionally distinct areas of bioinformatics, namely combinatorial structure prediction and high-throughput sequencing analysis. Its synergies will build on existing pairwise collaborations and will streamline the communication between partners representing complementary perspectives on RNA as an object of study.
Project coordination
Sebastian Will (Laboratoire d'Informatique de l'Ecole Polytechnique)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partners
LCBPT Laboratoire de Chimie et Biochimie Pharmacologiques et Toxicologiques
CiTCoM Cibles Thérapeutiques et Conception de Médicaments
LIX Laboratoire d'Informatique de l'Ecole Polytechnique
CRIStAL Centre de Recherche en Informatique, Signal et Automatique de Lille
ANR funding: 429,623 euros
Beginning and duration of the scientific project:
- 48 Months