CE45 - Mathematics and digital sciences for biology and health 2020

Protein in silico assessment – PISA

Submission summary

Protein engineering aims to create artificial proteins useful for health, green chemistry and environmental applications. Artificial protein design is a long and costly process that combines computational simulations and predictions with experimental validation in the laboratory. Computational protein design methods aim to find a sequence of amino acids compatible with a target three-dimensional protein structure. Experimental synthesis of computationally predicted protein sequences often fails because of approximations in protein models and energy functions. In this context, efficient in silico protein design assessment tools are essential to maximize the success rate of experimental synthesis. Among these tools, forward folding is the most stringent test. Forward folding refers to using a protein structure prediction method to check whether a designed protein sequence actually folds into its target protein structure. The reliability of this test therefore depends on the efficiency and properties of the protein structure prediction method employed.

Many natural protein sequences fold into similar protein structures. It is well known that protein sequences sharing more than 30% sequence identity usually fold the same way. However, it is also well known that a simple artificial mutation may destabilize the protein and disrupt the folding process. An artificial protein sequence is likely to contain such destabilizing mutations because of the inherent inaccuracies of the modeling process. Protein structure prediction methods that rely on sequence homology are therefore not suitable for forward folding: such methods would identify a structure template from the similarity between the designed and the natural protein sequences, and would use that template as a starting point for structure modeling. Since most designed protein sequences share more than 30% sequence similarity with their natural counterpart, any structure prediction method exploiting global sequence homology will produce high false positive rates when used as a forward folding method for in silico protein design assessment. The fragment-based protein structure prediction approach does not use global sequence information, but rather focuses on local interactions through the assembly of small protein fragments taken from known structures. Consequently, this approach is particularly well suited to in silico protein design assessment through forward folding.
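As an illustration of the forward folding test described above, here is a minimal Python sketch. It is not the project's method: the function names, the 2.0 Å cutoff, and the pass criterion (the lowest-energy predicted model must lie close to the target) are assumptions chosen for the example. It also assumes that each predicted model has already been superposed onto the target and is given as a list of C-alpha coordinates paired with an energy value.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the two structures are already superposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def passes_forward_folding(target, models, rmsd_cutoff=2.0):
    """Hypothetical pass criterion: the lowest-energy model is near the target.

    `models` is a list of (energy, coords) pairs produced by some structure
    prediction run for the designed sequence. `rmsd_cutoff` is an assumed
    threshold in angstroms, not a value taken from the project.
    """
    _, best_coords = min(models, key=lambda model: model[0])
    return ca_rmsd(target, best_coords) <= rmsd_cutoff


# Toy data: three C-alpha atoms on a line, a close model and a distant one.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
near = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
far = [(0.0, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0)]

print(passes_forward_folding(target, [(-10.0, near), (-5.0, far)]))
print(passes_forward_folding(target, [(-10.0, far), (-5.0, near)]))
```

In the first call the lowest-energy model is the close one, so the design passes; in the second the lowest-energy model is far from the target, so it fails even though a near-target model exists at higher energy.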

The objective of the PISA project is to improve the reliability of computational protein design methods through in silico assessment based on forward folding. We propose to address this challenge with a hybrid artificial intelligence approach combining deep artificial neural networks and estimation of distribution algorithms. More specifically, a recurrent deep neural network architecture will be designed to construct protein fragment libraries. Then, an iterative estimation of distribution algorithm minimizing the latest Rosetta energy function will be developed for protein fragment assembly. The estimation of distribution will be performed on the fragment libraries by estimating the parameters of a Markov random field representing interactions between fragments. This approach will be used and validated in computational protein design projects involving the research team and external collaborators.
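To make the idea of an estimation of distribution algorithm over fragment libraries more concrete, here is a minimal Python sketch. It makes several simplifying assumptions. It uses a univariate model, with one independent categorical distribution per sequence position, rather than the Markov random field proposed in the project. The function `toy_energy` is a hypothetical stand-in for the Rosetta energy function. Fragments are represented only by their index in a per-position library. All names and parameter values are illustrative.

```python
import random


def toy_energy(choice):
    """Hypothetical energy (lower is better), not the Rosetta energy function.

    Penalizes high fragment indices and disagreement between neighbours.
    """
    return sum(choice) + sum(abs(a - b) for a, b in zip(choice, choice[1:]))


def run_eda(n_positions, n_fragments, energy, pop_size=200, elite=40,
            iterations=30, smoothing=0.05, seed=0):
    """Univariate estimation of distribution algorithm over fragment indices."""
    rng = random.Random(seed)
    # Start from a uniform distribution over the fragments at every position.
    probs = [[1.0 / n_fragments] * n_fragments for _ in range(n_positions)]
    best = None
    for _ in range(iterations):
        # Sample candidate assemblies from the current distribution.
        population = [
            [rng.choices(range(n_fragments), weights=p)[0] for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=energy)
        if best is None or energy(population[0]) < energy(best):
            best = population[0]
        # Re-estimate the distribution from the lowest-energy candidates.
        selected = population[:elite]
        for pos in range(n_positions):
            counts = [smoothing] * n_fragments  # keeps every fragment reachable
            for individual in selected:
                counts[individual[pos]] += 1.0
            total = sum(counts)
            probs[pos] = [c / total for c in counts]
    return best, energy(best)


best_choice, best_energy = run_eda(n_positions=8, n_fragments=5, energy=toy_energy)
print(best_choice, best_energy)
```

A Markov random field variant would replace the per-position counts with pairwise statistics between positions, so that sampling a fragment at one position depends on the fragments chosen at interacting positions.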

David Simoncini (Institut de Recherche en Informatique de Toulouse)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

IRIT Institut de Recherche en Informatique de Toulouse

ANR funding: 171,180 euros
Beginning and duration of the scientific project: March 2021 - 30 Months
