Deep learning for the functional classification of carbohydrate-active enzymes – DEFINE
A myriad of carbohydrate-active enzyme (CAZyme) sequences from diverse ecosystems are accumulating in our databases with no identified function. Their functional classification is the critical bottleneck to understanding them, to monitoring ecosystem health, and to biotechnological advances. We will design a Deep Learning (DL) architecture, DEFINE, capable of classifying sets of enzyme sequences by function and of discovering new functions and functional subclasses. We shall take advantage of several assets: the huge number of sequences present in our databases; the latest generation of protein language models (pLMs) and the power of unsupervised learning; ProfileView, our recent in-house approach dedicated to the functional classification of domains; the expertise behind CAZy annotation and its robustness, which ensure high-quality training and test sets; the possibility of experimentally testing the catalytic activity of several enzyme subfamilies; and the validation of their functional determinants by solving crystal structures. The method should make it possible 1. to infer the function of sequences sharing similar motifs by transferring functional labels from the few sequences whose function has already been characterized, and 2. to discover new functions, to be tested experimentally, on the basis of new sequence motifs. The large number of sequences considered in this project, together with their classification, will enable the creation of a Deep-CAZy database that complements the CAZy database by gathering data intended to guide experimentation. DEFINE will provide the proof of concept for building a generic DL model applicable at large scale to all protein families. The complementary expertise of the consortium, spanning functional classification and DL, CAZyme biology, biochemistry and crystallography, will guarantee the success of the project.
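To make the label-transfer idea in point 1, and the flagging of possible new functions in point 2, more concrete, here is a minimal Python sketch. It is not the DEFINE architecture, nor ProfileView. It assumes that each sequence has already been turned into a fixed-length embedding (for instance by a pLM) and uses plain nearest-neighbour transfer with a cosine-similarity threshold as a stand-in technique. The vectors, the labels function_A and function_B, the function names and the 0.9 threshold are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transfer_labels(
    labelled: Dict[str, Tuple[Vector, str]],
    unlabelled: Dict[str, Vector],
    threshold: float = 0.9,  # hypothetical cut-off, not calibrated
) -> Dict[str, Optional[str]]:
    """Give each unlabelled sequence the label of its most similar labelled
    sequence, or None when no labelled sequence reaches the threshold.

    A None result marks the sequence as a candidate for an uncharacterised
    function instead of forcing it into a known class.
    """
    predictions: Dict[str, Optional[str]] = {}
    for seq_id, emb in unlabelled.items():
        best_label, best_sim = None, -1.0
        for ref_emb, label in labelled.values():
            sim = cosine(emb, ref_emb)
            if sim > best_sim:
                best_label, best_sim = label, sim
        predictions[seq_id] = best_label if best_sim >= threshold else None
    return predictions


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for per-sequence pLM embeddings.
    labelled = {
        "ref1": ([1.0, 0.0, 0.0], "function_A"),
        "ref2": ([0.0, 1.0, 0.0], "function_B"),
    }
    unlabelled = {
        "q1": [0.95, 0.05, 0.0],  # close to ref1
        "q2": [0.1, 0.9, 0.1],    # close to ref2
        "q3": [0.0, 0.0, 1.0],    # far from both: candidate new function
    }
    print(transfer_labels(labelled, unlabelled))
    # prints {'q1': 'function_A', 'q2': 'function_B', 'q3': None}
```

Sequences returned as None are those such an approach would set aside as candidates for new functions, to be examined and tested experimentally. A real pipeline would need a calibrated threshold rather than the arbitrary value used here.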
Project coordination
Alessandra Carbone (Sorbonne Université)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
CQB Sorbonne Université
AFMB Université Aix-Marseille
ANR funding: 486,365 euros
Beginning and duration of the scientific project: December 2024, 36 months