Propose innovative bioinformatics methods to identify distant homologues and structurally varied enzymes

In the international movement towards energy transition, catalysis, and more particularly biocatalysis, which uses enzymes as catalysts, meets the needs of a more sustainable chemistry. Amine dehydrogenases (AmDHs) are one of the environmentally friendly alternatives for accessing a key class of compounds for the chemical industry: amines.
For biocatalysis to become a more widely applied alternative to conventional chemistry, it is essential to provide varied templates in terms of sequences and structures. Although protein engineering is a powerful method for evolving enzymes according to performance criteria (stability, substrate spectrum, etc.), the resulting mutants do not provide the diversity needed to reach the full potential of biocatalysis. The current boom in genomic data from the exploration of microbial communities provides a gigantic resource of potential biocatalysts. Promoting bioinformatics approaches that efficiently identify the targeted enzymes is a major challenge.
The aim of the MODAMDH project is to access distant homologues and AmDHs with varied structures, displaying extended characteristics, through innovative screening of biodiversity. The expected results should also help bioinformatics-based search methods spread within the biocatalysis community.
MODAMDH is an innovative project combining bioinformatics, chemoinformatics and biocatalysis to identify diverse AmDHs among biodiversity. It combines methodologies that have so far been only scarcely used for biocatalytic purposes. Native AmDHs are searched for both by sequence-driven approaches using distant homology and by 3D structure-guided approaches. To enlarge the catalogue of AmDHs, biodiversity is screened using not only the common UniProtKB database but all publicly available genomic and metagenomic data resources, in addition to those of Genoscope. The steps are:
- definition of the reference AmDH family, clustering, generation of hidden Markov models (HMMs) and of a catalophore (minimal active site topology) for each subgroup
- definition of the NAD(P)-dependent enzyme pool to be screened, clustering, generation of HMMs and of a 3D enzyme model for each family
- selection of distant homologues based on HMM/HMM searches between the NAD(P)-dependent enzyme families and the reference AmDH families
- identification of new, structurally different AmDH families by screening the reference catalophores against the 3D enzyme models of the NAD(P)-dependent enzyme families
- production and in vitro tests of selected enzymes
- iterative approach using new experimentally validated AmDHs and newly resolved structures
After a year of work, the current results are:
- recovery of data from the various available genomic resources: UniProtKB (SwissProt / TrEMBL), GEM (Genomes from Earth's Microbiomes), OM-RGC (Ocean Microbial Reference Gene Catalog), MetDB (Marine Eukaryotes Transcriptomes), SMAGs (MetaAssembled Genomes of eukaryotic metagenomic data from the Tara Oceans expedition), IGC (Integrated Gene Catalog of Human gut), UHGP (Unified Human Gastrointestinal Protein) and MGnify from EMBL-EBI
- definition of a reference set of NAD(P) proteins from these resources: based on the HMMs defining the superfamily SSF51735 (NAD(P)-binding Rossmann-fold domains superfamily), 17.2 million sequences were selected according to a previously optimized score threshold
- clustering of the NAD(P) protein reference set: using the MMseqs2 tool, 322,429 subfamilies were identified and as many HMMs generated
- definition of the AmDH reference family: the HMMs (without the NAD domain signature) of the groups from the ASMC (Active Site Modeling and Clustering) classification of the previously described AmDH family (Mayol et al., 2019) were searched against the genomic resources listed above. 27,282 sequences meet the defined criteria and constitute the reference AmDH family
- establishment of the phylogeny of this reference family: study in progress, including generation of HMMs for each group
- establishment of an ASMC classification of this family: study in progress, including generation of HMMs for each group
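To make the clustering of the NAD(P) protein reference set into subfamilies more concrete, here is a minimal sketch of greedy representative-based clustering. It is not the MMseqs2 algorithm or its parameters: the k-mer Jaccard similarity, the 0.5 threshold, the function names and the three toy sequences are all assumptions chosen for illustration.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(sequences, threshold=0.5, k=3):
    """Greedy clustering: visit sequences from longest to shortest; each one
    joins the first representative it resembles, otherwise it founds a new
    cluster and becomes its representative."""
    order = sorted(sequences, key=lambda name: (-len(sequences[name]), name))
    representatives = []  # (representative name, its k-mer set)
    clusters = {}         # representative name -> member names
    for name in order:
        seq_kmers = kmers(sequences[name], k)
        for rep, rep_kmers in representatives:
            if jaccard(seq_kmers, rep_kmers) >= threshold:
                clusters[rep].append(name)
                break
        else:
            representatives.append((name, seq_kmers))
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, not taken from the project data.
toy = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYIAKQRQISFVKSHFSRA",
    "seqC": "GSHMLEDPVDAFKLNGGWTT",
}
clusters = greedy_cluster(toy)
print(clusters)
```

In a real pipeline each resulting cluster would then be aligned and turned into one HMM, which is why the number of HMMs equals the number of subfamilies.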
Regarding the sequence-based analysis approach, the next steps are:
- search for HMMs of groups of the reference AmDH family (from the phylogenetic or ASMC structural classification) within the HMMs established from subfamilies of the reference set of NAD(P) proteins
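The summary does not say which tool or scoring scheme is used for the HMM/HMM search. As an illustration only, the sketch below uses one common way of comparing two profile columns, the log-sum-of-odds score (the kind of column score used in profile-profile aligners such as HHsearch), and sums it over two profiles assumed to be already aligned without gaps. The four-letter alphabet, uniform background and column values are hypothetical.

```python
import math


def column_score(q_col, p_col, background):
    """Log-sum-of-odds score between two profile columns:
    log2( sum_a q(a) * p(a) / f(a) ), with f the background frequencies."""
    total = sum(q_col[a] * p_col[a] / background[a] for a in background)
    return math.log2(total)


def aligned_profile_score(profile_q, profile_p, background):
    """Sum of column scores for two profiles of equal length that are
    assumed to be aligned column by column (no gaps, no transitions)."""
    if len(profile_q) != len(profile_p):
        raise ValueError("profiles must have the same number of columns")
    return sum(
        column_score(q_col, p_col, background)
        for q_col, p_col in zip(profile_q, profile_p)
    )


# Toy reduced alphabet with a uniform background (an assumption).
background = {"A": 0.25, "D": 0.25, "G": 0.25, "K": 0.25}
conserved_a = {"A": 0.7, "D": 0.1, "G": 0.1, "K": 0.1}
conserved_d = {"A": 0.1, "D": 0.7, "G": 0.1, "K": 0.1}

similar = aligned_profile_score([conserved_a, conserved_d],
                                [conserved_a, conserved_d], background)
dissimilar = aligned_profile_score([conserved_a, conserved_d],
                                   [conserved_d, conserved_a], background)
print(similar, dissimilar)
```

Columns that conserve the same residue score above zero and columns that conserve different residues score below zero, which is what lets profile comparison detect relationships that pairwise sequence alignment misses. A real HMM/HMM search additionally handles gaps, transition probabilities and statistical significance.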
Regarding the structure-based approach, the next steps considered are:
- 3D modeling of representatives of subfamilies of the reference set of NAD(P) proteins
- definition of catalophores of the groups of the reference AmDH family
- search for these catalophores within the models generated from the reference set of NAD(P) proteins
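The catalophore search listed above can be pictured as a geometric matching problem. The following is a minimal sketch under stated assumptions, not the project's method: a catalophore is reduced to a few residue types with one representative point each, and a model matches when residues of the same types reproduce all pairwise distances within a tolerance. The residue types, coordinates, residue numbers and the 1.0 Å tolerance are hypothetical.

```python
from itertools import product
from math import dist


def find_matches(template, model, tol=1.0):
    """Search a 3D model for residue sets that reproduce a template geometry.

    template: list of (residue_type, (x, y, z))
    model:    list of (residue_id, residue_type, (x, y, z))
    Returns tuples of residue ids, one id per template point, for which every
    pairwise distance agrees with the template within `tol`.
    """
    # For each template point, keep only model residues of the same type.
    candidates = [
        [res for res in model if res[1] == res_type]
        for res_type, _ in template
    ]
    matches = []
    for combo in product(*candidates):
        ids = [res[0] for res in combo]
        if len(set(ids)) < len(ids):
            continue  # the same residue cannot fill two template positions
        n = len(combo)
        if all(
            abs(dist(combo[i][2], combo[j][2])
                - dist(template[i][1], template[j][1])) <= tol
            for i in range(n) for j in range(i + 1, n)
        ):
            matches.append(tuple(ids))
    return matches


# Hypothetical three-point catalophore and toy model (same geometry, shifted).
template = [
    ("GLU", (0.0, 0.0, 0.0)),
    ("TYR", (5.0, 0.0, 0.0)),
    ("ASN", (0.0, 4.0, 0.0)),
]
model = [
    (12, "GLU", (10.0, 10.0, 10.0)),
    (57, "TYR", (15.0, 10.0, 10.0)),
    (88, "ASN", (10.0, 14.0, 10.0)),
    (120, "TYR", (30.0, 30.0, 30.0)),
]
matches = find_matches(template, model)
print(matches)
```

Because only distances are compared, the match is independent of how the model is oriented, but mirror-image arrangements are not distinguished, and the brute-force enumeration only suits small templates. A production search would need chirality checks and spatial indexing.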
Following these steps and additional studies (docking, identification of key residues, etc.), a selection of candidate enzymes will be established. These enzymes will be produced by heterologous expression in E. coli from synthetic genes, or from genetic material in the case of genes from available strains. Activity tests will be carried out according to different objectives (stability, enantioselectivity, substrate spectrum, etc.). Structures will then be solved for some of the experimentally validated enzymes.
Publication expected at the end of 2021.
In the current context of waste reduction, catalysis, and more particularly biocatalysis, meets the needs of a more sustainable chemistry. The presence of chiral amines in many synthetic intermediates of key pharmacological, agronomic or other industrial compounds drives the search for biocatalytic routes to these molecules in enantiomerically pure form. Amine dehydrogenases (AmDHs), which catalyze the asymmetric reductive amination of ketones using only an inexpensive amine source, ammonia, and a regenerable cofactor, are one of the most promising alternatives to conventional synthesis. The discovery of native AmDHs by the scientific coordinator of the MODAMDH project and her collaborators made it possible to assign the first genes to this function, thus widening a panel of enzymes previously restricted to engineered ones. To meet the criteria for industrial development of biocatalysts, particularly stability, activity, selectivity, specificity and substrate spectrum, new enzymes have to be found. Mining genomes and metagenomes is a powerful and complementary way to access novel sequences offering high diversity and original features. Strategies relying solely on pairwise alignment of primary sequences make it possible to discover numerous enzymes that meet these needs, but they generally lead to enzymes belonging to already known families. The aim of the MODAMDH project is to screen biodiversity with two more innovative approaches, using distant homology and the three-dimensional topology of active sites, within filtered genomic and metagenomic data. This will help to widen the range of enzymatic frameworks catalyzing reductive amination and to access homologues with varied structures and extended characteristics, in particular in terms of substrate spectrum or complementary stereoselectivities.
The structural analysis of all experimentally validated AmDHs will make it possible to propose targets for protein engineering required to improve their use in synthesis. The expected results will also allow these research methods to flourish within the biocatalysis community, by further promoting the progress of associated bio-informatics tools.
Madame Carine Vergne (UMR 8030/GENOSCOPE/CEA)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
University of York / York Structural Biology Laboratory
UMR 8030/CEA UMR 8030/GENOSCOPE/CEA
ANR funding: 168,680 euros
Beginning and duration of the scientific project: March 2020 - 24 Months