CE45 - Mathematics and digital sciences for biology and health

End-to-End Deep learning for Precision Medicine through Metagenomics and cost-sensitive data integration – DeepIntegrOmics

Submission summary

In chronic diseases such as cardiometabolic diseases (CMD), the intestinal microbiota is increasingly used as a basis for patient stratification and for innovative treatments. As a “super integrator” of the patient's condition, metagenomics is poised to play a key role in precision medicine. However, computational barriers still hinder its routine use in medical services. In particular, most metagenomics diagnosis approaches rely on tedious and computationally heavy projections of the sequence data against very large genomic reference catalogs (more than 170 million genes for the most recent one, UHGP). Deep learning has revolutionized predictive analysis, improving on many earlier models that relied on heavy bioinformatics pipelines for classification or stratification tasks. Yet very little literature exists on end-to-end deep learning from raw metagenomics data to stratify patient cohorts and/or predict patient phenotypes. The first scientific barrier this project addresses is therefore the development of routine, metagenomics-based “point-of-care” prognosis or diagnosis. A recurrent problem in precision medicine is integrating different sources of omics data while controlling the cost/benefit balance of exams: evaluating whether requesting additional exams is worthwhile is critical to their routine use. CMD, in particular ischemic heart disease (IHD) and stroke, are the leading causes of death worldwide and a major contributor to disability. Yet current patient stratification is insufficient, and integrated molecular signatures that inform on the evolution of CMD stages are missing. In this context, the main scientific goal of the DeepIntegrOmics project is to significantly improve deep learning (DL)-based methodological frameworks that use multi-omics data for precision medicine. It pursues two main directions: first, to support reliable end-to-end prediction from raw metagenomics data; second, to improve classification accuracy and stratification by integrating other omics data.
Two more applied objectives are to propose novel approaches for identifying multi-omics biomarkers of cardiometabolic disease stages, and to propose means of patient stratification by interpreting these neural network architectures. This study will be performed on a unique phenotypic database of 1,844 patients from the EU H2020 MetaCardis project, one of the largest existing datasets of its kind, for which metagenomic, clinical and three types of metabolomic data are available. We will evaluate the classification performance of the DL integration architecture in predicting the eight CMD groups (including controls) to which the 1,844 patients belong. We will assess the prognostic value of the stratification in predicting CMD progression for 807 of these 1,844 patients, whose evolution (clinical changes) has been characterized over 10 years. Altogether, these objectives will support translational and precision medicine (i.e. classification and novel stratification of patients), with a view to deploying these models for routine use in clinical centers.

From a translational perspective, the expected results are key outcomes that could help improve the management of patients with CMD. These results are the stratification of MetaCardis patients, biomarker signatures, and the ability to predict transitions in disease progression. From a methodological perspective, the expected results are a DL architecture for cost-sensitive data integration and open-source embeddings for multi-omics classification. In terms of impact, a classification based on new gut microbiota-derived “omics” markers could reveal new therapeutic targets. We also expect an impact on patient management and on the patients themselves.

The consortium is led by an experienced researcher. It has very strong mathematical, bioinformatics, AI and clinical skills, a long-standing and intensive collaboration, and recent proof-of-concept work on the proposed approaches.

Project coordination

Jean-Daniel Zucker (Unité de modélisation mathématique et informatique des systèmes complexes)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

UMMISCO Unité de modélisation mathématique et informatique des systèmes complexes
IBISC Informatique, BioInformatique, Systèmes Complexes
NUTRIOMICS NUTRITION ET OBESITES : APPROCHES SYSTEMIQUES (NUTRIOMIQUE)
LAMSADE Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision

ANR funding: 621,005 euros
Beginning and duration of the scientific project: - 48 months
