CE45 - Mathematics, computer science, automatic control and signal processing to meet the challenges of biology and health

Statistics and Machine Learning for Single Cell Genomics – SingleStatOmics

Submission summary

The ability to measure genome-wide gene expression or mutations from a biological sample made of thousands or millions of cells revolutionized biology in the late 1990s, making it possible, for example, to characterize cancer subtypes from their molecular profiles or to identify comprehensive lists of genes expressed or inhibited in particular conditions. However, the cells within a sample are never all the same, and measuring an average over thousands of cells may mask or even misrepresent signals of interest that vary between individual cells. Fortunately, recent advances in massively parallel sequencing and high-throughput cell biology technologies now make it possible to obtain genome-wide measurements based on DNA, RNA, chromatin states or proteins at the level of individual cells. These techniques, which we collectively refer to as single-cell genomics, allow us to study cell-to-cell variability within a biological sample and to investigate new questions that are out of reach for classical bulk genomics. For example, intra-tissue heterogeneity is now clearly established in many cell types, including T cells, lung cells and myeloid progenitors. The construction of a comprehensive atlas of human cell types is now within reach. Cell-to-cell variability is also central to many biological processes such as gene regulation and cell differentiation, as it reflects intrinsically stochastic molecular processes and provides information on the underlying molecular networks. This variability has been shown to play an important functional role in the cell decision-making process and beyond. Consequently, the measurement of gene expression in single cells holds the promise of revolutionizing our understanding of gene regulation and of resolving many longstanding debates in biology. Beyond its technological aspects, single-cell genomics raises new mathematical and computational challenges.
Indeed, the nature of the data produced by single-cell genomics techniques, as well as the questions we need to answer, differs considerably from standard bulk genomics. For example, because of the extremely small amount of biological material present in a single cell, it is common for 90% of the values in a single-cell experiment to be missing, and the observed values can themselves be strongly distorted by specific experimental artifacts, calling for new statistical models of these data. In addition, the number of cells investigated simultaneously by the latest (and future) single-cell technologies easily reaches the millions, orders of magnitude more than the number of samples in standard bulk genomics, which raises new computational challenges for scalability. Finally, new biological questions arise, such as modelling a differentiation process or integrating genetic and epigenetic data at the single-cell level, which call for new mathematical models and algorithms. In short, new dedicated analytical tools are crucially needed to unleash the full power of single-cell genomics. The goal of this project is to address some of these pressing challenges by developing new mathematical models and computational tools for three biological problems: (i) investigating sample heterogeneity and cell identity, (ii) modelling the dynamics of cell differentiation and gene regulation, and (iii) exploring single-cell epigenomics. For that purpose, we have gathered a consortium with a unique combined experience in high-dimensional statistics, machine learning, bioinformatics, and computational and systems biology, together with an extended network of collaborators on single-cell genomics in France and abroad.

Project coordination

Franck Picard (Laboratoire biologie et modélisation de la cellule)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

LBMC UMR 5239 Laboratoire biologie et modélisation de la cellule
LBMC LABORATOIRE DE BIOLOGIE ET MODELISATION DE LA CELLULE
LBBE Laboratoire de biométrie et biologie évolutive
Mathématiques et Informatique Appliquées

ANR funding: 597,435 euros
Beginning and duration of the scientific project: February 2019 - 48 Months
