Restricted Boltzmann Machines for Modelling Physical Systems: Theory and Applications to Proteins – RBMPro
First-principles approaches to the modelling of complex physical systems, i.e. systems with strong and heterogeneous interactions, have so far often yielded limited progress. Proteins provide an example: despite intensive study, accurate models capable of predicting how proteins fold, interact with other molecules, or change properties upon modification of one or more of their amino acids (the so-called mutational landscape) are still out of reach, except for short proteins. It is therefore tempting to use machine learning, the science of automatic extraction of information from data, to help model complex systems. In the present project, we will concentrate on Restricted Boltzmann Machines (RBMs), a fundamental architecture in unsupervised machine learning. In its simplest formulation, an RBM is a Boltzmann machine on a bipartite graph, with a visible layer that represents the data, connected to a hidden layer meant to extract and explain the statistical features of the data.
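To make the bipartite structure concrete, here is a minimal sketch of an RBM in Python with NumPy. It assumes binary (0/1) units in both layers, which the summary does not specify; the names W (visible-to-hidden couplings), a and b (visible and hidden fields) and all function names are illustrative assumptions, not the project's actual code. Because no coupling links two units of the same layer, each layer is conditionally independent given the other and can be sampled in one block.

```python
import numpy as np

def energy(v, h, W, a, b):
    """Energy E(v, h) = -a.v - b.h - v.W.h of one joint configuration.

    Assumption: binary 0/1 units. W has shape (n_visible, n_hidden);
    there are no visible-visible or hidden-hidden couplings (bipartite graph).
    """
    return -(a @ v) - (b @ h) - (v @ W @ h)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j; factorizes over j."""
    return sigmoid(b + v @ W)

def p_visible_given_hidden(h, W, a):
    """P(v_i = 1 | h) for every visible unit i; factorizes over i."""
    return sigmoid(a + W @ h)

def gibbs_step(v, W, a, b, rng):
    """One block Gibbs sweep: sample all hidden units, then all visible units."""
    h = (rng.random(b.shape) < p_hidden_given_visible(v, W, b)).astype(float)
    v_new = (rng.random(a.shape) < p_visible_given_hidden(h, W, a)).astype(float)
    return v_new, h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_visible, n_hidden = 6, 3           # hypothetical sizes
    W = 0.1 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)
    b = np.zeros(n_hidden)
    v = rng.integers(0, 2, size=n_visible).astype(float)
    for _ in range(10):                  # short Markov chain from a random start
        v, h = gibbs_step(v, W, a, b, rng)
    print("sampled visible configuration:", v)
    print("energy of final (v, h):", energy(v, h, W, a, b))
```

For protein sequences the visible units would more naturally take one of about twenty amino-acid states rather than two; the binary case is used here only to keep the sketch short.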
The objectives of RBMPro are twofold and complementary: (1) exploit and push forward the conceptual and technical wealth of statistical physics to understand how RBMs work and learn from data; (2) use and extend the technical wealth of molecular biology and screening technologies to generate the high-throughput, quantitative data necessary to apply and evaluate RBMs on the challenging problem of the protein sequence-to-function relationship, using the trypsin enzyme as a model system. Our project is therefore theoretical, experimental and computational. We expect that it will bring great benefits to the modelling of the structural and functional properties of proteins, and will help turn machine learning, whose importance is growing at an impressive pace, into a controlled and practical tool to model complex physical systems.
RBMPro will be led by R. Monasson, Director of Research at CNRS, hosted by the Laboratory of Theoretical Physics at Ecole Normale Superieure (ENS). R.M. is a specialist in the statistical physics of disordered systems and in its interdisciplinary applications, in particular to computer science, machine learning, and biological systems (genomics, neuroscience). The work will be shared between the theory team in the Physics Department at ENS (S. Cocco and R.M.) and the experimental/computational team (C. Nizak and O. Rivoire) at College de France (CdF), which has recently acquired equipment for state-of-the-art microfluidics techniques and large-scale mutagenesis experiments. S.C. develops statistical physics models and inference methods for biological data analysis, in particular for protein science and neuroscience; C.N. develops and performs high-throughput screening by phage display and droplet microfluidics combined with high-throughput sequencing; O.R. studies molecular evolution by combining statistical analyses of sequence data, theoretical models and in vitro evolution.
Mr. Rémi MONASSON (Lab. Theoretical & Statistical Physics ENS - UMR 8549-8550)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
CIRB CNRS UMR7241 - INSERM U1050 Equipe "Biologie statistique" (Centre interdisciplinaire de recherche en biologie)
LPT-LPS-ENS Lab. Theoretical & Statistical Physics ENS - UMR 8549-8550
ANR funding: 372,600 euros
Beginning and duration of the scientific project: November 2017 - 48 Months