ChairesIA_2019_2 - Chaires de recherche et d'enseignement en Intelligence Artificielle - vague 2 de l'édition 2019

Bayesian learning of expensive models, with applications to cell biology – Baccarat

AI chair «Baccarat»

Bayesian learning of expensive models

Monte Carlo methods with fast rates

Biologists develop intricate models of cells, and ecologists model the dynamics of ecosystems at a world scale. A single evaluation of such complex models takes minutes or hours on today’s hardware, and fitting probabilistic models to biological data can require millions of serial evaluations. Monte Carlo methods, for example, are ubiquitous in statistical inference for scientific data, but they scale poorly with the number of model evaluations. Meanwhile, the use of parallel computing architectures for Monte Carlo is often limited to running independent copies of the same algorithm. The aim of Baccarat is to provide Monte Carlo methods that unlock inference for expensive-to-evaluate models in biology by directly addressing the slow rate of convergence and the parallelization of Monte Carlo methods.

Monte Carlo methods are randomized numerical quadratures, i.e., random sets of weighted points, where each point (or node) corresponds to a given value of the parameters of a biological model. Our main tool is the design of numerical quadratures with repulsive nodes. For instance, determinantal point processes, a prototypical repulsive distribution introduced in physics, improve the Monte Carlo convergence rate, just as electrons efficiently filling a box lead to low-variance estimation of volumes. Such results raise open computational and statistical challenges. We propose to solve these challenges and make repulsive processes a novel tool for applied statisticians, signal processors, and machine learners.
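
To illustrate why repulsive nodes help, here is a minimal sketch that compares plain Monte Carlo with a jittered grid (one uniform point per cell of a regular grid). The jittered grid is only a simple stand-in for a repulsive point set; it is not a determinantal point process and not the project's method. The function names, the test integrand, and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np


def plain_monte_carlo(f, n_per_axis, dim, rng):
    """Average f over n_per_axis**dim i.i.d. uniform points in [0, 1]^dim."""
    n = n_per_axis ** dim
    points = rng.random((n, dim))
    return f(points).mean()


def jittered_grid(f, n_per_axis, dim, rng):
    """Average f over one uniform point per cell of a regular grid.

    Points cannot cluster inside a cell, which is a crude form of repulsion.
    """
    axes = [np.arange(n_per_axis)] * dim
    # Integer coordinates of the lower corner of each of the n_per_axis**dim cells.
    corners = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, dim)
    points = (corners + rng.random(corners.shape)) / n_per_axis
    return f(points).mean()


def rmse(estimator, f, truth, n_per_axis, dim, n_repeats=200, seed=0):
    """Root mean square error of an estimator over independent repetitions."""
    rng = np.random.default_rng(seed)
    errors = [estimator(f, n_per_axis, dim, rng) - truth for _ in range(n_repeats)]
    return float(np.sqrt(np.mean(np.square(errors))))


def smooth_integrand(x):
    """Hypothetical smooth test function; its integral over [0, 1]^d is sin(1)^d."""
    return np.prod(np.cos(x), axis=1)


if __name__ == "__main__":
    dim = 2
    truth = np.sin(1.0) ** dim
    for m in (4, 8, 16, 32):
        e_plain = rmse(plain_monte_carlo, smooth_integrand, truth, m, dim)
        e_jitter = rmse(jittered_grid, smooth_integrand, truth, m, dim)
        print(f"N={m**dim:5d}  plain RMSE={e_plain:.2e}  jittered RMSE={e_jitter:.2e}")
```

Both estimators use the same number of evaluations of the integrand. For this smooth integrand, the error of the jittered grid should decrease faster with N than that of plain Monte Carlo, which is the kind of gain that repulsive quadratures aim for in a more general setting.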

Here are some of our results:
* We have proved that determinantal point processes (DPPs) give numerical integration algorithms with an optimal convergence rate in reproducing kernel Hilbert spaces.
* We have given an asymptotic statistical test of hyperuniformity, the property of a point process that makes it a fast Monte Carlo algorithm.
* We have shown that combining a classical and a quantum computer makes it possible to sample some DPPs faster than with classical computers alone.
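
For reference, hyperuniformity is commonly stated in terms of the variance of the number of points in a large window. The block below is a sketch of that standard definition for a stationary point process; the notation is an assumption made here and is not taken from the project's papers.

```latex
% X: stationary point process on R^d with intensity \lambda > 0
% N(B_R): number of points of X in the ball B_R of radius R
\[
  X \text{ is hyperuniform}
  \iff
  \lim_{R \to \infty} \frac{\operatorname{Var}[N(B_R)]}{|B_R|} = 0 .
\]
% Baseline without repulsion: for a homogeneous Poisson process,
\[
  \operatorname{Var}[N(B_R)] = \lambda \, |B_R|
  \quad\Longrightarrow\quad
  \frac{\operatorname{Var}[N(B_R)]}{|B_R|} = \lambda > 0 .
\]
```

In words, the number of points of a hyperuniform process in a large window fluctuates less than that of independent points, whose count variance grows like the volume of the window.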

Perspectives are numerous, e.g.:
* There is still a gap between the processes that efficiently solve a numerical integration task on paper and the processes that we can efficiently sample on a (classical or quantum) computer.
* Sampling a DPP on a quantum computer is a task that can be optimized in a variety of ways. This task is exciting, since it is possible that, if and once (large) quantum computers become easily accessible, DPPs will become a standard distribution for every data scientist.

Our output is available as papers in scientific journals and as software packages. See
rbardenet.github.io

Expensive computer simulations have become routine in the experimental sciences. Astrophysicists design complex models of the evolution of galaxies, biologists develop intricate models of cells, and ecologists model the dynamics of ecosystems at a world scale. A single evaluation of such complex models takes minutes or hours on today's hardware. On the other hand, fitting these models to data can require millions of serial evaluations. Monte Carlo methods, for example, are ubiquitous in statistical inference for scientific data, but they scale poorly with the number of model evaluations. Meanwhile, the use of parallel computing architectures for Monte Carlo is often limited to running independent copies of the same algorithm. Baccarat will provide Monte Carlo methods that unlock inference for expensive models in biology by directly addressing the slow rate of convergence and the parallelization of Monte Carlo methods.

The key to improving the Monte Carlo rate is to introduce repulsiveness between the quadrature nodes. For instance, we recently proved that determinantal point processes, a prototypical repulsive distribution introduced in physics, improve the Monte Carlo convergence rate, just as electrons efficiently filling a box lead to low-variance estimation of volumes. Such results raise open computational and statistical challenges. We propose to solve these challenges and make repulsive processes a novel tool for applied statisticians, signal processors, and machine learners.

Still with repulsiveness as a hammer, we will design the first parallel Markov chain Monte Carlo algorithms that are qualitatively different from running independent copies of known algorithms, i.e., that explicitly improve the order of convergence of the single-machine algorithm. To this end, we will turn mathematical tools such as repulsive particle systems and non-colliding processes into computationally cheap, communication-efficient Monte Carlo schemes with fast convergence.

Project coordination

Rémi Bardenet (Centre de Recherche en Informatique, Signal et Automatique de Lille)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

CRIStAL Centre de Recherche en Informatique, Signal et Automatique de Lille

ANR funding: 433,620 euros
Beginning and duration of the scientific project: April 2020 - 48 Months
