CE40 - Mathematics, theoretical computer science, control theory and signal processing

Approximate Bayesian solutions for the interpretation of large datasets and complex models – ABSint

Submission summary

While the 1990s witnessed a tremendous acceleration in the development of powerful computing tools and of associated algorithms, primarily thanks to the Markov chain Monte Carlo (MCMC) revolution, the current era of "Big Data" and of complex parameter models underscores the limits of this paradigm, which has by now become a "traditional" approach. Such limitations can be associated with either the enormous amount of data to be processed by current models or the very structure of ever-expanding probabilistic and mechanical models, for example when they involve too many parameters to allow for inference. Many examples of this difficulty, or even impossibility, of computing statistical procedures and of producing feasible statistical inference can be found in biology (genomics, proteomics) and in the analysis of networks, signals and images.

Nonetheless, thanks to the very same tools, Bayesian non-parametric statistics is now an important area of research in statistics and machine learning, and a recognized methodology in applied fields, both for its theoretical developments, with better convergence characteristics in well-specified and misspecified models alike, and for its methodology. It is clear, however, that the convergence properties associated with such procedures do not apply to a large number of modelling problems and need to be replaced by other structures or procedures.

We have thus reached a turning point for the methodological and algorithmic tools that have made Bayesian analysis particularly successful in many applied fields and which back up a theoretically valid approach to statistical inference. These tools must therefore adapt or disappear when faced with the present pressure of more rudimentary optimization tools that manage to offer (partial) snapshots of the model to be estimated in a very short time, much shorter than the time needed to produce a standard Bayesian inference. Since we defend the foundational perspective that Bayesian analysis (and statistics as a whole) provides an added value to machine learning outputs, by covering both the problem of model selection and the analysis of the uncertainty attached to any inference, we aim in this project at validating and extending our current tools in order to overcome this crisis of fundamentals, by proposing approximate Bayesian methods of the kind that have begun to emerge in recent years from specific areas of application such as population genetics.

A first direction of this project thus focuses on approximate Bayesian inference tools, their extensions, their calibration and their potential validation. The subject must of course be understood in a broad sense that covers the specific areas of research of the members of the research teams, including ABC (approximate Bayesian computation, also known as likelihood-free methods), expectation-propagation (EP) and variational Bayes approximations. These techniques share the property of approximating and of analysing models where the true likelihood function cannot be evaluated numerically or completed into a manageable model. We aim to combine these methods into a single class of methods, towards the aggregation of multiple non-parametric Bayesian techniques to obtain more efficient approximations, and to simultaneously provide a degree of validation of these approximations.
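As an illustration of the likelihood-free idea behind ABC, the following is a minimal sketch of the basic rejection sampler, not a method taken from the project itself. It assumes a toy model (normal observations with known unit variance and unknown mean), a uniform prior on [-5, 5], the sample mean as summary statistic and a tolerance of 0.1; the function name `abc_rejection` and all of its parameters are hypothetical choices made for this example.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Basic ABC rejection: keep prior draws whose simulated summary
    lies within `tolerance` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                       # draw a parameter from the prior
        simulated = simulator(theta, len(observed), rng)  # simulate data; no likelihood evaluation
        if abs(summary(simulated) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy setting (assumption): N(theta, 1) data with true mean 2.0.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)

# The accepted draws form a sample from the ABC approximation to the posterior.
abc_posterior_mean = statistics.fmean(accepted)
```

The accepted draws approximate the posterior only up to the choice of summary statistic and tolerance, which is precisely the kind of approximation whose calibration and validation this first direction addresses.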

A second and related theme of this project is the study of the asymptotic properties of posterior distributions in complex high-dimensional models, towards producing robust Bayesian uncertainty measures, such as credible regions. We will study generic approaches in terms of their modelling capabilities and focus more on the two families of specific sampling problems motivated by the large-scale applications discussed in this project.

Project coordinator

Mr Christian Robert (Centre de recherches en mathématiques de la décision)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.


University of Oxford / Statistics
CEREMADE Centre de recherches en mathématiques de la décision
CMAP Centre de Mathématiques Appliquées
IMAG Institut Montpelliérain Alexander Grothendieck

ANR funding: 345,150 euros
Beginning and duration of the scientific project: December 2018 - 48 Months
