Statistics, Computation and Artificial Intelligence – SCAI
The key factor behind the recent boom in AI is the emergence of deep learning (DL). The successes of these methods, in particular for supervised learning, are remarkable. But the limits and some of the downsides of DL have also been identified. It is widely acknowledged that current DL algorithms are “data hungry”, that the performance of DL is strongly affected by the quality of the gathered data, that DL is poor at representing uncertainty, and so on.
These problems can be explained and, hopefully, alleviated to a certain extent. Current DL research is almost entirely focused on “data-driven”, “model-agnostic” AI. The data deluge fueled by the drop in the price of data storage, the tremendous increase in computing power (GPUs, TPUs, massive parallelization) and the discovery of new neural network architectures (CNN, LSTM) and algorithms have made it possible to solve problems that seemed out of reach a decade ago. We can build systems that predict, very efficiently, what will happen next based on what they have learned so far. But entirely “data-driven” approaches have their limits: it is now felt that, whereas this approach is perfectly fine for predictive tasks, it will fall short of reaching higher-level abstraction unless we effectively combine “data-driven” with more “model-driven” approaches. By “model-driven” AI, I mean approaches that capture prior knowledge and avoid wasting computation time and (a lot of) data on blindly learning concepts that are already known. There are of course different types of models. Symbolic models, which capture “high-level semantics”, are appealing, but their shortcomings are well documented. My personal inclination is toward statistical models, which can capture complex knowledge while accounting for uncertainties and various forms of “nuisances”. I want to build a hybrid AI system composed of sub-systems, some “data-centric” and others driven by statistical models. The challenge is to combine such sub-systems and to develop appropriate inference methods, which is a tough task.
Designing these hybrid models and solving the associated inference problems in modern large-scale AI poses very significant theoretical and computational problems. The most obvious challenge is to cope with the model dimensions (10^3 to 10^6 variables) and the size of data collections (10^6 to 10^9 data points are not uncommon). But there are also other difficulties. In some applications, for example, the time budget (either at training time or at inference time) is tight. In other settings, which are becoming increasingly important, the data are distributed at a very large scale (for example across mobile phones), and the individual devices should collaboratively learn a shared prediction model while keeping all the training data on the device to preserve privacy.
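To make the distributed setting concrete, here is a minimal sketch of one common way devices can learn a shared model without sharing raw data: federated averaging. This is an illustrative assumption, not the method proposed by the project; the names `Client`, `local_update` and `federated_averaging`, the scalar linear model, and the toy data are all hypothetical. Keeping data on the device, as done here, does not by itself give a formal privacy guarantee.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Client:
    """Hypothetical device holding private (x, y) pairs that never leave it."""
    xs: List[float]
    ys: List[float]

    def local_update(self, w: float, lr: float, steps: int) -> Tuple[float, int]:
        """Run a few gradient steps on the local squared loss, starting from w."""
        n = len(self.xs)
        for _ in range(steps):
            # Gradient of the mean squared error of the model y = w * x.
            grad = sum(2.0 * (w * x - y) * x for x, y in zip(self.xs, self.ys)) / n
            w -= lr * grad
        # Only the updated parameter and the sample count are sent back.
        return w, n


def federated_averaging(clients: List[Client], rounds: int = 50,
                        lr: float = 0.05, local_steps: int = 5,
                        w0: float = 0.0) -> float:
    """Server loop: broadcast w, collect local updates, average by sample count."""
    w = w0
    for _ in range(rounds):
        updates = [c.local_update(w, lr, local_steps) for c in clients]
        total = sum(n for _, n in updates)
        w = sum(w_k * n for w_k, n in updates) / total
    return w


# Toy data (assumed for illustration): every device observes y = 3 * x exactly.
clients = [
    Client(xs=[1.0, 2.0], ys=[3.0, 6.0]),
    Client(xs=[0.5, 1.5, 2.5], ys=[1.5, 4.5, 7.5]),
]
w_shared = federated_averaging(clients)
```

In this sketch the server only ever sees model parameters and sample counts. Communication cost is governed by the number of rounds, which is why several local steps are taken between two exchanges.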
These challenges bring to the fore new mathematical and computational problems, both in high-dimensional inference and in large-scale (distributed) optimization and simulation. The domain of learning with hybrid models is full of exciting theoretical and methodological research avenues. I have divided the program of the chair into four main intertwined challenges. Challenge 1 deals with Bayesian models, with applications to uncertainty quantification for deep neural networks (DNNs) and to collaborative crowdsourcing for labelling. Challenge 2 is on deep generative models, which directly link DL and probabilistic models. This is a prototypical example in which “data-centric” deep neural networks are used to construct new statistical models for high-dimensional data. Challenge 3 is large-scale distributed optimization and simulation, with an emphasis on massive parallelization and on federated learning under communication and privacy constraints. Here statistical models are used on top of DL to perform appropriate aggregation of the information shared by the individual learners. Challenge 4 is large-scale Monte Carlo (MC) and covers both theoretical issues (mixing times, dependence on dimension, etc.) and algorithmic issues.
Mr Eric Moulines (Centre de Mathématiques Appliquées de l'Ecole Polytechnique)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
UMR7641 Centre de Mathématiques Appliquées de l'Ecole Polytechnique
ANR funding: 579,044 euros
Beginning and duration of the scientific project: August 2020 - 48 Months