A PAC-Bayesian Representation Learning Perspective – APRIORI
Understanding representation learning via the PAC-Bayesian theory
The tools offered by the PAC-Bayesian theory bring an original point of view on representation learning methods, together with strong theoretical guarantees and new directions for developing algorithms.
PAC-Bayesian theory and methods for deep learning and metric learning
The main objective of the APRIORI project is to bridge the gap between representation learning practices and theory. To do so, we adopt the perspective of a machine learning theory called "PAC-Bayesian theory". This theory is known to provide theoretical guarantees on models that are expressed as majority votes. We wish to provide theoretical justifications, on the one hand, for metric learning methods (which cover the notions of distance, similarity, and kernel) and, on the other hand, for deep learning methods. To do so, we express representation learning as the learning of a combination of (sub-)models (or representations). From this formulation, we derive theoretical guarantees to (i) bring a better understanding of these methods, (ii) derive new theoretically grounded algorithms, and (iii) guide users in choosing which of these methods to implement.
The main idea is to reformulate representation learning problems through the lens of the PAC-Bayesian theory. From such a reformulation, we can study the problem theoretically and derive generalization bounds. These bounds then guide us towards the development of learning algorithms.
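To make the notion of a generalization bound concrete, one classical form of PAC-Bayesian bound (McAllester's bound in the version tightened by Maurer) can be written as follows. It is given only as an illustration and is not necessarily the bound used in the project. The notation is an assumption introduced here: a set of voters H, a prior P and a posterior Q over H, a sample S of m examples drawn i.i.d. from a distribution D, a confidence parameter delta, and a loss with values in [0, 1].

```latex
% Assumed notation (not from the summary): H is a set of voters h,
% P a prior and Q a posterior distribution over H, S a sample of m >= 8
% i.i.d. examples from D, and the loss takes values in [0, 1].
\[
  \Pr_{S \sim D^m}\!\left(
    \forall Q:\;
    R_D(G_Q) \le \widehat{R}_S(G_Q)
      + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}}
  \right) \ge 1 - \delta,
\]
% Gibbs risks: expected and empirical error of a voter drawn from Q.
\[
  R_D(G_Q) = \mathbb{E}_{h \sim Q}\, R_D(h), \qquad
  \widehat{R}_S(G_Q) = \mathbb{E}_{h \sim Q}\, \widehat{R}_S(h),
\]
% For binary classification with the 0-1 loss, the Q-weighted majority
% vote B_Q inherits a guarantee through the classical factor-two relation.
\[
  R_D(B_Q) \le 2\, R_D(G_Q).
\]
```

The right-hand side of the first inequality depends only on quantities that can be computed from the sample, which is why such a bound can serve as an objective to minimize when designing a learning algorithm.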
The partners hold very frequent discussions.
Several results have been published in international machine learning conferences.
We wish to continue developing the project at the same sustained pace.
In addition, we aim to organize a workshop on the project theme at an international machine learning conference (ICML, NeurIPS, or ICLR).
- Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior - Gaël Letarte, Emilie Morvant, Pascal Germain - International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
- Landmark-based Ensemble Learning with Random Fourier Features and Gradient Boosting - Léo Gautheron, Pascal Germain, Amaury Habrard, Guillaume Metzler, Emilie Morvant, Marc Sebban, Valentina Zantedeschi - European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2020
- Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks - Gaël Letarte, Pascal Germain, Benjamin Guedj, François Laviolette - Conference on Neural Information Processing Systems (NeurIPS), 2019
- PAC-Bayesian Contrastive Unsupervised Representation Learning - Kento Nozawa, Pascal Germain, Benjamin Guedj - Conference on Uncertainty in Artificial Intelligence (UAI), 2020
- Improved PAC-Bayesian Bounds for Linear Regression - Vera Shalaeva, Alireza Fakhrizadeh Esfahani, Pascal Germain, Mihaly Petreczky - AAAI Conference on Artificial Intelligence (AAAI), 2020
- A primer on PAC-Bayesian Learning - Benjamin Guedj - Journal of the Société Mathématique de France, 2019
- Tutorial at ICML 2019 - A primer on PAC-Bayesian Learning - Benjamin Guedj and John Shawe-Taylor
- Invited talk at JDS 2019 - PAC-Bayesian Learning and Neural Networks - Pascal Germain
- Invited talk at JDS 2019 - When PAC-Bayesian Majority Votes meets Domain Adaptation - Emilie Morvant
- 4 communications at CAp (French conference on machine learning)
A key step that determines the success of any data science task is the construction of a suitable representation of the data that will ease its processing. Indeed, if the representation of the data is not meaningful enough for a given task, one cannot expect to solve this task. Until recently, the crucial step of constructing a representation was done by pre-processing the data with hand-crafted features. This was before the successes of methods known as representation learning, a term that refers to machine learning techniques that are able to automatically construct a representation specific to the task considered.
Machine learning encompasses numerous methods that allow a computer program to learn patterns in data batches or streams. Concretely, the objective is to learn a relation between the input space (i.e., the original representation) and the output space (e.g., a label space for prediction tasks). This input-output relation is expressed as a function, often called a hypothesis, that we want to perform well on new data. Several statistical machine learning theories make it possible to study the performance of a hypothesis theoretically and provide guarantees on the hypothesis constructed (e.g., theories based on the VC-dimension or the Rademacher complexity). To learn such a hypothesis, the representation learning paradigm incorporates into the process the transformation of the input space into a new meaningful representation, from which a model is learned (the two steps can be carried out simultaneously or sequentially). In other words, learning a representation amounts to transforming the input space into a new feature space (implicitly or explicitly) that can be interpreted as a latent space carrying more useful semantic information.
The challenge we aim to address with this project is to bridge the gap between representation learning practices and theory, in order to guide further development in this area. Consider deep learning methods, which are able to automatically learn meaningful representations and have led to impressive breakthroughs in many application areas, such as computer vision, natural language processing, and bioinformatics. However, the theoretical understanding of the performance of these methods lags far behind their empirical achievements.
Representation learning is also used in the context of pairwise functions that compute the similarity or distance between data points. This field is known as metric learning (which covers the notions of distance, similarity, and kernel). These methods are easier to analyze theoretically. Although their empirical achievements are not as impressive as those of deep learning methods, we believe that they deserve further study because they allow one to learn from a relatively small quantity of data, whereas deep learning methods are successful mainly when a large amount of data is available.
Instead of adopting a classical single-hypothesis approach, we will tackle the above challenge by expressing representation learning as the learning of a combination of (sub-)hypotheses or (sub-)representations. This will allow us to make use of a machine learning theory that offers powerful tools to study combinations: the PAC-Bayesian theory. The APRIORI project will focus on using the PAC-Bayesian theory to understand the successes of representation learning techniques.
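As a rough illustration of what "a combination of (sub-)hypotheses" means, the following minimal Python sketch builds a weighted majority vote over three toy voters and evaluates a classical PAC-Bayesian bound of the McAllester type on it. Everything in it is an assumption made for illustration: the function names, the toy labels and voter outputs, the posterior weights, and the choice of a uniform prior are hypothetical and do not come from the project.

```python
import math

def gibbs_risk(votes, labels, weights):
    """Q-weighted average 0-1 error of the individual voters (Gibbs risk)."""
    n = len(labels)
    risk = 0.0
    for w, voter_preds in zip(weights, votes):
        errors = sum(1 for p, y in zip(voter_preds, labels) if p != y)
        risk += w * errors / n
    return risk

def majority_vote(votes, weights):
    """Q-weighted majority vote; voters output labels in {-1, +1}."""
    n = len(votes[0])
    preds = []
    for i in range(n):
        margin = sum(w * v[i] for w, v in zip(weights, votes))
        preds.append(1 if margin >= 0 else -1)
    return preds

def kl_divergence(q, p):
    """KL(Q || P) for two discrete distributions over the same voters."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def pac_bayes_bound(emp_gibbs, kl, m, delta=0.05):
    """McAllester-type upper bound on the true Gibbs risk (assumed form)."""
    return emp_gibbs + math.sqrt(
        (kl + math.log(2 * math.sqrt(m) / delta)) / (2 * m)
    )

# Toy data (hypothetical): 8 labels and 3 voters that err on different points.
labels = [1, 1, -1, -1, 1, -1, 1, -1]
votes = [
    [-1, 1, -1, -1, 1, -1, 1, -1],  # voter 1: wrong on example 0
    [1, -1, 1, -1, 1, -1, 1, -1],   # voter 2: wrong on examples 1 and 2
    [1, 1, -1, 1, 1, -1, 1, -1],    # voter 3: wrong on example 3
]
prior = [1 / 3, 1 / 3, 1 / 3]       # uniform prior P over the voters
posterior = [0.4, 0.3, 0.3]         # posterior Q, here fixed by hand

emp_gibbs = gibbs_risk(votes, labels, posterior)
mv_preds = majority_vote(votes, posterior)
mv_risk = sum(1 for p, y in zip(mv_preds, labels) if p != y) / len(labels)
kl = kl_divergence(posterior, prior)
bound = pac_bayes_bound(emp_gibbs, kl, m=len(labels))

print("empirical Gibbs risk:", emp_gibbs)
print("empirical majority vote risk:", mv_risk)
print("KL(Q || P):", kl)
print("bound on the true Gibbs risk:", bound)
```

On this toy sample the vote corrects the mistakes of its individual voters because they err on different examples, which is the kind of effect a study of combinations aims to capture. With only eight examples the bound is loose; it tightens as the sample size grows or as the posterior stays close to the prior.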
Project coordination
Emilie MORVANT (Laboratoire Hubert Curien)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partners
Inria LNE Inria Lille - Nord Europe
UJM/LabHC Laboratoire Hubert Curien
ANR funding: 300,499 euros
Beginning and duration of the scientific project:
- 48 Months