In our era of a steady data deluge, there is no doubt that designing efficient techniques to extract knowledge from massive data collections or data streams has become a fundamental scientific and technological challenge, bearing immense opportunities, both industrial and societal. As such, it is naturally the object of intense work in many communities, from signal processing to machine learning and data mining, which are now converging into a new dedicated field called Data Science. In a nutshell, the ultimate goal of Data Science is to design algorithms for information extraction that achieve a good balance between the quality of the extracted information and the resources (computation time, energy, memory, etc.) invested to obtain it.
Be it for data visualization or for prediction models, today's most successful techniques rely at some stage on Machine Learning (ML), in which a training collection is used during a (possibly costly) learning phase to tune the algorithm's parameters.
The vision of OATMIL is that Optimal Transport (OT) has the potential to provide elegant and conceptually rich approaches to these challenges by offering a geometric perspective on ML, viewed as a family of techniques that map the empirical distribution of the training data to model parameters. Conversely, we envision that many of the challenging computational aspects of OT can be efficiently addressed by developing new optimization algorithms inspired by state-of-the-art ML.
OATMIL will leverage this key duality between OT and ML to develop a new ensemble of machine learning tools built upon OT theory. Drawing on the geometric perspective brought by OT, these tools will offer new ways of manipulating empirical distributions and open new avenues for data-adapted machine learning methodologies. The new methodologies will be implemented in a toolbox made available to the research and industrial communities. The practical interest of these tools will be evaluated on real-world problems (remote sensing or astronomical imagery, audio signal processing) as well as on an industrial case study provided by a non-funded collaborator of the project.
Mr. Nicolas COURTY (Institut de recherche en informatique et systèmes aléatoires)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility regarding its contents.
IRISA Institut de recherche en informatique et systèmes aléatoires
LITIS Laboratoire d'Informatique, du Traitement de l'information et des Systèmes
CMAP Centre de Mathématiques Appliquées - Ecole polytechnique
LAGRANGE (OCA/CNRS/UNS) Laboratoire J-L Lagrange (OCA/CNRS/UNS)
ANR funding: 390,570 euros
Start date and duration of the scientific project: - 48 months