TopAI: Topological Data Analysis for Machine Learning and AI
Development of new tools to discover and exploit the topological structure of Data in Machine Learning and AI
In recent years, all domains of science, the economy and even everyday life have been overwhelmed by massive amounts of data. Guiding scientists and users to the most relevant, often unexpected, features, and giving them the tools to discover, extract and exploit the best knowledge from their data, are fundamental challenges for modern society. A closer look at data reveals that they often concentrate around shapes carrying an interesting topological or geometric structure. Identifying, extracting and exploiting the topological and geometric features or invariants underlying data has become a problem of major importance for better understanding relevant properties of the systems that generated them.
Building on solid theoretical and algorithmic bases, geometric inference and computational topology have experienced important developments towards data analysis. New mathematically well-founded theories gave birth to the field of Topological Data Analysis (TDA), which is now attracting interest from both academia and industry.
During the last few years, TDA has seen many successful theoretical contributions, with the emergence of persistent homology theory and distance-based approaches; important algorithmic and software developments, with the development of the GUDHI library by the candidate's team; and some successful real-world applications, with the involvement of several industrial companies in the field. These developments have demonstrated the very promising potential of combining TDA methodology with other ML and AI approaches, opening new theoretical and applied research directions at the intersection of TDA, ML and AI.
The TopAI project aims at developing a world-leading research activity on topological and geometric approaches in Machine Learning (ML) and AI with a double academic and industrial/societal objective. First, TopAI aims at designing new mathematically well-founded topological and geometric methods and tools for Data Analysis and ML, and at making them available to the data science and AI community through state-of-the-art software tools. Second, thanks to already established close collaborations and the strong involvement of French industrial partners, TopAI aims at exploiting its expertise and tools to address a set of challenging problems with high societal and economic impact in personalized medicine and AI-assisted medical diagnosis.
The TopAI project is organized around two main pillars that will feed each other: one grouping the fundamental and generic research, and one focused on applications in the medical domain.
The project is structured around 4 work packages reflecting the variety of approaches:
Work Package 1: New mathematical tools for TDA in ML
Challenges: Designing and providing new TDA tools and models for ML that come with strong mathematical guarantees is of major importance, particularly for the applied problems considered in this project (personalized medicine, diagnosis), where the precision and reliability of the results are critical.
Expected outcomes:
- Developing the mathematical and statistical foundations of topological and geometric data analysis for ML and AI.
- Providing new mathematically well-founded and efficient TDA approaches and methods.
Methodology: Our general approach consists in building on promising recent preliminary theoretical and/or experimental results.
Work Package 2: A generic software toolbox for TDA in ML and AI
Challenges: Promoting and transferring TDA methods and knowledge generated within the TopAI project beyond the TDA community and the targeted domains in WP3 and WP4 through easy-to-use software for non-expert data scientists.
Methodology: Integration of the research outcomes into the open-source C++/Python library GUDHI, which implements state-of-the-art algorithms and data structures at the core of TDA.
Expected outcomes: Twofold. First, an efficient, state-of-the-art toolbox with tutorials and use cases for training and teaching purposes. Second, the launch of a start-up during the last year of the project.
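To give a flavor of the kind of computation such a toolbox exposes, here is a minimal standalone sketch of 0-dimensional persistent homology (the birth and death of connected components in a Vietoris-Rips filtration). It deliberately does not use the GUDHI API and relies only on the Python standard library; the function name `zero_dim_persistence` and the toy point cloud are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import dist


def zero_dim_persistence(points):
    """Return (birth, death) pairs of connected components in a
    Vietoris-Rips filtration, using a Kruskal-style union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))
    # One component survives at every scale.
    pairs.append((0.0, float("inf")))
    return pairs


# Toy data (an assumption): two well-separated clusters on the real line.
cloud = [(0.0,), (1.0,), (10.0,), (11.0,)]
diagram = zero_dim_persistence(cloud)
print(diagram)
```

In this toy example, the two short-lived pairs correspond to points merging within each cluster, while the long-lived finite pair reflects the gap between the two clusters. A production library would additionally handle higher-dimensional homology and much larger inputs.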
Work Package 3: Personalized medicine and diagnosis through clinical endpoints development
Work Package 4: Topology-based unsupervised classification and anomaly detection on cytometry data for medical diagnosis
The project is still ongoing.
[BCMR22] Thomas Bonis, Frédéric Chazal, Bertrand Michel, Wojciech Reise. Topological phase estimation method for reparameterized periodic functions. 2022. (hal-03687686)
[P22a] Louis Pujol. ISDE: Independence Structure Density Estimation. 2022. (hal-03401530v4)
[P22b] Louis Pujol. Nonparametric estimation of a multivariate density under Kullback-Leibler loss with ISDE. 2022. (hal-03660157)
[SHCL22] T. de Surrel, F. Hensel, M. Carrière, T. Lacombe, Y. Ike, H. Kurihara, M. Glisse, F. Chazal. RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds. Feb. 2022.
[CM21] Frédéric Chazal, Bertrand Michel. An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists. Frontiers in Artificial Intelligence, Frontiers Media S.A., 2021. (10.3389/frai.2021.667963) (hal-01614384)
[CCGIK21] Mathieu Carrière, Frédéric Chazal, Marc Glisse, Yuichi Ike, Hariprasad Kannan. Optimizing persistent homology based functions. ICML 2021 - 38th International Conference on Machine Learning, Jul 2021, Virtual conference, United States. pp. 1294-1303. (hal-02969305v2)
[RCLUI21] Martin Royer, Frédéric Chazal, Clément Levrard, Yuhei Umeda, Yuichi Ike. ATOL: Measure Vectorization for Automatic Topologically-Oriented Learning. AISTATS 2021 - 24th International Conference on Artificial Intelligence and Statistics, Apr 2021, Virtual conference, France. (hal-02296513v3)
[CLR21] Frédéric Chazal, Clément Levrard, Martin Royer. Optimal quantization of the mean measure and applications to statistical learning. Electronic Journal of Statistics, Shaker Heights, OH: Institute of Mathematical Statistics, 2021, 15 (1), pp. 2060-2104. (hal-02465446v4)
TopAI is a project that aims at developing a world-leading research activity on topological and geometric approaches in Machine Learning (ML) and Artificial Intelligence (AI) going from mathematical foundations to industrial applications with high societal and economic impact in personalized medicine and AI-assisted medical diagnosis.
Motivated by a strong interest in understanding the complex structures underlying data, geometry and topology have recently experienced important developments towards data analysis and Machine Learning. New mathematically well-founded theories gave birth to the field of Topological Data Analysis (TDA), which is now attracting interest from both academia and industry. During the last few years, TDA has seen many successful theoretical contributions, important algorithmic and software developments, and some successful real-world applications. These developments have demonstrated the very promising potential of combining TDA methodology with other ML and AI approaches, opening new theoretical and applied research directions at the intersection of TDA, ML and AI that are at the core of the TopAI project.
The TopAI activities are organized around a double academic and industrial/societal objective. First, TopAI aims at designing new mathematically well-founded topological and geometric methods and tools for Data Analysis and ML, and at making them available to the data science and AI community through state-of-the-art software. Second, thanks to already established close collaborations and the strong involvement of two innovative French SMEs, Sysnav and MetaFora, TopAI aims at exploiting its expertise and tools to address a set of challenging problems with high societal and economic impact in personalized medicine and AI-assisted medical diagnosis.
TopAI brings together a unique variety of expertise in a common framework, going from the mathematical foundations of TDA and AI to applied and industrial research. This combination of upstream and downstream research will create a unique synergy whose expected outcomes include academic and basic research contributions, industrial transfer and valorization.
Project coordination
CHAZAL Frédéric (Centre de Recherche Inria Saclay - Île-de-France)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
Inria Saclay - Île-de-France, DATASHAPE team (Centre de Recherche Inria Saclay - Île-de-France)
ANR funding: 546,951 euros
Beginning and duration of the scientific project: August 2020 - 48 months