CE23 - Artificial Intelligence

Supervised Ultrametric Learning – ULTRA-LEARN

Hierarchical clustering is a technique that captures features of complex data at several scales simultaneously. The objective of this project is to develop new hierarchical analysis methods based on the representation of hierarchies as ultrametric distances, which enables the continuous optimization of hierarchical cost functions. These new methods can then be integrated into classical machine learning pipelines.

The interest of hierarchical clustering

A hierarchical clustering recursively divides a dataset into progressively smaller groups. The fundamental assumption of this approach is that the structure visible in the data depends on the chosen scale of observation. This hypothesis has been confirmed, both in theory and in practice, on many types of complex data, such as social networks, electrical networks, actor networks, computer networks, cortical brain networks, and linguistic networks. Building high-quality hierarchical representations is therefore an important step in the analysis and understanding of such data. However, the hierarchical clustering methods used today are essentially unsupervised and rely on heuristic algorithms: the hierarchy obtained therefore does not correspond to the optimization of an explicit criterion. This limitation stems from the difficulty of solving optimization problems on these structures, which are generally NP-hard.
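
To make the idea of scale concrete, here is a small illustration (our own, not taken from the project) using a classical heuristic hierarchical clustering from SciPy: cutting the same hierarchy at two different heights reveals a different number of groups.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight pairs close to each other, and a third pair far away.
points = np.array([[0.0], [0.1], [5.0], [5.1], [20.0], [20.1]])

# Average-linkage hierarchical clustering (a classical heuristic method).
Z = linkage(points, method="average")

# Cutting the dendrogram at different heights reveals structure at
# different scales: fine scale -> 3 groups, coarse scale -> 2 groups.
fine = fcluster(Z, t=1.0, criterion="distance")
coarse = fcluster(Z, t=10.0, criterion="distance")
print(len(set(fine)), len(set(coarse)))  # → 3 2
```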

We have developed a first method to optimize a differentiable hierarchical cost function with a gradient descent algorithm. This method relies on the subdominant ultrametric operator, which associates an ultrametric with any dissimilarity function in a subdifferentiable way. We have shown how to efficiently compute this operator and its gradient in the context of sparse graph analysis. These developments allowed us to provide efficient solutions for hierarchical cost functions whose exact optimization is NP-hard. We have also shown how this approach makes it easy to construct new hierarchical cost functions by combining data-attachment terms with regularization terms that can include supervised information.
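
The subdominant ultrametric is a classical object: it is the largest ultrametric lying below a given dissimilarity, and it coincides with the cophenetic distances produced by single-linkage clustering. The project's subdifferentiable sparse-graph implementation is not reproduced here, but this correspondence can be checked with SciPy:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, cophenet

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
d = pdist(X)  # condensed pairwise dissimilarities

# Single linkage computes, for every pair of points, the minimum over all
# connecting paths of the maximal edge weight -- exactly the subdominant
# ultrametric of the dissimilarity.
u = cophenet(linkage(d, method="single"))

U, D = squareform(u), squareform(d)
# The subdominant ultrametric lies below the dissimilarity...
assert np.all(U <= D + 1e-12)
# ...and satisfies the ultrametric inequality u(i,k) <= max(u(i,j), u(j,k)).
M = np.maximum(U[:, :, None], U[None, :, :])  # M[i,j,k] = max(u(i,j), u(j,k))
assert np.all(U[:, None, :] <= M + 1e-12)
```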

- G. Chierchia and B. Perret. Ultrametric fitting by gradient descent. Journal of Statistical Mechanics: Theory and Experiment, 2020(12), 124004.
- G. Chierchia and B. Perret. Ultrametric fitting by gradient descent. Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.

In this project, we propose to study the optimization of hierarchical clusterings from the angle of optimizing ultrametrics, which are a dual and continuous representation of hierarchical clusterings. In topology, an ultrametric distance is obtained by replacing the triangle inequality with the ultrametric inequality, which requires that any triplet of points forms an isosceles triangle whose two equal sides are at least as long as the third. This ultrametric inequality imposes a non-convex constraint that is difficult to handle. We can nevertheless reformulate the ultrametric optimization problem to make this constraint implicit. It then becomes possible to optimize ultrametrics with gradient descent algorithms. This project aims to develop this possibility into supervised learning methods for hierarchical clustering that work on large datasets. In other words, we will formulate the supervised learning of hierarchical clusterings on the continuous space of ultrametrics, hence the name of the project: ULTRA-LEARN.
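
As a small numerical illustration (the values below are our own toy example, not from the project), the ultrametric inequality and the isosceles-triangle property can be checked directly on the cophenetic distances of a tiny hierarchy: pairs {a, b} merged at height 1.0 and {c, d} at height 2.0, with everything merged at 3.0.

```python
import itertools
import numpy as np

# Cophenetic distances of the toy hierarchy described above.
U = np.array([
    [0.0, 1.0, 3.0, 3.0],
    [1.0, 0.0, 3.0, 3.0],
    [3.0, 3.0, 0.0, 2.0],
    [3.0, 3.0, 2.0, 0.0],
])

for i, j, k in itertools.permutations(range(4), 3):
    # Ultrametric ("strong triangle") inequality: d(i,k) <= max(d(i,j), d(j,k)).
    assert U[i, k] <= max(U[i, j], U[j, k])

for i, j, k in itertools.combinations(range(4), 3):
    # Every triple is isosceles, with the two equal sides at least as long
    # as the third: the two largest of the three distances are equal.
    sides = sorted([U[i, j], U[j, k], U[i, k]])
    assert sides[1] == sides[2]
```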

This progress will rely on two main ingredients. The first is the development of ultrametric networks, i.e. trainable models that output ultrametrics. An ultrametric network will itself be made of two elements: a multilayer neural network for feature extraction and an ultrametric layer that produces an ultrametric from the features provided by the neural network. This ultrametric layer will be differentiable, which will allow us to learn the parameters of the network end-to-end. The second ingredient will be the definition of cost functions that efficiently and differentiably measure the dissimilarity between two ultrametrics. All these elements will also benefit from algorithmic developments targeting highly parallel hardware such as GPUs, in order to allow the processing of large datasets.
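
A minimal NumPy sketch of this idea, under our own simplifying assumptions (dense matrices, random features standing in for a neural backbone; this is not the project's implementation): the subdominant ultrametric can be obtained as the min-max transitive closure of a pairwise dissimilarity matrix, using only min and max operations, which are subdifferentiable and therefore compatible with end-to-end training.

```python
import numpy as np

def subdominant_ultrametric(D, iters=None):
    """Min-max transitive closure of a dissimilarity matrix D.

    Repeatedly applies U[i,k] = min_j max(U[i,j], U[j,k]) until a fixed
    point; the result is the largest ultrametric below D. Built from
    min/max only, so it is subdifferentiable and could sit on top of a
    feature extractor in an ultrametric network.
    """
    U = D.copy()
    n = U.shape[0]
    for _ in range(iters or n):
        U = np.min(np.maximum(U[:, :, None], U[None, :, :]), axis=1)
    return U

# Features from a (hypothetical) neural network backbone:
rng = np.random.default_rng(1)
F = rng.normal(size=(6, 4))
D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)  # pairwise distances

U = subdominant_ultrametric(D)
# U is an ultrametric below D: u(i,k) <= max(u(i,j), u(j,k)) for all triples.
assert np.all(U <= D + 1e-12)
assert np.all(U[:, None, :] <= np.maximum(U[:, :, None], U[None, :, :]) + 1e-12)
```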

The ultrametric networks developed will be applied to three problems. First, we will propose an ultrametric network to estimate hierarchical segmentations of images, i.e. the complete decomposition of an image into objects and the iterative refinement of these objects into parts. The second application will aim to develop a classifier taking advantage of a class ontology known a priori: for example, we know that a cat and a dog are both mammals and are semantically closer to each other than to a bird. Finally, the third application is the semi-supervised hierarchical clustering of data; in this case, we will seek to construct an optimal hierarchical clustering based on partial information given a priori by an expert.

Project coordination

Benjamin Perret (Laboratoire d'Informatique Gaspard-Monge)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines all responsibility for its contents.

Partner

LIGM Laboratoire d'Informatique Gaspard-Monge

ANR grant: 163,404 euros
Beginning and duration of the scientific project: March 2021 - 48 months
