ChairesIA_2019_1 - Research and teaching chairs in Artificial Intelligence - wave 1 of the 2019 edition

BRIDinG thE gAp Between iterative proximaL methods and nEural networks – BRIDGEABLE


A powerful and elegant approach for solving challenges in data science consists of formulating them as an optimization problem. Since the seminal work by Moreau in the 1960s, proximal tools have been growing in popularity in optimization. At the same time, deep neural networks have led to outstanding achievements in many application fields related to data analysis.

Towards more powerful and robust AI methods

Processing high-dimensional datasets in a reasonable time by designing algorithms that are both efficient and robust is a stimulating scientific endeavour that constitutes the backbone of this project. The ability of proximal methods, e.g. ADMM or the proximal gradient algorithm, to tackle non-smooth problems and to split complex objective functions into sums of simpler terms has enabled significant advances in large-scale data processing over the last decade. One of the main advantages of iterative proximal algorithms is that they rely on a strong mathematical background, which allows their convergence properties to be analyzed precisely. In contrast, the fundamental reasons for the excellent performance of deep neural networks (NNs) are still poorly understood from a mathematical viewpoint, and few theoretical guarantees exist today concerning the robustness of these methods. Recently, we have shown that almost all the activation functions used in NN architectures correspond to proximity operators of convex functions. This finding opens new perspectives in deep learning by exploiting tight links between NN structures and iterative proximal algorithms. The objective of this chair project is to investigate these relations in depth, in order to bring new insights into the analysis of NN architectures, and to lead to a new generation of approaches combining the reliability of iterative proximal methods with the practical efficiency of deep learning. These methodological developments are essential to improve the interpretability and safety of AI methods, which are of utmost importance in many industrial contexts. This challenge is addressed in close collaboration with three industrial partners.
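The correspondence between activations and proximity operators can be checked numerically in simple cases. The sketch below is an illustration only, not the project's general result: it verifies by brute-force minimization that ReLU is the proximity operator of the indicator function of the nonnegative half-line, and that soft thresholding, the proximity operator of the absolute value, acts as a sparsity-inducing activation.

```python
import numpy as np

def prox_numeric(x, penalty, grid):
    """Brute-force proximity operator: argmin_u 0.5*(u - x)**2 + penalty(u),
    minimized over a fine grid of candidate values u."""
    return grid[np.argmin(0.5 * (grid - x) ** 2 + penalty(grid))]

grid = np.linspace(-5.0, 5.0, 100001)

# ReLU is the prox of the indicator of [0, +inf), i.e. the projection onto it.
indicator = lambda u: np.where(u >= 0, 0.0, np.inf)
for x in (-2.3, 0.7, 1.9):
    relu = max(x, 0.0)
    assert abs(prox_numeric(x, indicator, grid) - relu) < 1e-3

# Soft thresholding is the prox of the absolute value (an l1 penalty).
l1 = lambda u: np.abs(u)
for x in (-2.3, 0.7, 1.9):
    soft = np.sign(x) * max(abs(x) - 1.0, 0.0)
    assert abs(prox_numeric(x, l1, grid) - soft) < 1e-3
```

The same identity is what allows a feedforward layer, an affine map followed by an activation, to be read as one iteration of a proximal algorithm.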

Three main research directions split into tasks are investigated.

WP 1: Design of robust neural networks
It is widely acknowledged that NNs are sensitive to adversarial perturbations of their inputs. A way of providing a guarantee of stability consists of quantifying the Lipschitz regularity of the NN. For feedforward NNs, sharp bounds can be derived, for a wide range of norms, by leveraging the averaging properties of standard activation functions. These results are obtained by employing tools borrowed from fixed point theory.
Task 1.1: Generalize Lipschitz stability analysis to more complex neural structures
Task 1.2: Propose new NN architectures taking inspiration from the form of proximal algorithms
Task 1.3: Perform constrained training to impose stability certificates
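For intuition, the classical (conservative) Lipschitz certificate for a feedforward network with 1-Lipschitz activations such as ReLU is the product of the spectral norms of its weight matrices; the bounds developed in this WP are sharper, but the sketch below, using arbitrary random weights, shows the basic certificate at work.

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Product of spectral norms: a (loose) Lipschitz certificate for a
    feedforward network whose activations are 1-Lipschitz."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 16)), rng.standard_normal((4, 8))]

def forward(x):
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)   # ReLU = prox of the indicator of R_+^n
    return weights[-1] @ x           # final linear layer

bound = lipschitz_upper_bound(weights)
x, y = rng.standard_normal(16), rng.standard_normal(16)
ratio = np.linalg.norm(forward(x) - forward(y)) / np.linalg.norm(x - y)
# The empirical expansion ratio never exceeds the certificate (ratio <= bound).
```

Controlling such a bound during training (Task 1.3) amounts to constraining the weight matrices so that the product stays below a prescribed level.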

WP 2: Learning maximally monotone operators
We propose a new paradigm for solving inverse problems where an operator regularization is substituted for the standard functional regularization. More specifically, our purpose is to define the regularized solution by solving a monotone inclusion problem involving the sum of the subdifferential of the data fidelity function and a maximally monotone operator accounting for the prior information on the sought object. Although this generalization of classical convex formulations may appear both natural and elegant, it induces a high degree of freedom in the choice of the regularization strategy. To turn this difficulty into an advantage, we propose to learn the maximally monotone operator in a supervised manner by using available datasets.
Task 2.1: Generate neural network models of monotone operators
Task 2.2: Explore connections with plug-and-play methods
Task 2.3: Devise suitable fixed point training strategies
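A minimal sketch of this paradigm, on a toy compressed-sensing problem with random matrices: a forward-backward (plug-and-play style) iteration in which the resolvent of the maximally monotone operator is played by a stand-in denoiser. Here the denoiser is soft thresholding, the resolvent of the subdifferential of a scaled l1 norm; in the project, a trained network with a nonexpansiveness certificate would take its place.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
A = rng.standard_normal((15, n)) / np.sqrt(n)        # measurement operator
x_true = rng.standard_normal(n)
y = A @ x_true + 0.01 * rng.standard_normal(15)      # noisy observations

lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2               # ensures convergence

def denoiser(v):
    # Stand-in "learned" resolvent: soft thresholding, i.e. the prox of
    # (step*lam)*||.||_1.  A trained nonexpansive NN would replace this.
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

x = np.zeros(n)
for _ in range(1000):
    x = denoiser(x - step * A.T @ (A @ x - y))       # forward-backward step

residual = np.linalg.norm(A @ x - y)                 # small at the fixed point
```

Swapping the hand-crafted resolvent for a learned one is exactly the substitution studied in Tasks 2.1 and 2.2; the monotone-inclusion view is what supplies the convergence guarantee.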

WP 3: A proximal view of deep dictionary learning
Dictionary learning is a powerful and popular tool in compressed sensing and sparse signal recovery. Recent efforts have proposed to extend these dictionary learning approaches in a multiscale way. The resulting deep dictionary learning methods have been shown to be competitive with respect to deep neural networks. Since proximal methods constitute the machinery behind these approaches and they also play a fundamental role in deep learning, they provide a suitable framework for capturing the differences and common characteristics between these two approaches.
Task 3.1: Explore links between dictionary learning and changes of metrics
Task 3.2: Theoretically analyze the robustness and expressivity of deep dictionary learning
Task 3.3: Develop adaptive learning strategies
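The proximal reading can be made concrete on the sparse-coding step at the heart of (deep) dictionary learning. In the toy sketch below (random dictionary and synthetic sparse code, for illustration only), each ISTA iteration is an affine map followed by soft thresholding, which is exactly the "affine layer + activation" pattern of a recurrent network.

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((30, 60))
D /= np.linalg.norm(D, axis=0)          # dictionary with unit-norm atoms
z_true = np.zeros(60)
z_true[[3, 17, 42]] = [1.5, -2.0, 1.0]  # sparse ground-truth code
s = D @ z_true                          # observed signal

lam = 0.05
step = 1.0 / np.linalg.norm(D, 2) ** 2
W = np.eye(60) - step * (D.T @ D)       # plays the recurrent weight matrix
b = step * D.T @ s                      # plays the bias

z = np.zeros(60)
for _ in range(500):
    v = W @ z + b                       # affine map: the "layer"
    # Soft thresholding = prox of (step*lam)*||.||_1 = the "activation".
    z = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

residual = np.linalg.norm(D @ z - s)    # small: the code explains the signal
```

Unrolling a fixed number of such iterations and training W, b, and the threshold end-to-end is the step from this fixed-dictionary sketch toward the recurrent-network view studied in WP 3.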

1) Development of proximal tools for neural network analysis
- Proximal method for neural network compression (collaboration with Schneider Electric): we have proposed a new approach to compress neural networks and thus allow their implementation on architectures with low memory capacity.
- Certification of neural networks (collaboration with Thales): we have introduced a new multivariate analysis of the Lipschitz regularity of neural networks. The results can be visualized using a "Lipschitz star" representation to measure the influence of each input or group of inputs.
- Robust neural network training algorithms (collaboration with Politehnica Bucharest): we have developed a new constrained learning strategy to ensure the robustness of a neural network to adversarial perturbations. Our algorithm relies on the control of the Lipschitz constant of the network, here assumed to have nonnegative weights.

2) Proposal of new fixed point strategies
- Definition of a rigorous framework to ensure the convergence of plug-and-play methods (collaboration with Heriot-Watt Univ., Edinburgh): a new formulation is introduced for solving inverse problems where learning the resolvent of a maximally monotone operator is substituted for the classical regularization approach. This work provides theoretical guarantees of convergence of iterative PnP methods.
- Study of adjoint mismatch problems in image reconstruction methods (collaboration with GE Healthcare): we studied the proximal gradient algorithm when the adjoint of the forward operator is replaced by an approximation, which can be simpler to implement. We have characterized the fixed points of the algorithm, analyzed its convergence conditions, and evaluated the error incurred by this approximation.
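A self-contained toy version of the adjoint-mismatch setting (random matrices, not the tomography operators studied with GE Healthcare): a proximal gradient iteration for nonnegative least squares in which the exact adjoint A.T is replaced by a perturbed matrix B, and the resulting shift of the fixed point is measured.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 10))                   # forward operator
x_true = np.abs(rng.standard_normal(10))            # nonnegative ground truth
y = A @ x_true + 0.05 * rng.standard_normal(40)     # noisy data

B = A.T + 0.01 * rng.standard_normal((10, 40))      # mismatched "adjoint"
step = 1.0 / np.linalg.norm(A, 2) ** 2

def solve(adjoint, iters=3000):
    """Proximal gradient for nonnegative least squares; the prox is the
    projection onto the nonnegative orthant."""
    x = np.zeros(10)
    for _ in range(iters):
        x = np.maximum(x - step * adjoint @ (A @ x - y), 0.0)
    return x

x_exact = solve(A.T)        # iteration with the true adjoint
x_mismatch = solve(B)       # same iteration with the approximate adjoint
error = np.linalg.norm(x_mismatch - x_exact)        # nearby, shifted fixed point
```

With a small perturbation of the adjoint, the iteration still converges, but to a slightly different fixed point; quantifying that shift is the object of the cited analysis.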

3) Design of adaptive representations
- Theoretical link between deep dictionary learning methods and neural networks (collaboration with North Carolina State Univ.): we have established close links between deep dictionary learning methods and recurrent neural network structures. This result makes it possible to exploit existing differential programming methods to make deep dictionary learning more efficient.
- Trained lifting schemes for image compression (collaboration with Univ. Paris 13): we proposed to learn the prediction and update operations appearing in lifting schemes used in image compression. More precisely, these operations have been performed by fully connected neural networks.

The different investigated topics will be further developed in our future work by considering more complex neural network structures (GNNs, in particular) and generalizing the obtained convergence results to new iterative methods. Moreover, we will also focus on using proximal methods for solving weakly supervised problems based on neural networks.

- 7 international journal articles
- 7 international conference articles
- 1 international patent filed

Proximal methods have enabled significant advances in large scale optimization in the last decade. At the same time, deep neural networks (NNs) have led to outstanding achievements in many application domains related to data science. However, the fundamental reasons for their excellent performance are still poorly understood from a mathematical viewpoint. Recently, we have shown that almost all the activation functions used in NN architectures (e.g. the multivariate squashing functions recently introduced for capsule networks) identify with the proximity operators of convex functions. This finding opens up new perspectives in deep learning by exploiting tight links between NN structures and iterative proximal algorithms. More precisely, we propose three main research avenues.
First, the well-known fragility of neural networks with respect to adversarial perturbations will be investigated. For this purpose, we will use fixed point techniques grounded in the firm non-expansiveness property of these activation operators. Our preliminary results in this direction will be extended by considering more general architectures than basic feedforward ones (e.g. residual networks or GANs). Novel architectures intended to be more robust will also be proposed by mimicking existing proximal methods. Suitable training algorithms will be designed allowing us to control the Lipschitz constant of the resulting NNs, thus making a first step towards their certifiability.
Second, a new formulation of inverse problems will be proposed, aiming at replacing standard convex regularizing functions by a regularization approach based on maximally monotone operators (MMOs). This strategy will be not only more general, but also more flexible: it will allow data-driven MMOs to be learned in a supervised manner. This will lead to efficient plug-and-play iterative algorithms for solving image restoration or reconstruction problems, in which denoising steps will be performed by a NN. One of the major benefits of our framework will be to yield clear convergence results for the resulting iterative schemes.
Finally, we will investigate deep dictionary learning (DDL) methods. These currently appear as competitive alternative approaches to NNs. In each step of these methods, a non-smooth cost function is optimized in order to find an optimal representation of the analyzed data in a suitable dictionary. Since this optimization is usually performed by proximal techniques, these methods can be interpreted as the use of a smart nonlinear activation operator. Our purpose will be to clarify the relations existing between DDL and NNs in order to both make DDL techniques more powerful and to better analyze their performance. In addition, strategies will be introduced to increase the versatility of DDL approaches by making them adaptive to incoming data.
In terms of methodological outcomes, the project is expected to lead to significant progress in the explainability of NNs and in the proposition of novel methods for improving their reliability. In terms of practical impact, the developed methods will result in a new generation of techniques for solving problems arising in three application fields: 3D medical imaging (collaboration with GE Healthcare), data analysis for energy and environment issues (collaboration with IFPEN) and multivariate nonlinear modeling of electric motors (collaboration with Schneider Electric).

Project coordination

Jean-Christophe PESQUET (Centre de Vision Numérique)

The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility for its contents.

Partnership

CVN Centre de Vision Numérique

ANR grant: 484,920 euros
Beginning and duration of the scientific project: August 2020 - 48 Months
