Deep Learning for Constraint Optimization – DELCO
Deep Learning for Combinatorial Optimization
This project aims at building on recent progress in game playing and deep reinforcement learning to design a new generation of more general constraint satisfaction and optimization solvers. Upon completion, this project will contribute to making basic (or even naive) problem formulations far more efficient to solve. Eventually, progress in this research direction will make constraint solvers as accessible as database systems are nowadays.

In case of success, our approach will be able to rediscover basic solving heuristics, such as branching over the most constrained variables, in the same way AlphaGo was able to rediscover strategic knowledge accumulated over centuries. Through this procedure, the network will also accumulate more specific knowledge about various problems of different natures. To facilitate the discovery of problem-specific knowledge, we will use state-of-the-art convolutional neural networks (CNNs) such as residual networks, and in particular we will rely on our expertise in transferring knowledge from one task to another using such networks. In a second stage, the CNN will be specialised for the problem at hand, using data collected during the search (for example, making use of restarts as in CDCL solvers), and combined with a lookahead procedure such as MCTS.
The goal is to use well-established techniques in imitation learning and reinforcement learning that have been successful in game AIs. For example, we can start by generating large amounts of supervised data using MCTS, and then train a neural network to mimic the behavior of MCTS without having to run expensive simulations (a.k.a. behavior cloning). We can also use direct policy search techniques to improve our search policy beyond the quality of MCTS. For example, the REINFORCE algorithm can be directly applied to train the weights of the search policy.
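To make the policy-gradient step concrete, here is a minimal sketch of REINFORCE with a running-mean baseline applied to a toy choice among three hypothetical branching heuristics. The reward values, learning rates, and three-way action space are invented for the illustration and are not part of the project.

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy setting: three candidate branching heuristics; a higher
# reward stands for a smaller search tree (values made up for the example).
REWARDS = [0.1, 0.3, 0.9]

def reinforce(steps=2000, lr=0.1):
    logits = [0.0, 0.0, 0.0]   # policy parameters (one logit per heuristic)
    baseline = 0.0             # running-mean baseline to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = random.choices(range(3), weights=probs)[0]
        reward = REWARDS[action]
        baseline += 0.05 * (reward - baseline)
        advantage = reward - baseline
        # gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k]
        for k in range(3):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)

probs = reinforce()  # the policy concentrates on the best heuristic
```

In a real solver the three fixed rewards would be replaced by a sparse signal measured after a full run, such as the negated number of explored nodes.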
No results yet.
A fully functional MIP solver that is able to learn from solving past instances as well as from knowledge collected online, during solving.
1. Fréchet Mean Computation in Graph Space through Projected Block Gradient Descent. Boria, N., Negrevergne, B., & Yger, F. (2020). European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN).
2. The Minimum Edit Arborescence Problem and Its Use in Compressing Graph Collections. Gnecco, L., Boria, N., Bougleux, S., Yger, F., & Blumenthal, D. B. (2021). International Conference on Similarity Search and Applications.
3. Monte Carlo Graph Coloring. Cazenave, T., Negrevergne, B., & Sikora, F. (2020). Monte Carlo Search, IJCAI Workshop.
4. Deep Reinforcement Learning for Morpion Solitaire. Doux, B., Negrevergne, B., & Cazenave, T. (2021). Advances in Computer Games.
5. On the expressivity of bi-Lipschitz normalizing flows. Verine, A., Negrevergne, B., Rossi, F., & Chevaleyre, Y. (2021). ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models.
This project aims at building on the recent progress in deep learning and reinforcement learning to design a new generation of more general constraint satisfaction and optimization solvers.
More precisely, our goal is to develop a methodology to learn search heuristics that can be used to make informed branching decisions during the search. To achieve this, we will use convolutional neural networks (CNNs) to model complex relations between the state of a constraint satisfaction and optimization problem (CSOP) and the optimal branching decision. We will also rely on the reinforcement learning framework to train the heuristic using sparse but relevant rewards, such as the final execution time or the number of nodes explored. Together, these two techniques will be used to quickly drive the search towards good feasible solutions without too much backtracking.
During the project, we will evaluate the applicability of this approach on a variety of tasks. We will start with simple grid-based problems and puzzle games as they can be directly processed by CNNs. Later, we will address more complex graph-based problems using graph embeddings.
We will also combine learned search heuristics with different search strategies, ranging from basic greedy search, to state-of-the-art solvers using complex constraint propagation techniques. We will be particularly careful at investigating the possible synergies between constraint propagation and learning.
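As a minimal sketch of what plugging a learned heuristic into a propagating search could look like, the toy solver below performs backtracking with forward checking on a small graph-colouring instance, with the variable-ordering heuristic passed in as a function argument. The functions `first_var` and `most_constrained` are hand-coded stand-ins for the policies a trained network would provide; the instance and all names are invented for the example.

```python
def neighbours(v, edges):
    return [b if a == v else a for a, b in edges if v in (a, b)]

def solve(domains, edges, pick, stats):
    """Backtracking search with forward checking; `pick` is the pluggable
    variable-ordering heuristic (the slot a learned model would fill)."""
    if all(len(d) == 1 for d in domains.values()):
        # every variable assigned: accept only if all constraints hold
        return domains if all(domains[a] != domains[b] for a, b in edges) else None
    v = pick({k: d for k, d in domains.items() if len(d) > 1})
    for colour in sorted(domains[v]):
        stats["nodes"] += 1
        new = {k: set(d) for k, d in domains.items()}
        new[v] = {colour}
        for n in neighbours(v, edges):   # forward checking: prune neighbours
            new[n].discard(colour)
        if all(new.values()):            # no domain wiped out
            result = solve(new, edges, pick, stats)
            if result:
                return result
    return None

def first_var(unassigned):
    return min(unassigned)               # naive static order

def most_constrained(unassigned):
    # smallest remaining domain first: the classic heuristic a learned
    # policy could rediscover
    return min(unassigned, key=lambda k: (len(unassigned[k]), k))

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # a 5-cycle needs 3 colours
results = {}
for pick in (first_var, most_constrained):
    stats = {"nodes": 0}
    sol = solve({v: {0, 1, 2} for v in range(5)}, edges, pick, stats)
    results[pick.__name__] = (sol, stats["nodes"])
```

The node counter in `stats` is exactly the kind of signal that can serve as a sparse reward for comparing heuristics on a given instance.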
Since the generality of the approach is an important focus of this project, we will investigate several machine learning techniques that can be used to train more general and robust models.
In particular, we will look at generating hard instances (i.e. instances that are difficult to solve), and we will adapt the framework of Generative Adversarial Networks (GANs) to CSOPs. These new techniques will be useful to build search heuristics that are more robust to corner cases, and can be reused on different variations of similar problems.
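A full GAN over CSOP instances is beyond the scope of a summary, but the underlying adversarial idea, a generator producing instances that maximise the effort of the current solver, can be sketched with a simple hill-climber. Below, the "solver" is a naive 2-colourability backtracker and the "generator" mutates a graph to maximise its node count; the solver, mutation scheme, and every parameter are invented for the illustration.

```python
import random

random.seed(0)
N = 8  # vertices of the candidate instances

def nodes(edges):
    """Search nodes a naive backtracker explores deciding 2-colourability."""
    count = 0
    def rec(colors):
        nonlocal count
        v = len(colors)
        if v == N:
            return True
        for c in (0, 1):
            count += 1
            # edges are stored as (a, b) with a < b, so a is already coloured
            if all(colors[a] != c for a, b in edges if b == v):
                if rec(colors + [c]):
                    return True
        return False
    rec([])
    return count

def harden(steps=200):
    """Hill-climb on the edge set to maximise solver effort: a crude,
    hypothetical stand-in for an adversarial instance generator."""
    pairs = [(a, b) for a in range(N) for b in range(a + 1, N)]
    edges = set(random.sample(pairs, 10))
    best = nodes(edges)
    for _ in range(steps):
        cand = set(edges)
        cand.symmetric_difference_update({random.choice(pairs)})  # toggle one edge
        score = nodes(cand)
        if score >= best:   # accept ties to keep exploring plateaus
            edges, best = cand, score
    return edges, best

hard_edges, hardness = harden()
```

In the adversarial setting envisioned in the project, the hill-climber would be replaced by a trained generator and the fixed backtracker by the learned search policy being hardened.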
The success of our approach relies on a solid experimental methodology. In our project, an engineer will set up an automated testing platform, which will be used over the course of the project to provide reliable evaluation metrics as well as rich experimental insights.
Upon success, we can expect the following breakthroughs:
- More general and efficient CSOP solving strategies. In particular, solving strategies whose efficiency does not depend on the way a problem is encoded using variables and constraints. This will make CSOP solving more accessible to non-experts.
- CSOP solvers combining local logical reasoning and long-term probabilistic reasoning, paving the way for a new solver paradigm.
- Hybrid solvers exploiting highly parallel GPU architectures (for inference) as well as CPUs (for search).
- New techniques for learning robust branching decisions based on recent developments such as GANs.
Mr Benjamin Negrevergne (Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
LAMSADE Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision
ANR grant: 283,853 euros
Beginning and duration of the scientific project: September 2019 - 48 Months