CE23 - Intelligence artificielle et science des données 2024

Discrete Graphical Model Learning and Solving using Data Mining And Deep Learning – GMLaS

Discrete Graphical Model Learning and Solving using Data Mining and Deep Learning

Today, the convergence of discrete optimization and machine learning (ML) is emerging as a game-changing force in the realm of AI, becoming an omnipresent technology in a wide range of applications. The hybridization of DM, ML, and Deep Learning with constraint reasoning and optimization is one of the major challenges in AI. Our proposal explores this challenge within a discrete reasoning/optimization framework and its associated solver: graphical models and the Toulbar2 solver.

GMLaS project marks a significant step toward the ultimate goal of incorporating ML/DM tools into the cost function paradigm, paving the way for a data-driven hybrid resolution framework

Recent years have seen significant progress in machine learning (ML) and data mining (DM), making these technologies indispensable across a wide range of application areas. Constraint reasoning and optimization are fields that could greatly benefit from the integration of ML/DM. Constraint programming (CP) is regarded as one of the leading paradigms for solving combinatorial problems in AI. It provides a flexible framework for modeling and solving complex problems from diverse domains. The fusion of DM, ML, and deep learning (DL) with constraint reasoning and optimization represents one of the major challenges in AI. Our approach addresses this challenge through a combined framework of machine learning, discrete reasoning, and optimization, along with the solver: graphical models (GM) and the Toulbar2 solver, which has won multiple solver competitions. Discrete graphical models, especially additive ones like cost function networks (CFNs) and Markov random fields, are particularly valuable since they can represent both logical and probabilistic information in a transparent manner. Toulbar2 is a state-of-the-art solver designed for these discrete models, combining exact methods with advanced meta-heuristics. The hybridization of ML and CP has opened two key research directions: (1) leveraging ML to improve the performance of cutting-edge discrete solvers for specific problem families, and (2) applying DL to tackle complex problems in collaboration with discrete solvers.<br /><br />In cases where approximate methods like meta-heuristics (MHs) are needed, ML and DM techniques can also collaborate to solve combinatorial optimization problems. During the exploration phase, MHs generate large amounts of data, including both successful and unsuccessful solutions based on their fitness values, and so on. This data may contain valuable insights, such as patterns that differentiate good solutions from bad ones. By incorporating this knowledge into the search process, we can guide the MHs toward more informed decisions, making them smarter.<br /><br />The GMLaS project is a significant step toward the ultimate goal of integrating ML/DM tools into the cost function paradigm, creating a data-driven hybrid resolution framework capable of solving more complex and large-scale real-world problems. One of the main problems the project will focus on is computational protein design (CPD), a field where one of the project partners has deep expertise.

Constraint programming; machine learning; data mining; Cost function networks; tooulbar2, toulbar2-VNS

The project will rely on several paradigms and methods listed below:

(1) Constraint Programming (CP) is one of the main paradigms for solving combinatorial problems in AI. CP offers a generic framework for modeling and solving combinatorial problems from various domains.

(2) Discrete Graphical Models, particularly additive models such as Cost Function Networks (CFNs) and Markov Random Fields, are of interest because they can transparently represent both logical and probabilistic information.

(3) Metaheuristics (MHs) are a family of approximate optimization methods that drive an interaction between local improvement procedures and high-level strategies capable of escaping local optima and performing a robust exploration of the search space. Variable Neighborhood Search (VNS) is a well-known meta-heuristic, guided by systematic changes in the neighborhood search structures to find a local optimum and escape from it.

(4) Pattern Mining is a fundamental task in data mining, aimed at identifying recurrent and interesting substructures in data as a form of knowledge discovery. We plan to use this type of approach to guide the Toulbar2-VNS search.

(5) Machine Learning and Deep Learning.

(6) Toulbar2 Platform: The backbone of the software development is the Toulbar2 platform, which will host all the algorithmic developments of the project. The collective development environment (GitHub: github.com/toulbar2/toulbar2) that hosts the Toulbar2 software platform will provide support for the coordinated development of the software.

Toulbar2 is a state-of-the-art solver dedicated to Cost Function Networks (CFNs). It includes both exact methods and sophisticated meta heuristics such as Variable Neighborhood Search (toulbar2-VNS). This solver has won several medals in past competitions for Deterministic and Probabilistic Graphical Models, including: Max-CSP 2008 Competition (winner in the 2-ARY-EXT and N-ARY-EXT tasks), Probabilistic Inference Evaluation at UAI 2008 (winner in several MPE tasks), Probabilistic Inference Challenge at UAI 2010 (winner in the 1200-second MPE task), Probabilistic Inference Challenge PIC 2011 (second place by Ficolofo in the 1-hour MAP task), UAI 2014 Inference Competition (winner in all MAP task categories: Proteus, Robin, and IncTb entries), XCSP3 2022 Competition (second place in Mini COP and Parallel COP categories), UAI 2022 Inference Competition (winner in all MPE and MMAP task categories), XCSP3 2023 Competition (first place in Mini COP and second place in Parallel COP categories).

Results

Several scientific outcomes are expected from the GMLaS project:

1/ Solving large-scale CPD instances. The first result of the project will be the potential solution of challenging, very large instances of the Computational Protein Design (CPD) problem, in terms of both computation time and solution quality.

2/ New features of the Toulbar2 solver. A second result of the project will be the enhanced Toulbar2 solver, which will include the following features:
- An automatic configuration tool to efficiently adjust its parameters, thus improving the solver’s efficiency and robustness.
- Heuristic learning for branching and cost function propagation.
- Learning hybrid representations of CPD instances that combine dense pairwise functions with sparse higher-order interactions.

3/ Learning capabilities for Toulbar2-VNS. The Toulbar2-VNS meta-heuristic and its variants will be enhanced with learning capabilities to generate relevant neighbourhoods, exploiting data produced during searches to guide the resolution toward high-quality solutions.

Submission summary

Current research areas in the convergence of Combinatorial Optimization (CO) and ML are emerging as a game-changing force in the realm of AI. Such fusion of two distinct facets of the solving paradigm has recently gained traction in the CO field, becoming a ubiquitous technology across a wide range of applications. One area that can significantly benefit from the use of ML is constraint reasoning and optimization. Constraint Programming (CP) is considered as one of the foremost paradigms for solving combinatorial problems in AI. CP technology offers a generic way of modeling and solving combinatorial problems from various domains. The hybridization of Data Mining, Machine, and Deep Learning with state-of-the-art discrete reasoning and optimization is one of the major challenges in Artificial Intelligence. Our proposal explores the challenge of data-driven decision-making through a combined machine learning (ML), discrete reasoning/optimization framework, and associated solver: Graphical Models (GMs) and the toulbar2 solver (winner of the UAI 2022 solver competition on two discrete optimization/counting tasks). Discrete GMs, especially additive GMs such as Cost Function Networks (CFNs) and Markov Random Fields are attractive because they can represent logical and probabilistic information seamlessly. Then, toulbar2 is a SOTA solver for reasoning on such discrete models. It includes both exact solvers and sophisticated meta-heuristics. GMLaS is a significant step toward the ambitious ultimate goal of integrating ML/DM tools to Cost Function Network paradigm, leading to a hybrid data-driven solving framework, dealing natively with optimization problems and with the ambition to solve larger and more complex real problems. One example of problem the project will focus on is computational protein design (CPD), an area in which one of the project partners has deep expertise.

Samir Loudni (Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire)

The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.

MIAT Institut national de recherche pour l'agriculture, l'alimentation et l'environnement
LS2N Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire
GREYC Université de Caen Normandie

Help of the ANR 549,463 euros
Beginning and duration of the scientific project: February 2025 - 48 Months

Explorez notre base de projets financés

ANR makes available its datasets on funded projects, click here to find more.