We propose to combine machine learning and control theory for sequential decision-making of multiple agents. The project proposes fundamental contributions: adding stability to reinforcement learning algorithms; data-driven methods for robust control; hybrid ML/CT methods for multi-horizon control and planning; and decentralized control. The methodological contributions of this fundamental AI project will be applied to the robust control of UAV fleets.
We combine machine learning (ML) and control theory (CT) and address control problems from two perspectives, ML and CT, through the following questions:
- What can be modelled (CT) and what needs to be learned (ML)?
- Can we provide, estimate or guarantee stability (CT)?
- Can we estimate the complexity of the learning task and/or the amount of data needed (ML)?
- Can we provide auxiliary objectives for more efficient learning and/or easier data creation (ML)?
Several different methodologies of machine learning (ML) and control theory (CT) are used, extended, and/or combined in innovative ways:
- Reinforcement learning and deep reinforcement learning (ML)
- System identification (CT) with learned components (ML)
- Observer design (CT) and state representation learning (ML)
- Hybrid control with learned controllers (ML) with added stability constraints (CT)
- Hybrid control with designed controllers (CT) with added learned components (ML)
- Observers for dynamical systems can be learned in an unsupervised way with guarantees
- Forecasting the future of dynamical systems with estimates of the error committed by the system
- Forecasting the future of mechanical systems directly in observed pixel space and including notions of causality
- Adding stability to reinforcement learning
- Different algorithms for hybrid control (ML+CT)
- Multi-agent reinforcement learning in asymmetric information access settings
Beyond the substantial scientific output of the first part of the project, which resulted in a large number of papers, arguably the biggest success is the tight integration of the project partners (see section III.5.1 and Figure 11 of the report) and the multi-disciplinary nature of the project. This has led to a better understanding of the respective partner domains: machine learning for the process control partners, and control theory for the ML partners. The co-authored papers are the result of genuine collaborations within a consortium interested in learning new scientific directions. The gain is increased knowledge for the consortium, but also novel scientific contributions to the field (see the list of papers in section “scientific production”).
The second half of the project will extend this work to multiple agents and target UAV scenarios. In terms of theory, we will start to address sample complexity and algorithmic stability in control scenarios.
=== Accepted papers (as of 31.3.2021)
 OK. Kocan, D. Astolfi, C. Poussot-Vassal, and A. Manecy. Supervised Output Regulation via Iterative Learning Control for Rejecting Unknown Periodic Disturbances. In IFAC, 2020.
 J. Peralez, F. Galuppo, P. Dufour, C. Wolf, and M. Nadri. Data-driven multi-model control for a waste heat recovery system. In CDC, 2020.
 Yuxuan Xie, J. Dibangoye, and Olivier Buffet. Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing. In ICML 2020.
=== Submitted papers (as of 31.3.2021)
 J. Peralez and M. Nadri. Deep Learning-based Luenberger observer design for discrete-time nonlinear systems. arXiv pre-print (submitted to CDC 2021), 2021.
 S. Janny, V. Andrieu, M. Nadri, and C. Wolf. Deep KKL: Data-driven Output Prediction for Non-Linear Systems. arXiv pre-print pending (submitted to CDC 2021), 2021.
 S. Zoboli, V. Andrieu, D. Astolfi, G. Casadei, J. Dibangoye, and M. Nadri. Reinforcement Learning Policies with local LQR guarantees for Nonlinear Discrete-Time Systems. arXiv pre-print pending (submitted to CDC 2021), 2021.
=== Papers in writing (as of 31.3.2021)
 S. Janny, F. Baradel, N. Neverova, M. Nadri, G. Mori, and C. Wolf. FilteredCoPhy: Unsupervised and Counterfactual Learning of Physical Dynamics. Submission planned to IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
 J. Dibangoye and Yuxuan Xie. Learning to Act Optimally in Decentralized POMDPs Under Hierarchical Information Sharing. Submission planned for NeurIPS 2021.
Recent years have witnessed the rise of Machine Learning (ML), which has provided disruptive performance gains in several fields. Apart from undeniable advances in methodology, these gains are often attributed to massive amounts of training data and computing power, which led to breakthroughs in speech recognition, computer vision and natural language processing. In this project, we propose to extend these advances to sequential decision-making of multiple agents for planning and control. We particularly target learning realistic behavior with multiple horizons, which requires long-term planning at the same time as short-term fine-grained control.
In the context of decentralized control of agents such as UAVs and mobile robots, DeLiCio proposes fundamental contributions at the crossroads of AI/ML and Control Theory (CT), the second key methodology of this project alongside ML. The two fields, while distinct, have a long history of interaction, and as both mature their overlap becomes more and more evident. CT aims to provide differential model-based approaches to solve stabilization and estimation problems. These model-driven approaches are powerful because they are based on a thorough understanding of the system and can leverage established physical relationships. However, nonlinear models usually need to be simplified, and they have difficulty accounting for noisy data and unmodeled uncertainties.
Machine Learning, on the other hand, aims at learning complex models from (often large amounts of) data and can provide data-driven models for a wide range of tasks. Markov Decision Processes (MDP) and Reinforcement Learning (RL) have traditionally provided a mathematically grounded framework for control applications, where agents are required to learn policies from past interactions with an environment. In recent years, this methodology has been combined with deep neural networks, which play the role of high-capacity function approximators and model the discrete or continuous policy function, a function of the cumulative reward of the agent, or both.
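As a rough illustration of learning a function of the cumulative reward from past interactions, the following minimal sketch runs tabular Q-learning on a toy five-state chain. The environment, the constants (`N_STATES`, `GAMMA`, `ALPHA`) and the uniformly random behaviour policy are all assumptions made for this example; it is not the project's method, and a table stands in for the deep network mentioned above.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.5           # learning rate (assumed)

def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, seed=0):
    """Off-policy Q-learning driven by a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q_table = q_learning()
# Greedy policy for the non-terminal states, read off the learned table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In a deep RL setting the table `q_table` would be replaced by a neural network, which is where the questions of sample complexity and stability raised in this project come in.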
While learning has become the prevailing methodology in many applications, process control is still a field where control engineering cannot be replaced for many low-level control problems, mainly due to the lack of stability of learned controllers and to computational complexity in embedded settings.
DeLiCio proposes fundamental research at the crossroads of ML/AI and CT, with planned algorithmic contributions on the integration of models, prior knowledge and learning in control and the perception-action cycle:
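To make the notion of stability concrete, here is a minimal sketch, assuming a scalar discrete-time linear system x[k+1] = a·x[k] + b·u[k] with hypothetical values a = 1.2 and b = 1.0. It computes a state-feedback gain with the standard LQR Riccati recursion and checks that the closed-loop pole lies inside the unit circle. This is a textbook designed controller shown for illustration only; a learned controller comes with no such certificate by default.

```python
def scalar_lqr_gain(a, b, q, r, iterations=200):
    """Iterate the discrete-time Riccati recursion for x[k+1] = a*x[k] + b*u[k]."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    # Optimal feedback gain for the control law u[k] = -gain * x[k].
    return a * b * p / (r + b * b * p)

def is_stable(pole):
    """A scalar discrete-time linear system is stable iff |pole| < 1."""
    return abs(pole) < 1.0

A, B = 1.2, 1.0                      # hypothetical open-loop unstable plant
gain = scalar_lqr_gain(A, B, q=1.0, r=1.0)
closed_loop = A - B * gain           # closed-loop pole under u = -gain * x

# Simulate the closed loop from x = 1 to see the state decay.
x = 1.0
for _ in range(50):
    x = closed_loop * x
print(is_stable(A), is_stable(closed_loop), x)
```

The check `is_stable(closed_loop)` is only valid for linear dynamics; for nonlinear systems and learned policies, stability has to be established by other means, which is one of the problems this project addresses.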
- data-driven learning and identification of physical models for control;
- state representation learning for control;
- stability and robustness priors for reinforcement learning;
- stable decentralized (multi-agent) control using ML and CT.
The planned methodological advances of this project will be evaluated on a challenging application requiring planning as well as fine-grained control, namely the synchronization of a UAV swarm through learning. The objective is to learn strategies that allow a swarm to solve a high-level goal (navigation, visual search) while maintaining a formation.
Mr. Christian Wolf (UMR 5205 - LABORATOIRE D'INFORMATIQUE EN IMAGE ET SYSTEMES D'INFORMATION)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
LAGEPP LABORATOIRE D'AUTOMATIQUE ET DE GENIE DES PROCEDES
CITI CENTRE D'INNOVATION EN TELECOMMUNICATIONS ET INTEGRATION DE SERVICES
ONERA Département Traitement de l'Information et Systèmes
LIRIS UMR 5205 - LABORATOIRE D'INFORMATIQUE EN IMAGE ET SYSTEMES D'INFORMATION
ANR funding: 533,072 euros
Beginning and duration of the scientific project: September 2019 - 48 Months