CE25 - Software science and engineering - Multi-purpose communication networks, high-performance infrastructures

Efficient Planning and Learning for Resource Sharing – EPLER

Submission summary

Markov decision processes (MDPs) and their model-free counterpart, reinforcement learning, have achieved considerable success over the last two decades. However, these successes often rely on exceptional hardware resources and cannot be reproduced in many ordinary settings where, for instance, the volume of available data or the amount of computing power is more limited. To define the next generation of more "democratic", widely applicable algorithms, such methods must still overcome very demanding exploration issues. EPLER proposes to address this difficulty by exploiting the knowledge and structure underlying many MDPs. We will focus in particular on the so-called (rested) multi-armed bandit and restless multi-armed bandit problems, which provide a powerful optimization framework for modeling scheduling and resource-sharing problems. Theory shows that index policies, which are easy to implement, are either optimal or nearly optimal for bandit problems.

Our first challenge will be to characterize performance guarantees for extensions of the restless bandit control problem and to address the case of correlated bandits, which is ubiquitous in resource-sharing problems. In our second challenge, we will leverage the structure of optimal policies, namely the strong optimality of index policies, to significantly improve both exploration and exploitation in the model-free setting, and we will define exploration schemes based on particle systems to tackle use cases with sparse rewards.
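To make the notion of an index policy concrete, here is a minimal sketch of the decision rule in a restless-bandit resource-sharing setting. It assumes N arms with finite state spaces, a budget of m arms that may be activated at each step, and a precomputed scalar index for every (arm, state) pair. The function name `index_policy` and the index values in the example are hypothetical illustrations, not the project's method; in practice the indices would come from, for example, a Whittle-index computation, which is not shown here.

```python
from typing import Mapping, Sequence


def index_policy(states: Sequence[int],
                 index_table: Sequence[Mapping[int, float]],
                 m: int) -> list[int]:
    """Return the sorted ids of the m arms to activate at this step.

    Arm i, currently in state states[i], is scored by the scalar index
    index_table[i][states[i]]. The policy activates the m arms with the
    largest scores; ties are broken in favour of the smaller arm id.
    """
    n = len(states)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and the number of arms")
    # Rank arms by decreasing index, then by increasing arm id for ties.
    ranked = sorted(range(n), key=lambda i: (-index_table[i][states[i]], i))
    return sorted(ranked[:m])


# Hypothetical example: 3 two-state arms (states 0 and 1), budget m = 2.
# These index values are made up for illustration only.
example_table = [
    {0: 0.2, 1: 0.9},
    {0: 0.5, 1: 0.7},
    {0: 0.1, 1: 0.4},
]
print(index_policy([1, 0, 1], example_table, m=2))  # [0, 1]
```

Each decision only ranks N scalar indices, whereas solving the joint MDP directly involves a state space whose size is the product of the per-arm state-space sizes (2^N for two-state arms). This gap is what makes index policies cheap to implement.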

Project coordination

Matthieu Jonckheere (Laboratoire d'analyse et d'architecture des systèmes)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

LAAS-CNRS Laboratoire d'analyse et d'architecture des systèmes
IRIT Institut National Polytechnique Toulouse

ANR funding: 458,427 euros
Start date and duration of the scientific project: October 2022, 48 months
