Neuro-Incremental Reinforcement Learning from Human Preferences – NeuRL
In the near future, intelligent agents will be ubiquitous in our daily lives, replacing or assisting humans on a variety of tasks. Reinforcement Learning (RL) is a framework for learning such sequential decision-making tasks from data. RL has had several achievements, particularly in game domains, obtained by combining RL with deep neural networks. While impressive, these results required large teams of researchers adapting RL algorithms to each task. In contrast, we expect intelligent agents to solve decision problems on the fly with at most the feedback of task experts, not RL experts.

A use case studied in this proposal is an AI managing a farm throughout a harvesting season, building upon our team's prior work developing high-quality RL environments for agriculture. The particularity of our setting is the personalisation of the task to each farmer's preferences. Current RL methods would require an RL expert both for the definition of the problem, especially the reward function, and to overcome the well-documented instability of RL.

To tackle these limitations, we propose an approach for combining neural networks and RL that is novel both in the morphology of the networks and in their usage, to produce more stable, closed-form updates. Specifically, the networks will grow in size during learning, allowing a closed-form entropy-regularised policy update, and will aggregate the state space, instead of directly modelling value functions, allowing a closed-form computation of the value function for the resulting abstract Markov decision process model. The model-based nature of our framework is also key for eliciting the user's preferences, as preference elicitation requires solving a sequence of RL problems, which is sample-inefficient with model-free approaches. Our contributions will be validated on the aforementioned farm management tasks, which will have to be learned solely from high-level human feedback, without any intervention from an RL expert.
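The abstract refers to two closed-form computations: the value function of an abstract (aggregated-state) MDP, and an entropy-regularised policy update. The proposal does not spell out the exact formulations, but standard instances of both are well known: exact policy evaluation by a linear solve, V = (I - γ P_π)^{-1} r_π, and the KL/entropy-regularised softmax update π'(a|s) ∝ π(a|s) exp(Q(s,a)/η). A minimal NumPy sketch on a toy abstract MDP (all sizes and the temperature η are illustrative assumptions, not values from the proposal):

```python
import numpy as np

# Toy abstract MDP: n_s aggregated states, n_a actions (illustrative sizes).
rng = np.random.default_rng(0)
n_s, n_a, gamma, eta = 4, 3, 0.9, 1.0

P = rng.random((n_s, n_a, n_s))
P /= P.sum(axis=2, keepdims=True)      # transition probabilities P(s' | s, a)
r = rng.random((n_s, n_a))             # rewards r(s, a)
pi = np.full((n_s, n_a), 1.0 / n_a)    # current policy (uniform)

# Closed-form policy evaluation on the abstract MDP:
# V = (I - gamma * P_pi)^{-1} r_pi, a single linear solve.
P_pi = np.einsum('sa,sat->st', pi, P)  # state-to-state kernel under pi
r_pi = np.einsum('sa,sa->s', pi, r)    # expected reward under pi
V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)
Q = r + gamma * P @ V                  # state-action values

# Closed-form entropy-(KL-)regularised policy update:
# pi'(a|s) proportional to pi(a|s) * exp(Q(s,a) / eta)
logits = np.log(pi) + Q / eta
logits -= logits.max(axis=1, keepdims=True)   # numerical stability
pi_new = np.exp(logits)
pi_new /= pi_new.sum(axis=1, keepdims=True)
```

Both steps avoid gradient descent on a value network, which is one way a model-based, aggregated-state formulation can yield the more stable updates the abstract alludes to; the temperature η trades off improvement against staying close to the current policy.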
Project coordination
Riad AKROUR (Centre Inria de l'Université de Lille)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.
Partnership
Inria Centre Inria de l'Université de Lille
ANR funding: 292,376 euros
Beginning and duration of the scientific project:
February 2024 - 48 months