DS0705 - Foundations of digital technology (Fondements du numérique)

Extraction and Transfer of Knowledge in Reinforcement Learning – ExTra-Learn

The methodology followed to develop transfer algorithms differs from the standard design of RL
methods and requires answering three different questions:
• Which “prior” knowledge is effective in improving the learning performance? A first important
step will be to develop different models of knowledge. Optimal policies, subpolicies, features,
sampling strategies, value functions, and raw samples are examples of the elements characterizing
RL problems and their solutions. The choice of which knowledge to use will be guided by the
evidence that if a human supervisor were able to provide it in the form of prior knowledge,
then the learning algorithm would be able to significantly improve its performance. While in
many cases such evidence is already available (e.g., features adapted to the target value
functions), there are still many scenarios where, even if explicit prior knowledge were
available, no RL algorithm would provably be able to take advantage of it.
• How can “prior” knowledge be automatically learned? While in some cases the knowledge of
interest is immediately available after the interaction with a task (e.g., samples), there are
more sophisticated forms of knowledge (e.g., an underlying low-dimensional representation
of value functions) that require defining a specific learning process.
• How can knowledge be integrated into a transfer learning process? In order to obtain a full
transfer algorithm, we need to both collect useful knowledge from past tasks and transfer it
while solving new tasks. In some cases, this may not be trivial and may require resolving a
trade-off between learning the desired knowledge as fast and as efficiently as possible (e.g., by
performing a thorough exploration of the environment in many tasks) and exploiting the
knowledge learned so far to improve the performance of the learning process (see the sketch
after this list).
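As a purely illustrative sketch of the trade-off in the last question (the names and the decay schedule below are our assumptions, not the project's method), one can picture an exploration rate that starts high, so that early tasks gather transferable knowledge through thorough exploration, and decays over the task sequence as the accumulated prior becomes reliable enough to exploit:

    # Minimal sketch of the collect-vs-exploit trade-off across tasks.
    # Hypothetical schedule: early tasks explore thoroughly to build
    # transferable knowledge; later tasks exploit the accumulated prior.

    def epsilon_for_task(task_index: int, n_tasks: int,
                         eps_min: float = 0.05, eps_max: float = 0.5) -> float:
        """Exploration rate that decays linearly over the task sequence."""
        frac = task_index / max(n_tasks - 1, 1)
        return eps_max + frac * (eps_min - eps_max)

    n_tasks = 10
    for t in range(n_tasks):
        eps = epsilon_for_task(t, n_tasks)
        # Solve task t with an eps-greedy learner seeded with the current
        # prior, then fold its solution back into the prior (not shown).
        print(f"task {t}: exploration rate = {eps:.2f}")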

In the empirical evaluation of the proposed algorithms we will study
different forms of improvement. In particular, we expect different qualitative
improvements from a transfer RL algorithm:
• Jumpstart. This improvement only concerns the very initial stages of the learning process and
is mostly related to the initialization of the algorithm. We can expect to achieve this
improvement whenever it is possible to develop a prior over the solutions of all tasks that, on
average, is more accurate than a standard non-informative initialization (see the sketch after
this list).
• Learning speed. In exploration-exploitation problems, the learning speed improvement
corresponds to the fact that the RL algorithm can reduce the amount of exploration of
the environment needed to find near-optimal policies. In approximate RL algorithms,
improving the learning speed corresponds to a reduction in the sample complexity, that is,
the number of samples needed to achieve a desired accuracy. In general, a learning
speed improvement does not change the asymptotic performance of the transfer RL algorithm,
which achieves the same level of performance as its no-transfer version.
• Asymptotic performance. Transfer learning, and notably feature learning and hierarchical
decomposition, may significantly affect the asymptotic performance of an RL algorithm. In fact, if
the approximation scheme changes (e.g., changing the basis functions used in linear value
function approximation), the space of functions and policies that the RL algorithm can learn is
affected as well. As a result, we expect that, if a transfer algorithm captures the common
structure among different tasks, it could learn features that increase the asymptotic accuracy
of the learning process.
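As a minimal sketch of the jumpstart idea referenced above (the environment sizes, names, and the simple averaging rule are illustrative assumptions, not the project's algorithms), a new task's value estimates can be initialized from the average solution of previously solved tasks instead of from a non-informative default:

    # Minimal sketch of a jumpstart via informative initialization.
    # All names (n_states, n_actions, past_q_tables, ...) are assumptions.
    import numpy as np

    n_states, n_actions = 10, 2

    # Q-tables learned on previously solved tasks (random stand-ins here).
    rng = np.random.default_rng(0)
    past_q_tables = [rng.normal(size=(n_states, n_actions)) for _ in range(5)]

    def jumpstart_init(past_qs):
        """Average past solutions into a prior over the new task's Q-values.

        If the tasks share structure, this prior is, on average, more
        accurate than a non-informative (e.g., all-zeros) initialization,
        so the very first episodes already act on useful value estimates.
        """
        return np.mean(past_qs, axis=0)

    q_transfer = jumpstart_init(past_q_tables)   # informative prior
    q_scratch = np.zeros((n_states, n_actions))  # non-informative baseline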

If the outcome of this research is positive, we expect to contribute to the
development of learning algorithms able to interact with complex environments in a much more
intelligent and autonomous way. This has a clear strategic role in facing the future challenges of a
digital society where intelligent and autonomous systems will be more pervasive and ubiquitous.
In the long term, we envision decision-making support systems where transfer learning takes
advantage of the data available from different tasks (e.g., users) to construct high-level knowledge
that allows sophisticated reasoning and learning in complex domains. For instance, online
education systems will build on automatic schedulers that design specific agendas for each student.
While learning methods are guaranteed to find the most effective sequence of lessons for each user,
transfer algorithms will reduce to a minimum the exploration needed to discover the skills of
each student (e.g., preliminary exercises to assess her level and define the best learning
strategy), thanks to effective exploration strategies refined over time and across users. Furthermore,
personal fitness assistants will be able to access training data from thousands of users and
transfer methods will use them to recover the best low-dimensional feature representation, which
will be constantly adapted to make learning the best fitness plan for a new user more accurate.
Finally, complex robotic tasks will be automatically decomposed into a hierarchy of subtasks
whose solutions will be transferred and reused from task to task.

The results of the project will be primarily disseminated within the machine learning community.
The theoretical, algorithmic, and empirical results will be published in major national and
international conferences and journals.

Submission summary

In the near future, intelligent and autonomous systems will become more ubiquitous and pervasive in applications such as autonomous robotics, the design of intelligent personal assistants, and the management of smart energy grids. Although very diverse, these applications call for the development of decision-making systems able to interact with and manage open-ended, uncertain, and partially known environments. This will require increasing the autonomy of ICT systems, which will have to continuously learn from data, improve their performance over time, and quickly adapt to changes.
EXTRA-LEARN is directly motivated by the evidence that one of the key features that allows humans to accomplish complicated tasks is their ability to build knowledge from past experience and transfer it while learning new tasks. We believe that integrating transfer of learning in machine learning algorithms will dramatically improve their performance and enable them to solve complex tasks. We identify the reinforcement learning (RL) framework as the most suitable candidate for this integration. RL formalizes the problem of learning an optimal control policy from the experience directly collected from an unknown environment. Nonetheless, practical limitations of current algorithms have encouraged research to focus on how to integrate prior knowledge into the learning process. Although this improves the performance of RL algorithms, it dramatically reduces their autonomy. In this project we pursue a paradigm shift from designing RL algorithms that incorporate prior knowledge to methods able to incrementally discover, construct, and transfer “prior” knowledge in a fully automatic way. More specifically, three main elements of RL algorithms would significantly benefit from transfer of knowledge.
(i) For every new task, RL algorithms need to explore the environment for a long time, which results in slow learning processes in large environments. Transfer learning would enable RL algorithms to dramatically reduce the exploration of each new task by exploiting its resemblance to tasks solved in the past.
(ii) RL algorithms evaluate the quality of a policy by computing its state-value function. Whenever the number of states is too large, approximation is needed. Since approximation may cause instability, designing suitable approximation schemes is particularly critical. While this is currently done by a domain expert, we propose to perform this step automatically by constructing features that incrementally adapt to the tasks encountered over time (see the sketch after this list). This would significantly reduce human supervision and increase the accuracy and stability of RL algorithms across different tasks.
(iii) In order to deal with complex environments, hierarchical RL solutions have been proposed, where state representations and policies are organized over a hierarchy of subtasks. This requires a careful definition of the hierarchy, which, if not properly constructed, may lead to very poor learning performance. The ambitious goal of transfer learning is to automatically construct a hierarchy of skills, which can be effectively reused over a wide range of similar tasks.
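As a minimal sketch of item (ii), under assumed names and with a random stand-in for the shared representation: once a feature map phi has been constructed from past tasks, a new task's value function can be approximated as V(s) ≈ theta^T phi(s), so that only the low-dimensional weight vector theta has to be fit on the new task:

    # Minimal sketch of linear value-function approximation with a
    # shared feature map; in the project's vision the map would be
    # *learned* across tasks rather than hand-designed by an expert.
    import numpy as np

    rng = np.random.default_rng(1)
    n_states, n_features = 100, 8

    # Hypothetical shared representation: one row per state, one column
    # per feature (a random stand-in for a learned feature matrix).
    Phi = rng.normal(size=(n_states, n_features))

    # Value targets for the current task, e.g., Monte-Carlo return
    # estimates (random stand-ins here).
    v_targets = rng.normal(size=n_states)

    # Least-squares fit: only n_features weights are learned per task
    # once the representation is shared, instead of one value per state.
    theta, *_ = np.linalg.lstsq(Phi, v_targets, rcond=None)
    v_hat = Phi @ theta  # approximate value of every state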
Providing transfer solutions for each of these elements sets our objectives and defines the research lines of EXTRA-LEARN. The major short-term impact of the project will be a significant advancement of the state-of-the-art in transfer and RL, with the development of a novel generation of transfer RL algorithms, whose improved performance will be evaluated in a number of test beds and validated by a rigorous theoretical analysis. In the long term, we envision decision-making support systems where transfer learning takes advantage of the massive amount of data available from many different tasks (e.g., users) to construct high-level knowledge that allows sophisticated reasoning and learning in complex domains, with a dramatic impact on a wide range of domains, from robotics to healthcare, from energy to transportation.

Project coordination

Michal Valko (Institut National de Recherche en Informatique et Automatique)

This summary was written by the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

Inria – Institut National de Recherche en Informatique et Automatique

ANR funding: 251,400 euros
Beginning and duration of the scientific project: September 2014, 42 months
