Beyond Online Learning for better Decision making – BOLD
Reactive ML algorithms adapt to data generating processes, typically do not require large computational power and, moreover, can be translated into offline (as opposed to online) algorithms if needed. Introduced in the 1930s in the context of clinical trials, online ML algorithms have been gaining considerable theoretical interest over the last 15 years because of their applications to the optimization of recommender systems, click-through rates, and planning in congested networks, to name just a few. In practice, however, such algorithms are not used as much as they should be, because the traditional low-level modelling assumptions they are based upon turn out to be inappropriate. Instead of arbitrarily complicating and generalising a framework unfit for potential applications, we will tackle this problem from another perspective: we will seek a better understanding of the simple original problem and extend it in the appropriate directions.
There are currently three main barriers to a broader development of online learning.
1) The classical "one step, one decision, one reward" paradigm is unfit.
2) Optimality is defined with respect to worst-case generic lower bounds, and the mechanics behind online learning are not fully understood.
3) Algorithms were designed for a non-strategic, non-interactive environment.
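To make the first barrier concrete, the classical "one step, one decision, one reward" paradigm can be illustrated with a minimal stochastic-bandit loop (a hypothetical sketch for illustration only, not an algorithm from the proposal): at every round the learner takes exactly one step, makes one decision, and observes one reward, here via the standard UCB1 index on Bernoulli arms.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Classical 'one step, one decision, one reward' loop: at each round,
    pick exactly one arm via the UCB1 index and observe one reward."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    totals = [0.0] * k    # cumulated reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: pull each arm once
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            arm = max(range(k), key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0  # one Bernoulli reward
        counts[arm] += 1
        totals[arm] += reward
    return counts

# Over 2000 rounds the best arm (index 2, mean 0.8) dominates the pull counts.
counts = ucb1([0.3, 0.5, 0.8], horizon=2000)
```

The rigidity the proposal criticises is visible in the loop itself: each of the `horizon` iterations forces one isolated decision, with no way to group observations or optimise a global, non-linear objective.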
The BOLD project has two interconnected sides: on the one hand, proving theoretical results on new models motivated by practical applications; on the other hand, implementing our novel algorithms and solutions. The project will unfold in successive phases. The first is to understand why state-of-the-art models are irrelevant and to create relevant ones (this step is already well advanced prior to this proposal); it has been possible thanks to our collaboration with industrial partners. The second step is the theoretical study of these models. Thanks to the variety of expertise and experience among the members of the different partners of the BOLD consortium, we have intuitions about why current solutions will fail and how to tackle these problems efficiently.
Of course, it would be unrealistic to guarantee that theoretical work will produce optimal solutions. But if one approach fails, drawing on our experience with real-life practical problems, we are confident that we can provide solutions that work at least empirically (and, in the worst case where these do not work either, we will be in a position to claim that an approach, albeit attractive on paper, is useless in practice).
Implementation of our (positive) results will follow, thanks to the help of our industrial partners
and their willingness to test new, breakthrough solutions.
Many publications (see annex)
The project "Beyond Online Learning for better Decision making" (BOLD) is at the junction of learning theory, statistics, optimisation and game theory.
Starting from the observation that the hypotheses underlying existing online learning are not satisfied in practice, it is possible to identify several barriers to a large-scale implementation of these techniques:
- The classical "one step, one decision, one reward" paradigm is unfit.
- Optimality of algorithms is defined with respect to worst-case generic lower bounds.
- Algorithms were designed for a non-strategic, non-interactive environment.
The objectives of the project are therefore to remove these barriers that prevent a broader development of online learning:
1) Go beyond "one data, one decision" by noticing that the common assumption that data must be treated on the fly, one item after another, is not compulsory. Quite often it is possible to group them, improving the quality of each decision taken (perhaps at the cost of a smaller number of decisions, and rewards).
2) Go beyond "one data, one reward". The concept of minimisation of cumulated loss is useful in theory, but not adapted to practical applications. Many learning or optimisation objectives are defined globally, as a non-linear function of the sequence of decisions.
3) Use the existing underlying structure of the data to "beat the lower bounds". Data generating processes are not necessarily the worst-case ones that theory considers when evaluating the performance of algorithms. They have specific particularities, through correlations (or similarities) between the data that are known a priori. Exploiting them will accelerate convergence rates, the learning phase, and the overall performance of algorithms.
4) Incorporate other interacting and learning agents into the environment. Relevant problems now involve not just one learning agent, but a network of interacting agents with conflicting objectives. Investigating learning algorithms in a game-theoretic framework is therefore sorely needed.
5) Model and study application-driven frameworks. Instead of generalising the existing models arbitrarily, and sometimes in seemingly random directions, the BOLD project will question their foundations and their correctness, in order to propose new models in line with the applications in mind.
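As an illustration of objective 1, the following hypothetical sketch (illustrative only, not an algorithm from the proposal) groups observations into two batches: an exploration batch over all arms, then a single committing decision. The number of decision points shrinks from one per round to two in total, in the spirit of a simple explore-then-commit scheme.

```python
import random

def batched_etc(means, horizon, explore_per_arm, seed=0):
    """'One data, one decision' relaxed: observations are grouped into two
    batches, so only two decisions are taken over the whole horizon."""
    rng = random.Random(seed)
    k = len(means)
    totals = [0.0] * k
    # Batch 1: pull each arm explore_per_arm times, with no decision in between
    for arm in range(k):
        for _ in range(explore_per_arm):
            totals[arm] += 1.0 if rng.random() < means[arm] else 0.0
    # Single decision: pick the empirically best arm
    best = max(range(k), key=lambda i: totals[i] / explore_per_arm)
    # Batch 2: commit to that arm for all remaining rounds
    remaining = horizon - k * explore_per_arm
    reward = sum(1.0 for _ in range(remaining) if rng.random() < means[best])
    return best, sum(totals) + reward

# Two arms, 1000 rounds, 50 exploratory pulls each: one committing decision.
best, total = batched_etc([0.2, 0.9], horizon=1000, explore_per_arm=50)
```

Grouping the exploratory pulls trades per-round adaptivity for fewer, better-informed decisions, which is exactly the direction objective 1 proposes to study.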
Mr Vianney Perchet (Center for Research in Economics and Statistics)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
IMT Institut de Mathématiques de Toulouse
INRIA LNE Centre de Recherche Inria Lille - Nord Europe
UPDESCARTES-MAP5 Mathématiques appliquées à Paris 5
CREST Center for Research in Economics and Statistics
ANR funding: 270,527 euros
Beginning and duration of the scientific project: September 2019 - 48 months