Artificial Intelligence for All – HUMANIA
With the current rapid growth of AI research and applications, there are both unprecedented opportunities and legitimate worries about potential misuse. In this context, we are committed to helping make AI easier to access and use for a large segment of the population. Making AI more accessible to all should both be an important driver of economic growth and help strengthen democracy.
This proposal focuses on data-driven AI. The proposed research aims at reducing the need for human expertise in the implementation of pattern recognition and modeling algorithms, including Deep Learning, in various fields of application (medicine, engineering, social sciences, physics), using multiple modalities (images, videos, text, time series, questionnaires). To that end, we will organize scientific competitions (or challenges) in Automated Machine Learning (AutoML). Our designs will expose the community to progressively harder and more diverse settings, steadily reducing the need for human intervention in the modeling process. By involving the scientific community at large in challenge-solving, we will effectively multiply the impact of our government funding on such hard AutoML problems. All winners' code will be open-sourced. This effort will culminate in an AutoRL challenge (Automated Reinforcement Learning) in which participants will have to submit code that will be blind-tested on new RL tasks that they have never seen.
Recognizing that there is no good data-driven AI without good data, we also want to dedicate part of our time to educating the public on proper data collection and preparation. Our objective is to instill good practices that reduce problems resulting from bias in data or from irreproducible results caused by insufficient data. We will also encourage the protection of data confidentiality and privacy by supplying software that allows data donors to replace real data with realistic synthetic data. This will help broaden access to confidential or private data that has commercial value or the potential to harm individuals.
One original, complementary aspect of our proposal is to turn past high-profile scientific or industrial challenges into simplified templates, using placeholder data (e.g. synthetic data, as described above) and including ready-made starter solutions (e.g. derived from winning solutions of past challenges). Such templates will showcase a wide variety of data-driven AI applications, to spark the imagination of entrepreneurs worldwide who have no particular AI expertise. By simply cloning a template and replacing the data, an organisation could get immediate baseline results and eventually refine them by opening the challenge as an internal or external competition. To facilitate this process, we will make our open-source challenge platform Codalab available for free and will provide extra computational resources on the platform, based on the merit and needs of the challenge organizers.
Our proposal will further fundamental scientific research in several directions. We will work on theoretical guarantees that can be offered on the trade-off between utility and privacy in realistic synthetic data. We will also work on theoretical guarantees for AutoML in the context of "any-time" and "any-resource" learning, namely guarantees of performance when computational resources are scarce and human intervention is minimal.
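As an illustration of what "any-time" learning means in practice, here is a minimal Python sketch, not the project's method: a learner that can be stopped whenever its time budget runs out and still returns the best model found so far. The function name `anytime_fit`, its parameters, and the use of random search on a linear model are all assumptions made for this example.

```python
import random
import time


def anytime_fit(xs, ys, time_budget_s, clock=time.monotonic, seed=0):
    """Fit y = a*x + b by random local search under a time budget.

    Hypothetical illustration: the search stops as soon as the budget
    is spent and returns the best (a, b) found up to that point.
    """
    rng = random.Random(seed)
    deadline = clock() + time_budget_s

    def mse(a, b):
        # Mean squared error of the line a*x + b on the training data.
        return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    best = (0.0, 0.0)          # a usable model exists from the very start
    best_err = mse(*best)
    history = [best_err]       # best error so far, one entry per step

    while clock() < deadline:
        # Propose a small random perturbation of the current best model.
        cand = (best[0] + rng.gauss(0, 0.5), best[1] + rng.gauss(0, 0.5))
        err = mse(*cand)
        if err < best_err:     # keep the candidate only if it improves
            best, best_err = cand, err
        history.append(best_err)

    return best, best_err, history


if __name__ == "__main__":
    xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]   # toy data on the line y = 2x + 1
    model, err, _ = anytime_fit(xs, ys, time_budget_s=0.05)
    print("model:", model, "training MSE:", err)
```

Because the best error recorded so far never increases, a larger budget can only match or improve the training result; the guarantees mentioned above concern how good such a result can be promised to be for a given budget.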
Our proposal to make AI more accessible to all should benefit all layers of society. It will facilitate teaching AI and engaging a new generation of students in the study of AI. In particular, challenges in the classroom have already proved effective as a teaching tool. It will also help low-budget entrepreneurs bring AI solutions to domains that do not attract investment from large corporations, and should help startups flourish. Finally, volunteers who want to contribute to AI for good will have a platform on which to quickly put together applications by cloning challenge templates.
Ms. Isabelle Guyon (Université Paris Sud + Laboratoire de Recherche en Informatique)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
UPSud + LRI Université Paris Sud + Laboratoire de Recherche en Informatique
ANR funding: 600,000 euros
Beginning and duration of the scientific project: August 2020 - 48 Months