BenchArk - An efficient and robust benchmarking suite for AI
Numerical evaluation of novel methods, a.k.a. benchmarking, is a pillar of the scientific method in machine learning.
However, because of practical and statistical obstacles, the reproducibility of published results is currently inadequate: many details, from insufficient uncertainty quantification to flawed methodology, can invalidate numerical comparisons.
In 2022, the Benchopt initiative (https://benchopt.github.io) released an open-source Python package, together with a framework to seamlessly run, reuse, share, and publish benchmarks in numerical optimization.
In this project, we aim to bring Benchopt to the entire machine learning community and make it a new standard for benchmarking, by providing researchers and practitioners with efficient and statistically valid benchmarking methods.
Our goal is to ensure reproducibility and consistency in model evaluation.
We will bring the machine learning community together to develop informative and statistically valid benchmarks, while providing methods that lower the identified barriers to adopting such practices.
The results of the project will be integrated into the open-source Benchopt library.
Project coordination
Thomas MOREAU (Centre Inria de Saclay)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
Inria Saclay - MIND/SODA Centre Inria de Saclay
IMAG Institut Montpelliérain Alexander Grothendieck
OCKHAM Optimisation, Connaissances pHysiques, Algorithmes et Modèles
ANR funding: 588,611 euros
Beginning and duration of the scientific project: September 2024, 48 months