A central problem in the practical use of statistical models is their interpretability. In many applications it is useful to construct a scoring system, which can be defined as a sparse linear model whose coefficients are simple: they have few significant digits or are even integers. Ideally, a scoring system relies on simple arithmetic operations, is sparse, and can be easily explained by human experts.
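As a minimal illustration of what such a scoring system looks like, the sketch below sums integer points over a handful of binary conditions and compares the total with a threshold. The feature names, point values, and threshold are hypothetical, chosen only for illustration; they are not taken from the project and have no clinical meaning.

```python
# Hypothetical scoring system: feature names, point values, and the
# threshold are invented for illustration and are not from the project.
POINTS = {
    "age_over_60": 2,
    "smoker": 1,
    "high_blood_pressure": 3,
}
THRESHOLD = 4  # flag the patient when the total score reaches this value


def score(patient: dict[str, bool]) -> int:
    """Sum the integer points of every condition the patient satisfies."""
    return sum(pts for name, pts in POINTS.items() if patient.get(name, False))


def predict(patient: dict[str, bool]) -> bool:
    """Return True when the total score reaches the threshold."""
    return score(patient) >= THRESHOLD


patient = {"age_over_60": True, "smoker": True, "high_blood_pressure": False}
print(score(patient), predict(patient))
```

Sparsity here simply means that any feature absent from `POINTS` has a coefficient of zero, so an expert can verify a prediction by adding a few small integers by hand.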
In this project, we address the problem of learning interpretable scores automatically, purely from data.
Our main motivation comes from real applications: constructing simple rules that are meaningful to human experts and can be used by healthcare providers.
Our goal is to introduce an original methodology for learning an adaptive, interpretable discretisation together with the scores associated with the learned categories (or thresholds).
We aim to propose cost-sensitive, heterogeneous, cascading scoring systems that take into account the needs of physicians and the costs of data acquisition and medical treatment, and that penalise incorrect diagnoses. To realize this challenging task, we will develop multi-stage learning under a budget.
The project is highly interdisciplinary, lying at the intersection of statistical machine learning, decision making, optimization, and medicine. DiagnoLearn brings together researchers whose expertise covers all of these domains.
Ms Nataliya Sokolovska (Unité de recherche sur les maladies cardiovasculaires, le métabolisme et la nutrition)
The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.
UMR_S1166 Unité de recherche sur les maladies cardiovasculaires, le métabolisme et la nutrition
ANR funding: 266,760 euros
Start and duration of the scientific project: December 2017, 36 months