CE23 - Data, Knowledge, Big Data, Multimedia Content, Artificial Intelligence

QualiHealth: Enhancing the Quality of Healthcare Data – QualiHealth

Submission summary

Hospitals and life-science institutes produce a tremendous amount of data on
a daily basis during the healthcare process and ordinary scientific
activity. Such data are highly valuable as they can be used to improve the
process of care delivery and prevention and can also play a pivotal role in
prospective clinical research. However, clinical, biological and imaging
data are usually gathered by means of diverse data collection channels and
procedures exhibiting varying degrees of reliability and trustworthiness. As
a consequence, the collected data are usually scattered over heterogeneous
data sources and suffer from quality problems that hamper their use for
analysis purposes.

Classical data quality issues can be observed, including missing or
erroneous data, as well as more complex problems, for example those
caused by secondary use of data in contexts other than the ones they were
collected for. Additionally, the distribution of data can
evolve over time, creating “data-glitches” that can cause severe
interpretation errors.

Today, no system is able to assist the clinicians and researchers in a
quality-aware exploration of their data. Overall, the lack of quality
indicators strongly limits an in-depth use of healthcare data in
translational research. We argue that more analyses of increasing
complexity and more interactions between clinical and pre-clinical medical
research would be feasible if the available data were annotated with
quality indicators, and if such quality indicators were also employed in
the querying and analysis of the available data.

This research proposal is geared toward a system capable of capturing and
formalizing domain experts' knowledge of data quality, enriching
the available data with this knowledge, and then exploiting it
in subsequent quality-aware medical research studies.

We expect a quality-certified collection of medical and biological
datasets, on which quality-certified analytical queries can be formulated.
We envision the conception and implementation of a quality-aware query
engine with query enrichment and answering capabilities. To reach these
ambitious objectives, the following concrete scientific goals must be
fulfilled:

An innovative research approach, which starts from concrete datasets and
expert practices and knowledge to reach formal models and theoretical
solutions, will be employed to elicit innovative quality dimensions and to
identify, formalize, verify and finally construct quality indicators able
to capture the variety and complexity of medical data;
those indicators have to be composed, normalized and aggregated when
queries involve data of different granularities (e.g., accuracy
indications on pieces of information at the patient level have to be
composed when one queries a cohort) and of different quality dimensions
(e.g., mixing incomplete and inaccurate data);
in turn, those complex aggregated indicators have to be used to provide new
quality-driven query answering, refinement, enrichment and data analytics
techniques. A key novelty of this project is the handling of data that are
not rectified in the original database but sanitized in a query-driven
fashion: queries will be modified, rewritten and extended to integrate
quality parameters in a flexible and automatic way.
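
As a purely illustrative sketch of the last two goals (not the project's
actual model or engine), the Python fragment below aggregates hypothetical
patient-level indicators into cohort-level ones and applies a quality
constraint at query time, leaving the source records untouched. The record
fields, the choice of a simple mean for aggregation, and the function names
are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class PatientRecord:
    """Hypothetical patient-level record carrying a quality annotation."""
    patient_id: str
    hba1c: Optional[float]  # measured value; None when the value is missing
    accuracy: float         # assumed accuracy indicator in [0, 1]


def cohort_quality(records: list[PatientRecord]) -> dict[str, float]:
    """Aggregate patient-level indicators into cohort-level indicators.

    Completeness is the share of records with a value; accuracy is the
    mean accuracy of the records that have one (an assumed aggregation).
    """
    if not records:
        return {"completeness": 0.0, "accuracy": 0.0}
    present = [r for r in records if r.hba1c is not None]
    completeness = len(present) / len(records)
    accuracy = mean(r.accuracy for r in present) if present else 0.0
    return {"completeness": completeness, "accuracy": accuracy}


def quality_aware_query(
    records: list[PatientRecord],
    predicate: Callable[[PatientRecord], bool],
    min_accuracy: float,
) -> tuple[list[PatientRecord], dict[str, float]]:
    """Answer a query with an added quality constraint.

    The source records are never modified: the user's predicate is
    extended with a missing-value check and an accuracy threshold, and
    the answer is returned together with its aggregated quality.
    """
    answer = [
        r for r in records
        if r.hba1c is not None
        and r.accuracy >= min_accuracy
        and predicate(r)
    ]
    return answer, cohort_quality(answer)
```

A call such as `quality_aware_query(records, lambda r: r.hba1c > 7.0, 0.5)`
would return only the matching patients whose values meet the accuracy
threshold, together with the completeness and accuracy of that answer.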

The adequacy of our declarative specification of quality indicators, the
efficiency of query refinement and query answering, and the analytical tasks
leveraging such indicators will be assessed by domain experts on real,
representative datasets collected by the project consortium.

Project coordinator


The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility for its contents.


LIMOS Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes
INSERM U1016 Institut Cochin
LIS Laboratoire d'Informatique et Systèmes
UBC University of British Columbia / Department of Computer Science

ANR funding: 744,591 euros
Beginning and duration of the scientific project: January 2019 - 48 months
