CE45 - Mathematics and digital sciences for biology and health

Federated statistical learning for new generation meta-analysis of large-scale and secured biomedical data – FED-BIOMED



Scalable and reliable federated learning in healthcare

The initial objective of the project was to develop a methodological and computational framework for the effective application of federated learning in healthcare, with a particular focus on medical imaging applications.
From the methodological perspective (WP1), the proposal identified the need to develop novel frameworks adapted to the federation of probabilistic models (such as Gaussian processes and Bayesian neural networks). Novel optimization techniques were envisaged to enable scalable optimization in a federated setting, together with secure mechanisms for parameter sharing.
This methodology was intended to be applied in translational research (WP2), in particular to imaging-genetics in brain imaging applications, and to cardiac image analysis in a French multi-centric study.
From the computational perspective (WP3), the project proposed the development of a dedicated software package and network infrastructure to deploy federated learning in the target applications.

The project contributes to the emerging field of federated learning.
It extends the federated optimization paradigm to the Bayesian setting, and develops novel scalable approaches to probabilistic modeling and prediction from heterogeneous and potentially high-dimensional data. From the technical point of view, we are implementing our federated learning methods in a self-contained software package that can be securely deployed across different centers and collaborators. Finally, from the translational point of view, we are working to demonstrate our federated learning initiative on several clinical applications with a variety of hospital and research partners.

- We developed a novel Bayesian framework for federated learning with heterogeneous and missing data. The approach formulates federated learning as a hierarchical modeling problem, in which variability is modelled coherently at both the client and server levels. It was demonstrated on the analysis of heterogeneous data (multi-modal brain images and clinical information) in Alzheimer's disease.
- We proposed a novel federated learning scheme called "clustered sampling", in which the heterogeneity of clients is better taken into account during the parameter aggregation step, leading to improved convergence speed and robustness of the final federated model.
- We investigated a novel kind of weakness of standard federated learning schemes, named "free-riding". This weakness arises when malicious clients develop strategies to obtain the federated learning result (the final model) without contributing any data during the optimization process.
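
As a rough illustration of the aggregation step and of the free-riding weakness described in the list above, the following minimal sketch simulates a federation on a toy one-parameter regression problem. It uses plain sample-size-weighted averaging, not the project's clustered sampling scheme, and all names (`local_update`, `free_rider_update`, `aggregate`, `run_federation`) as well as the toy data are assumptions made for this example; none of them come from the Fed-BioMed software.

```python
import random


def local_update(w, data, lr=0.1, epochs=5):
    """Honest client: a few gradient steps on the squared loss of y ~ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def free_rider_update(w):
    """Hypothetical free-rider: sends back the received model, untrained."""
    return w


def aggregate(updates, sizes):
    """Server: average of client parameters weighted by declared sample size."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(updates, sizes)) / total


def make_client(slope, n, rng):
    """Toy local dataset: y = slope * x plus small Gaussian noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0, 0.01)) for x in xs]


def run_federation(rounds=50, seed=0):
    rng = random.Random(seed)
    clients = [make_client(2.0, 20, rng), make_client(2.0, 40, rng)]
    w_global = 0.0
    for _ in range(rounds):
        updates = [local_update(w_global, data) for data in clients]
        # The free-rider contributes no data but declares 30 samples.
        updates.append(free_rider_update(w_global))
        sizes = [len(data) for data in clients] + [30]
        w_global = aggregate(updates, sizes)
    return w_global


if __name__ == "__main__":
    print(run_federation())
```

In this toy setting the free-rider still receives the final model at every round, while its unchanged parameters only pull the average back towards the previous global model and slow convergence.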

We are currently consolidating our framework and preparing its application to the data provided by the partners.
On the administrative side, we obtained approval from the Inria security department for the first deployment of a simplified version of the software on the hospital data, and we are currently in discussion with the Inria DPO about the full application of our framework in compliance with GDPR.

Intense development activity has been carried out since the beginning of the project. A scientific paper on the Fed-BioMed software was published at the 2020 workshop "Distributed and Collaborative Learning", organised by NVIDIA. The software is accessible at the project's page: fedbiomed.gitlabpages.inria.fr

In the next steps of the project:

- We will further investigate the privacy mechanisms related to our Bayesian framework, and will develop novel strategies to account for asynchronous contributions of the clients to federated learning.
- We aim to obtain complete approval for the use of our software, and to move forward with the installation and setup of Fed-BioMed in the proposed applications.
- We will start translating the project methodology and framework to the proposed clinical applications.

- 5 papers published in high-impact scientific conferences (e.g. ICML, AISTATS, IPMI, MICCAI)
- Strong dissemination activity through invited talks
- Organisation of the Special Session on Security and Fairness in Collaborative Healthcare Data Analysis (https://biomedicalimaging.org/2021/special-sessions/) at the International Symposium on Biomedical Imaging (ISBI 2021)
- APP (Agence pour la Protection des Programmes) deposit for the Fed-BioMed software
- Collaboration agreement with Accenture Labs to contribute to the development of the Fed-BioMed software
- Collaboration with the Centre Antoine Lacassagne hospital in Nice
- Additional funding obtained from Université Côte d’Azur and Inria through the French National Artificial Intelligence Research Program

Applying statistical learning to healthcare data must comply with anonymity, security, and non-transferability of data across centers, while accounting for overwhelming data dimension and variability. Fed-BioMed will tackle this challenge through methodological, technical, and translational advances. We will account for data complexity and uncertainty by reformulating Bayesian non-parametric modeling in a federated setting. In this way, training on secured and multicentric data can be performed without sharing individual information, but only parameter distributions. Communication costs and the risk of information leakage are reduced by leveraging variational inference and differential privacy. Fed-BioMed will allow us to exploit data for thousands of individuals from two of the largest existing multi-centric studies: imaging-genetics analysis in the ENIGMA consortium, and sudden cardiac death prediction from a network of French clinical sites.
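
To make the idea of sharing only parameter distributions more concrete, here is a minimal sketch under strong simplifying assumptions: each client summarises its local data as a Gaussian over a single scalar parameter, optionally clips and perturbs the mean with Gaussian noise (in the spirit of differential privacy, with no formal privacy guarantee claimed here), and the server combines the summaries by precision weighting. The function names and the scalar Gaussian model are hypothetical choices for this example and do not describe the project's actual Bayesian non-parametric or variational method.

```python
import random
import statistics


def local_summary(samples):
    """Client: Gaussian summary (mean, variance of the mean) of local data."""
    mean = statistics.fmean(samples)
    var_of_mean = statistics.variance(samples) / len(samples)
    return mean, var_of_mean


def privatize(summary, clip, noise_std, rng):
    """Client: clip the mean, add Gaussian noise, and widen the variance
    to reflect the added noise. Raw samples never leave the client."""
    mean, var = summary
    clipped = max(-clip, min(clip, mean))
    return clipped + rng.gauss(0.0, noise_std), var + noise_std ** 2


def combine(summaries):
    """Server: precision-weighted combination of the clients' Gaussians."""
    precisions = [1.0 / var for _, var in summaries]
    total = sum(precisions)
    mean = sum(m * p for (m, _), p in zip(summaries, precisions)) / total
    return mean, 1.0 / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical centers holding samples of the same quantity.
    centers = [[rng.gauss(1.5, 1.0) for _ in range(n)] for n in (50, 80, 120)]
    shared = [privatize(local_summary(s), clip=5.0, noise_std=0.05, rng=rng)
              for s in centers]
    print(combine(shared))
```

Only the pairs `(mean, variance)` cross the network in this sketch, and clients with more data (smaller variance) weigh more in the combined estimate.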

Project coordination

Marco Lorenzi (Centre de Recherche Inria Sophia Antipolis - Méditerranée)

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.


Inria Centre de Recherche Inria Sophia Antipolis - Méditerranée
CMIC University College London / Centre for Medical Image Computing
Illinois Institute of Technology / ARMOUR COLLEGE OF ENGINEERING

ANR funding: 196,059 euros
Beginning and duration of the scientific project: February 2020 - 42 Months
