ChairesIA_2019_2 - Chaires de recherche et d'enseignement en Intelligence Artificielle - vague 2 de l'édition 2019

Knowledge And Representation Integration on the Brain – KARAIB

Knowledge And RepresentAtion Integration on the Brain

Cognitive science describes mental operations, and functional brain imaging provides a unique window into the brain systems that support these operations. Neuroimaging research has provided significant insight into the relations between psychological functions and brain activity. However, obtaining a systematic mapping between structure and function faces the roadblock that cognitive concepts are ill-defined and may not map cleanly onto the computational architecture of the brain.

Leveraging existing cognitive neuroscience resources through machine learning

To tackle this challenge, we propose to leverage rapidly increasing data sources: text and brain locations described in neuroscientific publications, brain images and their annotations taken from public data repositories, and several reference datasets. Our aim here is to develop multi-modal machine learning techniques to bridge these data sources.

Aim 1 develops representation techniques for noisy data to couple brain data with descriptions of behavior or diseases, in order to extract semantic structure.

Aim 2 challenges these representations to provide explanations of the observed relationships, based on two frameworks: i) a statistical analysis framework; ii) integration into a domain-specific language.

Aim 3 outputs readily-usable products for neuroimaging: atlases and ontologies, and focuses on implementation, with contributions to neuroimaging web-based data sharing tools.

We will use three types of technologies:
* Natural language processing techniques to extract information from large corpora of publications
* Multivariate analysis methods to extract information from neuroimaging datasets
* Deep neural networks to build intermediate representations that combine information from different sources (corpora of publications, repositories of images).
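As a toy illustration of the first item, the sketch below computes TF-IDF weights over a small invented corpus of abstracts, a standard first step when mining publications for cognitive terms. The corpus and tokenization are made up for illustration; this is not the project's actual NLP pipeline.

```python
# Toy TF-IDF sketch (illustration only; not the project's pipeline).
import math
from collections import Counter

abstracts = [
    "fmri activity in visual cortex during movie watching",
    "meg evoked responses localized to auditory cortex",
    "independent component analysis of resting state fmri",
]

def tfidf(docs):
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

weights = tfidf(abstracts)
# "fmri" appears in two of the three abstracts, so it carries positive weight.
```

Terms appearing in every document receive zero weight, while discriminative terms such as "fmri" stand out, which is the behavior one wants when linking term usage to brain locations.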

We describe two main results below:
1. Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. However, the aggregation of data coming from multiple subjects is challenging, since it requires accounting for large variability in anatomy, functional topography and stimulus response across individuals. Data modeling is especially hard for ecologically relevant conditions such as movie watching, where the experimental setup does not imply well-defined cognitive operations. We propose a novel MultiView Independent Component Analysis (ICA) model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. Contrary to most group-ICA procedures, the likelihood of the model is available in closed form. We develop an alternate quasi-Newton method for maximizing the likelihood, which is robust and converges quickly. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects. Moreover, the sources recovered by our model exhibit lower between-session variability than other methods. On magnetoencephalography (MEG) data, our method yields more accurate source localization on phantom data. Applied to 200 subjects from the Cam-CAN dataset, it reveals a clear sequence of evoked activity in sensor and source space.
2. We consider shared response modeling, a multi-view learning problem where one wants to identify common components from multiple datasets or views. We introduce Shared Independent Component Analysis (ShICA) that models each view as a linear transform of shared independent components contaminated by additive Gaussian noise. We show that this model is identifiable if the components are either non-Gaussian or have enough diversity in noise variances. We then show that in some cases multi-set canonical correlation analysis can recover the correct unmixing matrices, but that even a small amount of sampling noise makes Multiset CCA fail. To solve this problem, we propose to use joint diagonalization after Multiset CCA, leading to a new approach called ShICA-J. We show via simulations that ShICA-J leads to improved results while being very fast to fit. While ShICA-J is based on second-order statistics, we further propose to leverage non-Gaussianity of the components using a maximum-likelihood method, ShICA-ML, that is both more accurate and more costly. Further, ShICA comes with a principled method for shared components estimation. Finally, we provide empirical evidence on fMRI and MEG datasets that ShICA yields more accurate estimation of the components than alternatives.
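The generative model behind MultiView ICA (result 1) can be sketched in a few lines of NumPy. This is an illustration of the model assumptions only, not the project's estimation code; dimensions and noise levels are invented.

```python
# Sketch of the MultiView ICA generative model: each subject i observes
# X_i = A_i @ S + N_i, a subject-specific linear mixing A_i of shared
# independent sources S plus additive Gaussian noise N_i.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_sources, n_samples = 4, 3, 1000

# Shared non-Gaussian (Laplace-distributed) independent sources.
S = rng.laplace(size=(n_sources, n_samples))

views = []
for _ in range(n_subjects):
    A_i = rng.normal(size=(n_sources, n_sources))        # subject-specific mixing
    N_i = 0.1 * rng.normal(size=(n_sources, n_samples))  # sensor noise
    views.append(A_i @ S + N_i)
```

Fitting inverts this process by maximizing the closed-form likelihood over the unmixing matrices with the alternating quasi-Newton scheme described above; only the forward model is shown here.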
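Similarly, the ShICA model (result 2) places the noise on the components rather than the sensors: each view is X_i = A_i @ (S + N_i), with view-specific noise variances, one of the two regimes under which the model is identifiable. The toy sketch below generates data from this model; the averaging used to estimate the shared components at the end is a naive stand-in for ShICA's principled (noise-aware) estimator, and assumes the true unmixing matrices are known.

```python
# Toy sketch of the ShICA generative model: X_i = A_i @ (S + N_i),
# a linear transform of shared independent components S contaminated
# by additive Gaussian noise with view-specific variances.
import numpy as np

rng = np.random.default_rng(42)
n_views, n_components, n_samples = 3, 2, 5000
S = rng.laplace(size=(n_components, n_samples))  # shared components

mixings, views = [], []
for i in range(n_views):
    A_i = rng.normal(size=(n_components, n_components))
    sigma_i = 0.1 * (i + 1)  # diverse noise variances across views
    views.append(A_i @ (S + sigma_i * rng.normal(size=(n_components, n_samples))))
    mixings.append(A_i)

# Naive shared-component estimate: unmix each view with its true mixing
# matrix and average, so the independent noise partially cancels out.
S_hat = np.mean([np.linalg.inv(A) @ X for A, X in zip(mixings, views)], axis=0)
err = np.abs(S_hat - S).mean()
```

In practice neither the mixing matrices nor the noise variances are known; ShICA-J recovers them via Multiset CCA followed by joint diagonalization, and ShICA-ML by maximum likelihood.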

We will further develop generative models of brain imaging data.

We produce open-source software and open-access publications.


Project coordination

Bertrand THIRION (Centre de Recherche Inria Saclay - Île-de-France)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines all responsibility for its content.

Partner

Équipe PARIETAL, Centre de Recherche Inria Saclay - Île-de-France

ANR grant: 534,600 euros
Beginning and duration of the scientific project: February 2020 - 48 months
