CE40 - Mathematics, theoretical computer science, automatic control and signal processing

Statistical Modeling and Inference for unsupervised Learning at largE-Scale – SMILES


Transform heterogeneous, high-dimensional, and potentially big data into structured knowledge via original latent variable models learned by controlled-complexity algorithms

Large-scale latent variable models

Large-scale data analysis is an inherently multidisciplinary area and is becoming increasingly important in today's society. SMILES is a collaborative fundamental research project that aims to introduce an unsupervised statistical modeling framework and scalable inference algorithms for transforming large-scale data into knowledge. It considers the large-scale context as a whole, with its main issues related to inference from a large volume of very high-dimensional data with complex underlying hidden structures. The key tenet of SMILES is to introduce large-scale regression-based sparse and (non-)parametric models for data representation, and large-scale latent data models for unsupervised data classification. Knowledge extraction will consist in automatically retrieving hidden structures and summarizing prototypes, groups, and sparse representations. We consider different data settings, including functional data, multimodal bioacoustical data, and biological data.

- latent variable models for high-dimensional regression, including functional regression and functional regression mixture models
- latent variable models for classification and clustering in high-dimensional scenarios
- latent variable models for unsupervised learning and bioacoustics

- theoretical, methodological and computational results on approximation, estimation and model selection capabilities in latent variable models, in particular for mixtures and mixtures of experts

- deep latent variable models and distributed mixtures for clustering

See the mid-term report.


Project coordination

Faicel CHAMROUKHI (LABORATOIRE DE MATHÉMATIQUES NICOLAS ORESME)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

LMNO LABORATOIRE DE MATHÉMATIQUES NICOLAS ORESME
LMRS LABORATOIRE DE MATHÉMATIQUES RAPHAËL SALEM
LIS Laboratoire d'Informatique et Systèmes
MODAL MOdel for Data Analysis and Learning

ANR funding: 338,904 euros
Start and duration of the scientific project: October 2018, 42 months
