Representation Learning for Modeling Rich Dynamic Interaction Traces – LOCUST
Human interactions, whether conducted through web and mobile services or with artifacts and intelligent sensors, generate large flows of complex dynamic data. These user traces correspond to sequences of observations such as events, measurements and semantic content. They may have a spatial component (e.g. geo-localization) and are often composed of multiple types of information. Analyzing multiple heterogeneous information sources and integrating their structural, spatial, semantic and temporal dimensions is challenging. Given the diversity and volume of these data, it is natural to turn to statistical machine learning to analyze them. However, current methods and algorithms fall far short of answering many of the problems raised by the complexity and variety of such data.
The objective of Locust is to build formal models and algorithmic tools for understanding, modeling and analyzing complex dynamic traces, both for a set of generic machine learning tasks and for target applications. Locust will develop theory and methods based on ideas from representation learning, a recent trend in statistical machine learning. Two use cases, one on semantic information diffusion and temporal recommendation and the other on urban computing, will support the theoretical contributions and serve to evaluate the models and algorithms. Representation learning methods allow one to learn the latent factors that underlie the generation of observed data or that are relevant for a given task. The challenge here is to learn latent representations for dynamical processes corresponding to multiple sequences of temporal or spatio-temporal observations.
The first task of Locust is dedicated to developing representation learning models and algorithms for spatio-temporal data. We will exploit ideas from deep learning, neural networks, and algebraic methods such as dictionary learning, and will revisit a series of generic machine learning problems in the context of spatio-temporal data corresponding to human traces. While this first task will develop general representation learning models covering a broad spectrum of situations, task 2 focuses on incorporating prior knowledge into the statistical models to help their design and training. To this end, we propose a methodology for transferring knowledge from spatio-temporal diffusion models, expressed as diffusion or reaction-diffusion equations, to our statistical models. Conversely, the results of this task could be used to develop new models for spatio-temporal statistics. A third task is dedicated to collecting large corpora for the two use cases and to evaluating the proposed methods.
Evaluation will be performed offline for both use cases on the collected datasets, and online for the first use case (semantic information diffusion and temporal recommendation) through the platform of one of the partners.
The consortium is composed of two academic partners, UPMC-LIP6 (Paris) and UJF-LIG (Grenoble), and one industrial partner, Deezer. LIP6 and LIG are specialists in machine learning and data science. They are also complementary: LIP6 has strong expertise in representation learning methods such as neural networks and algebraic methods, while LIG has strong expertise in temporal processes, multiple time series analysis and probabilistic latent models. Both teams have been working on the analysis of social data (recommendation, prediction of content diffusion, etc.) for several years, as shown by their respective publications in top conferences. They collaborate with academic and industrial partners on the topic of urban computing. LIP6 and LIG will contribute to the theoretical and algorithmic aspects. Deezer will act as a data provider and as the end user for a proof of concept. Data and use cases for urban computing will be provided by partners outside the project with whom we already collaborate closely (VEDECOM, STIF, IFSTTAR).
Project coordination
Patrick GALLINARI (Université Pierre et Marie Curie - Laboratoire d'Informatique de Paris 6)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partners
LIG Laboratoire d'informatique de Grenoble
DEEZER
UPMC - LIP6 Université Pierre et Marie Curie - Laboratoire d'Informatique de Paris 6
ANR funding: 487,278 euros
Start and duration of the scientific project: September 2015, 48 months