CE23 - Data, Knowledge, Big Data, Multimedia Content, Artificial Intelligence 2018

MAchine learning for environmental TIme Series – MATS

Extracting Information from Large Environmental Data Sets

The rise of environmental data from sensors, satellites, and climate models presents a major challenge: how can we efficiently extract useful information despite the lack of annotations? This project aims to develop weakly supervised machine learning methods to analyze these large-scale datasets. The goal is twofold: to improve the accuracy of environmental analyses while reducing reliance on manual annotations, which are often costly and limited. By exploring approaches such as semi-supervised learning and self-learning, this project seeks to provide robust tools for detecting climate trends, monitoring biodiversity, and anticipating extreme events. Ultimately, these advancements could benefit both researchers and environmental decision-makers by enhancing monitoring and decision-making in response to current ecological challenges.

This project relies on advanced artificial intelligence methods to efficiently process large, sparsely labeled datasets. Deep learning enables the identification of structures and patterns without requiring exhaustive annotation, making analysis more flexible and automated. Time series alignment techniques help match data collected at different times or under varying conditions, improving the consistency of analyses. Additionally, tools derived from optimal transport theory are used to link different data distributions, facilitating the integration and harmonization of heterogeneous sources. By combining these approaches, the project aims to extract relevant information and enhance the reliability of analyses, paving the way for applications in various fields requiring efficient handling of complex data.
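
As a rough illustration of the time series alignment idea mentioned above, the sketch below computes a dynamic time warping (DTW) distance in plain Python. DTW is one common alignment technique and is used here as an assumption; the summary does not say which alignment methods the project relies on. The function name `dtw_distance` and the toy sequences are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    The local cost is the squared difference between samples; the result is
    the square root of the cheapest accumulated cost over all alignments.
    """
    n, m = len(a), len(b)
    # acc[i][j] holds the minimal cost of aligning a[:i] with b[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return math.sqrt(acc[n][m])


# Hypothetical toy signals: same shape, the second one delayed by one step.
reference = [0.0, 1.0, 2.0, 1.0, 0.0]
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
print(dtw_distance(reference, delayed))  # 0.0: the delay is absorbed by the warping
```

Because the alignment may repeat samples, the two signals match exactly even though they differ in length and timing, which is why such measures suit series acquired at different times or sampling rates.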

The project led to the development of tslearn, a Python library dedicated to time series analysis that offers tools comparable to those of scikit-learn. Another major outcome is a domain adaptation algorithm specifically tailored to time series, which enables knowledge transfer between heterogeneous datasets. Applied to Earth observation data, this algorithm improves the accuracy of environmental analyses.
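
Knowledge transfer between heterogeneous datasets can be pictured, in its simplest form, as mapping one data distribution onto another. The sketch below is not the project's domain adaptation algorithm; it only illustrates the underlying optimal transport idea in one dimension, where the optimal plan between two equal-size samples with uniform weights pairs values of the same rank. The function names and the sensor readings are hypothetical.

```python
def wasserstein_1d(source, target):
    """1-Wasserstein distance between two equal-size samples (uniform weights)."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes samples of equal size")
    xs, ys = sorted(source), sorted(target)
    # In 1-D, matching sorted values rank by rank is an optimal transport plan.
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def transport_to_target(source, target):
    """Send each source value to the target value that has the same rank."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes samples of equal size")
    order = sorted(range(len(source)), key=lambda i: source[i])
    ys = sorted(target)
    mapped = [0.0] * len(source)
    for rank, i in enumerate(order):
        mapped[i] = ys[rank]
    return mapped


# Hypothetical readings of the same quantity from two differently calibrated sensors.
sensor_a = [2.0, 0.0, 1.0]
sensor_b = [10.0, 12.0, 11.0]
print(wasserstein_1d(sensor_a, sensor_b))       # 10.0
print(transport_to_target(sensor_a, sensor_b))  # [12.0, 10.0, 11.0]
```

After the mapping, the source values live on the target's scale while keeping their original ordering, which is the intuition behind using transport plans to harmonize heterogeneous sources.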

The project has made significant contributions to weakly supervised learning, particularly in unsupervised domain adaptation, and has shown the relevance of these methods for analyzing weakly labeled time series. As the volume of weakly annotated temporal data keeps growing rapidly, the need for such methods remains strong. Developing more robust and generalizable approaches is essential to make effective use of these data and to address current challenges in machine learning.

[0] Romain Tavenard et al. Tslearn, A Machine Learning Toolkit for Time Series Data. In Journal of Machine Learning Research, vol. 21, pp. 1-6, 2020.
[1] Maël Guillemé, Simon Malinowski, Romain Tavenard, Xavier Renard. Localized Random Shapelets. In Proceedings of the International Workshop on Advanced Analysis and Learning on Temporal Data, Würzburg, Germany, 2019.
[2] Yichang Wang, Rémi Emonet, Elisa Fromont, Simon Malinowski, Romain Tavenard. Adversarial Regularization for Explainable-by-Design Time Series Classification. In Proceedings of ICTAI 2020, Greece, 2020.
[3] David Guijo-Rubio, Pedro Gutiérrez, Romain Tavenard, Anthony Bagnall. A Hybrid Approach to Time Series Classification with Shapelets. In Proceedings of Intelligent Data Engineering and Automated Learning (IDEAL), Manchester, United Kingdom, 2019.
[6] Titouan Vayer, Laetitia Chapel, Rémi Flamary, Romain Tavenard, Nicolas Courty. Optimal Transport for Structured Data with Application on Graphs. In Proceedings of ICML 2019, 36th International Conference on Machine Learning, Long Beach, United States, 2019.
[7] Titouan Vayer, Rémi Flamary, Romain Tavenard, Laetitia Chapel, Nicolas Courty. Sliced Gromov-Wasserstein. In Proceedings of NeurIPS 2019, Thirty-third Conference on Neural Information Processing Systems, Vancouver, Canada, 2019.
[8] Titouan Vayer, Laetitia Chapel, Rémi Flamary, Romain Tavenard, Nicolas Courty. Fused Gromov-Wasserstein Distance for Structured Objects. In Algorithms, vol. 13, no. 9, p. 212, 2020.
[9] Emilien Alvarez-Vanhard, Thomas Houet, Cendrine Mony, Lucie Lecoq, Thomas Corpetti. Can UAVs Fill the Gap between In Situ Surveys and Satellites for Habitat Mapping? In Remote Sensing of Environment, vol. 243, 2020.
[10] Marc Rußwurm, Romain Tavenard, Sébastien Lefèvre, Marco Körner. Early Classification for Agricultural Monitoring from Satellite Time Series. In Proceedings of the AI for Social Good Workshop at ICML, Long Beach, United States, 2019.

A major trend in recent Earth observation missions is to target high temporal and spatial resolutions (e.g. the Sentinel-2 mission by ESA).
Data resulting from these missions can then be used for fine-grained studies in many applications.
In this project, we will focus on three key environmental issues: agricultural practices and their impact, forest preservation, and air quality monitoring.

Based on the key requirements identified for these application settings, the MATS project will rethink the machine learning literature for time series, with a focus on large-scale methods that can operate even when little supervised information is available.
In more detail, MATS will introduce new paradigms for large-scale time series classification, spatio-temporal modelling and weakly supervised learning on time series.
The proposed methods will cover a wide range of machine learning problems, including domain adaptation, clustering, metric learning and (semi-)supervised classification, for which dedicated methodology is lacking when it comes to time series data.
The methods developed in the project will be made available to the scientific community and to practitioners through an open-source toolbox, to ease dissemination to a wide range of application areas.
Moreover, the application settings considered in the project will serve to showcase the benefits of the methodologies developed in MATS for time series analysis.

Project coordination

Romain Tavenard (LITTORAL, ENVIRONNEMENT, TELEDETECTION, GEOMATIQUE)

This summary was written by the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.

Partnership

LETG LITTORAL, ENVIRONNEMENT, TELEDETECTION, GEOMATIQUE

ANR funding: 214,920 euros
Beginning and duration of the scientific project: - 48 Months
