LabCom_2024 - V1 - Joint laboratories between public research organisations and SMEs/mid-cap companies (PME/ETI) - 2024 edition - evaluation wave 1 2024

Data observatory in the era of multi source Big Data – Aerial

Submission summary

The recent explosion in data acquisition capabilities and observing modes of satellites and ground facilities in astronomy and Earth observation is driving a sharp rise in the volume of data to be acquired, processed and analysed, which presents a technological challenge. ESA's Gaia mission, for instance, has produced a catalogue of billions of celestial bodies (planets, stars, galaxies, etc.). With the recent launch of ESA's Euclid mission, which will similarly lead to a census of several billion galaxies, and the preparation of the LISA mission, which will observe the Universe via gravitational waves, the question of the means available for jointly exploiting these massive, heterogeneous but complementary quantities of data becomes pressing. To these data from space, one must also add ground-based observations from a multiplicity of observatories and in a diversity of wavelength bands, notably in the radio domain, where SKA will soon provide a significantly larger dataset still. The situation for Earth observation is similar, with data spanning land, oceans, etc. acquired from space but also from ground instruments, all of which need to be combined in order to understand the Earth as a system.
In this context, OCA and ACRI-ST propose the joint laboratory AERIAL for the development of a data observatory in the era of multi-source big data. Its purpose is to develop hardware and software solutions which, for instance, Gaia Data, ESA's Datalabs, the Centre de Données in Strasbourg or SKA, for its network of SKA Regional Centres (SRCs), could choose to deploy. This effort builds on three pillars: (1) the development of solutions for aggregating heterogeneous databases and ensuring their interoperability, able to manage the diversity of formats and data quality and to scale to geographically distributed big data; (2) the elaboration of observation services, in the sense of exploring these databases, not only via a query system adapted to their diversity but also via intelligent crossmatching based on large-scale analysis carried out by Artificial Intelligence, and via advanced visualisation capabilities providing a synthetic view in spite of the size of the data space; (3) the ability to process the data in situ, at the scale of High Performance Computing, based on the one hand on a software archive for preserving and reusing existing codes and, on the other hand, on an environment allowing users to develop, execute and optimise their own codes, notably AI-based ones, to exploit these data. This research will open up new scientific possibilities for OCA through its strong involvement in the Gaia, Euclid, LISA and SKA projects, and opportunities for new services and markets for ACRI-ST thanks to the new tools and technologies developed and the expertise acquired in these fields of science.
This proposal is perfectly aligned with the recently published ASTRONET 2022-2035 roadmap (chapter "Computing; big data, HPC and data infrastructure") and fits a strategy of changing the scale at which observation and code databases are used in order to get more out of them. By exploring and exploiting the data in situ, it will reduce data transfers, and it will concentrate processing in an effort to reduce the need for network infrastructure and to improve the energy efficiency of computations. This is in line with Sustainable Development Goal 9 (SDG 9), "Industry, innovation and infrastructure".
Finally, this proposal contributes to Open Science, both in terms of sharing the data and the codes and in terms of offering possibilities for reproducing the corresponding results.

Project coordination

Shan Mignot (GALILEE-OCA)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

ACRI-ST
GALILEE-OCA

ANR funding: 358,660 euros
Beginning and duration of the scientific project: November 2024 - 54 months
