COmpREssing networks and GRAPHs for efficIEnt computing – COREGRAPHIE
Graphs are ubiquitous. Also called networks, graphs are used to model many real-world problems and data: from social networks, to road networks, to genome fragment assembly, to images and 3D objects (for pattern recognition and classification), etc. Nowadays, many of these applications face a major problem: the volume of data increases to such an extent that even polynomial solutions are no longer sufficient. Parallel and distributed frameworks, such as MapReduce, which are effective approaches to deal with big data, are not necessarily suitable for massive graph computations, mainly because of the structure of graph data and the iterative nature of their algorithms.
In this project, we put compression at the core of massive graph data processing. We aim to define a graph reduction toolbox that allows to compute simpler and smaller representations of graphs, i.e., summaries, that can be used instead of the initial graphs. For this, we propose to develop algorithms that make it possible to perform such compressions, and to refine them according to the quality of the obtained summaries as well as the operations they allow to undertake. The advantage of such approach is to process massive graph data in linear or quasi-linear time. Our methodology relies mainly on searching for regularities into graphs in order to summarize and analyze them
Thus, the challenge we address is to define, prototype, and test such a graph engine so that to ensure an operational, efficient and scalable graph analysis tool that can be extended to all graph structured data. Our main use case is Software Heritage, the largest archive of software source code together with its development history, founded by INRIA. The archive data model is a fast growing graph consisting of 15 billion nodes and 200 billion edges.
The new tools and algorithms produced by the project, that will be available in open source, will create basis for developing new types of data analysis and information retrieval algorithms for massive graph structured data and will find applications in various domains and disciplines relying on graphs or networks.
Madame Hamida Seba (UMR 5205 - LABORATOIRE D'INFORMATIQUE EN IMAGE ET SYSTEMES D'INFORMATION)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
Centre de Recherche Inria de Paris
IRIF Institut de Recherche en Informatique Fondamentale
LIRIS UMR 5205 - LABORATOIRE D'INFORMATIQUE EN IMAGE ET SYSTEMES D'INFORMATION
LIB Laboratoire d'Informatique de Bourgogne - EA 7534
Help of the ANR 459,928 euros
Beginning and duration of the scientific project: March 2021 - 48 Months