Verified Foundations of Large-Scale Distributed Graph Systems – VERDI
Traditional relational data management systems are challenged by the abundance of highly interconnected heterogeneous data. This led to a surge in the popularity of graph databases in many industry and academic areas. For example, graph datasets with world-wide multi-omics data for genomic analyses and contact tracing were collaboratively curated during the pandemic (EU Datathon, CovidGraph, Covid-19 Knowledge Graph). Other critical societal graph databases use-cases include finance, telecommunications, journalism, and intelligent transportation. However, when processing geo-distributed graph data at scale, custom distribution models are needed, whose support still poses practical challenges. These are: replication, to mitigate slow and unreliable networks; sharding, to horizontally partition large graphs; and partial replication, to favor access locality, replicating data close to clients. Moreover, local-first models provide high availability, combining sharding and replication for read and write access to a relevant data subset, even when disconnected. While distribution mechanisms, such as Replicated Data Types (RDTs), have become well-established for local-first key-value stores, their usage in graph databases is largely unexplored. This is a more complex setting, due to its stronger demands to compositionally maintain connectivity invariants. VERDI proposes a novel interdisciplinary methodology to reliably devise such foundational distribution devices, cross-cutting the areas of databases, distributed systems, formal methods, and programming languages. It comprises four work packages (WPs). WP1 will build a unified formal foundation for prototyping and extracting correct-by-construction graph RDTs (GRDTs). WP2 will tailor these to provably enforce complex invariants under weak consistency models. WP3 will extend GRTDs with parametricity and transactional support and WP4 will evaluate their performance on a decentralized graph-based ledger industrial use-case.
Project coordination
Stefania Dumbrava (Telecom SudParis Evry)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
Partnership
Technical University of Kaiserslautern
LMF Ecole normale supérieure Paris-Saclay
IRIF Université Paris Cité
Universidade Nova de Lisboa (NOVA-LINCS)
LIST Commissariat à l'énergie atomique et aux énergies alternatives
Institute for Systems and Computer Engineering, Technology and Science - Porto
TSP - Samovar Telecom SudParis Evry
Delft University of Technology
Help of the ANR 196,355 euros
Beginning and duration of the scientific project:
January 2025
- 48 Months