CE23 - Artificial Intelligence

Decentralized Knowledge Graphs – DeKaloG

DeKaloG: Decentralized Knowledge Graph

Knowledge Graphs (KGs) pervade our everyday life, telling us what to buy, what to learn, etc. Major companies maintain Knowledge Graphs to power voice assistants and search engines. However, access to these KGs is restricted, and the way they are built and maintained is not transparent. This restricted access and lack of transparency prevent others from building new KGs or building on top of existing ones. DeKaloG encourages the growth of a public and decentralized web of Knowledge Graphs.

Objective and issues

Following the linked data principles, the Linked Open Data (LOD) initiative promotes a vision of a global decentralized Knowledge Graph. However, LOD KGs face serious technical and non-technical issues. The size of KGs has increased dramatically, raising scalability issues, and the metadata currently available in the LOD cloud says little about how to access a KG, raising issues of findability and metadata formats. These issues seriously hamper the growth of the global KG and limit its usage in real applications.

DeKaloG follows the vision of a global decentralized knowledge graph that can be leveraged to answer questions at the scale of the web. For example, “What is the number of famous scientists, men and women, per birth year?”. To address the LOD issues, DeKaloG promotes the ATF (Accessibility, Transparency, and Findability) principles and a sustainable approach for implementing them:

(A)ccessibility is the right to execute any query at any time on a KG and get complete answers. Currently, accessibility is challenged by critical availability and scalability problems. To ensure availability and fair access to their KGs, existing KG providers restrict access by reducing the expressivity of queries or by enforcing fair access policies based on time quotas. Consequently, many queries cannot produce complete answers. DeKaloG aims to propose a model that provides fair access policies to KGs without quotas while ensuring complete answers to any query. Such a property is crucial for enabling web automation, i.e. allowing agents or bots to interact with KGs. Preliminary results on web preemption open up such a perspective, but scalability issues remain.

(T)ransparency ensures the right to know who built the KG, how it was built, and from which sources. Transparency requires data provenance information and, more generally, contextual information, neither of which is widely adopted in the LOD. Transparency can be defined at many levels, from the whole KG down to individual facts within a KG. The trade-off between transparency granularity and query performance is still an open issue. DeKaloG aims to propose models for capturing different levels of transparency, a method to query them efficiently, and, especially, techniques to enable web automation of transparency.

(F)indability is the right to efficiently find the pertinent KGs for a query. DeKaloG aims to propose a sustainable index for achieving the findability principle. The index itself is envisioned as an accessible and transparent KG, indexing accessible and transparent KGs. For the KG index, accessibility means that any query can be executed on the index and get complete results. Transparency means that a KG provider can know what the index knows about their KG, including ranking statistics and how they are computed, and, ideally, the reproducibility of the index. The originality is to interact with the index as a KG and also to build and maintain the index just by querying KGs.

The scientific methodology of the DeKaloG project consists of four tasks (in addition to a fifth task, Task 0, dedicated to the management of the project and the dissemination of its results).

Task 0: Project Management: This task intends to ensure the smooth running of the project. Intensive collaboration between the three partners is crucial to the success of the project, involving frequent meetings and inter-partner visits.

Task 1: Accessibility-oriented Knowledge Graphs: This task intends to build an accessible distributed SPARQL datastore for DeKaloG. Thanks to web preemption, we demonstrated that it is possible to provide fair access policies to KGs and get complete results for queries. The scientific challenge is now to demonstrate how web preemption can scale to a large volume of data and a large number of concurrent queries, including update queries.
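As we understand it from the SaGe publications listed in this summary, web preemption lets a server suspend a running query after a fixed quantum, return the partial results together with the saved state of the query, and let the client resubmit that state until the query completes. The sketch below illustrates the idea under loud assumptions: the quantum is counted in scanned triples rather than in time (to keep the example deterministic), the "query" is a single predicate filter, and the names `Page`, `run_quantum` and `evaluate` are hypothetical and are not the SaGe API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy knowledge graph; all IRIs are made up for illustration.
KG: List[Triple] = [
    ("ex:curie", "ex:occupation", "ex:scientist"),
    ("ex:curie", "ex:birthYear", "1867"),
    ("ex:turing", "ex:occupation", "ex:scientist"),
    ("ex:turing", "ex:birthYear", "1912"),
    ("ex:lovelace", "ex:occupation", "ex:scientist"),
]


@dataclass
class Page:
    results: List[Triple]        # matches found during this quantum
    next_offset: Optional[int]   # saved state; None once the scan is complete


def run_quantum(predicate: str, offset: int, quantum: int) -> Page:
    """Server side: scan at most `quantum` triples from `offset`, then suspend."""
    end = min(offset + quantum, len(KG))
    matches = [t for t in KG[offset:end] if t[1] == predicate]
    return Page(matches, end if end < len(KG) else None)


def evaluate(predicate: str, quantum: int = 2) -> List[Triple]:
    """Client side: resubmit the saved state until the query completes."""
    results: List[Triple] = []
    offset: Optional[int] = 0
    while offset is not None:
        page = run_quantum(predicate, offset, quantum)
        results.extend(page.results)
        offset = page.next_offset
    return results
```

Because every request is bounded by the quantum, no single query can monopolize the server, and because the client keeps resuming from the saved state, the final result is complete whatever the quantum size.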

Task 2: Findability of knowledge graphs: This task intends to build a semantic index of knowledge graphs. In this task, we assume that indexed KGs are accessible thanks to Task 1 and transparent thanks to Task 3. The objective is to discover KGs and index them in an accessible and transparent KG. The originality is to interact with the index as a KG and also to build and maintain the index just by querying KGs.
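The idea of an index that is itself a KG and is built only by querying the indexed KGs can be sketched as follows. This is a minimal illustration, not the project's index: the two in-memory KGs stand in for remote SPARQL endpoints, `query_predicates` stands in for a real SPARQL query, `void:triples` is borrowed from the VoID vocabulary, and `ex:usesProperty` as well as all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Two toy KGs standing in for remote SPARQL endpoints (made-up IRIs).
KGS: Dict[str, List[Triple]] = {
    "ex:kgA": [
        ("ex:curie", "ex:birthYear", "1867"),
        ("ex:curie", "ex:occupation", "ex:scientist"),
    ],
    "ex:kgB": [
        ("ex:monsanto", "ex:industry", "ex:agrochemistry"),
    ],
}


def query_predicates(kg: List[Triple]) -> Set[str]:
    """Stand-in for a query such as SELECT DISTINCT ?p WHERE { ?s ?p ?o }."""
    return {p for _, p, _ in kg}


def build_index(kgs: Dict[str, List[Triple]]) -> List[Triple]:
    """Describe every KG with triples obtained only by querying it."""
    index: List[Triple] = []
    for name, kg in sorted(kgs.items()):
        index.append((name, "void:triples", str(len(kg))))
        for p in sorted(query_predicates(kg)):
            # ex:usesProperty is a hypothetical predicate for this sketch.
            index.append((name, "ex:usesProperty", p))
    return index


def find_kgs(index: List[Triple], predicate: str) -> List[str]:
    """Findability lookup: which KGs contain facts using `predicate`?"""
    return [s for s, p, o in index if p == "ex:usesProperty" and o == predicate]
```

Since the index is just another list of triples, it can be queried with the same machinery as the KGs it describes, and rebuilding it from the same queries gives a provider a way to check what the index states about its KG.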

Task 3: Transparency-oriented knowledge graphs: The demand for transparency increases in many domains. However, there exists no structured and homogeneous representation of transparency metadata/information. This prevents its use in real applications through semantic technologies. Our goal is to propose an intuitive, extensible, canonical representation, aiming at a standard. We intend to design algorithms and provide tools enabling i) the inclusion of queryable transparency-related metadata/information in a KG, and ii) the estimation and verification of the transparency degree of a KG. An important challenge is to limit the overhead for KG providers and KG users.
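To make the notion of a "transparency degree" more concrete, here is a minimal sketch under stated assumptions. It maps the three questions raised in this summary (who built the KG, how, and from which sources) to three PROV-O properties and scores a KG or an individual fact by the fraction of those properties that are stated. This scoring rule, the function name and the sample data are hypothetical; they are not the definition of transparency proposed by the project.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# PROV-O properties matched to the three questions in the text:
# who built it, how it was built, and from which sources.
REQUIRED = ("prov:wasAttributedTo", "prov:wasGeneratedBy", "prov:wasDerivedFrom")

# Provenance attached to a whole KG or to an individual fact (made-up IRIs).
METADATA: List[Triple] = [
    ("ex:kg", "prov:wasAttributedTo", "ex:labA"),
    ("ex:kg", "prov:wasGeneratedBy", "ex:extractionRun1"),
    ("ex:kg", "prov:wasDerivedFrom", "ex:sourceDump"),
    ("ex:fact1", "prov:wasAttributedTo", "ex:labA"),
    ("ex:fact1", "prov:wasDerivedFrom", "ex:sourceDump"),
]


def transparency_degree(subject: str, metadata: List[Triple]) -> float:
    """Fraction of the required provenance properties stated for `subject`."""
    stated = {p for s, p, _ in metadata if s == subject}
    return sum(p in stated for p in REQUIRED) / len(REQUIRED)
```

The same function applies at both granularities mentioned in the summary: called with a KG identifier it scores the whole graph, called with a fact identifier it scores a single statement.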

Task 4: DeKaloG use cases and applications: The use cases highlight how the ATF principles enable the web automation of Knowledge Graphs. Web automation is the key to building a sustainable ecosystem of KGs. Use cases also make it possible to specify which queries the index has to answer and, consequently, to specify an optimized ontology for the index.

The different tasks are strongly connected: (i) Transparency requires Accessibility, (ii) Indexing and Ranking require Transparency, and (iii) Accessibility and the applications and use cases mainly rely on Indexing and Ranking. However, each task can start immediately thanks to preliminary results and already available data.

In DeKaloG, we follow the best practices of the semantic web domain, using and extending standard models (RDF, VoID, PROV-O, etc.). All our results (papers, software, and data) will be accessible to everyone, following the FAIR principles.

- Extension of the expressiveness of the SaGe preemptive server to handle aggregation queries. (Publications: 1 international conference paper (ESWC 2020, CORE A)).

- Extension of the expressiveness of the SaGe preemptive server to handle navigation queries (property paths). (Publications: 1 international conference paper (ESWC 2021, CORE A) and a demonstration at ESWC 2021).

- Improving the performance of the SaGe preemptive server by handling the count-distinct problem. (Publications: 1 international journal paper (Semantic Web Journal 2022)).

- Improving the performance of a SPARQL server by cooperative execution of SPARQL queries. (Publications: 1 international conference paper (DEXA 2020, CORE B)).

- State of the art on consistency and implementation of multi-version concurrency for SPARQL UPDATE queries in SaGe (Publications: Technical Report 2021).
- A technique for semantic index construction based on the preemptive server SaGe. (Publications: Technical report 2021).

- Definition of an ontology to complete the existing vocabularies for the description of knowledge graphs in the semantic index.

- Definition of a framework for building a semantic index, based on the declaration of construction rules. This approach is based on formal definitions of tests and criteria in SPARQL and in a rule language extending this formalism.

- Implementation of this framework based on the CORESE semantic factory.

- Definition of rules for extracting the description of a knowledge graph according to three axes: extraction of existing data, verification and addition to this data, evaluation of the quality of the graph.

- Experimental evaluation of the framework in a real setting on 200 knowledge bases.
- Writing and submission of an article detailing this work and its results to a special call of the Journal of Web Semantics.

- Bibliographic study of transparency and the related notions.

- Proposal of several general and formal definitions of transparency.

- Proposal of a definition of transparency specifically adapted to KGs implemented as RDF graphs (Research report, 2022).

The expected impact of DeKaloG is to build a sustainable web of Knowledge Graphs, i.e. an open global KG able to grow and to improve its quality. With our ATF (Accessibility, Transparency, and Findability) principles, we aim to enable web automation for the maintenance, building, and refinement of KGs.

Economic impact. DeKaloG makes it possible to build applications on top of a continuously improving global KG. Major companies use closed KGs to enrich search engine results and to power chatbots or voice assistants. DeKaloG makes a global KG available to all companies. It also allows companies to specialize in KG maintenance, writing KG bots to curate the global KG. In the short term, DeKaloG aims to demonstrate the technical feasibility of such a vision. In the medium term, we aim to demonstrate that such a vision is tractable. A long-term objective is to demonstrate the sustainability of the vision.

Societal impact. Ensuring open access to knowledge is a common goal for every society. Just as Wikipedia provided open access to a global encyclopedia, DeKaloG promotes open access to an open global knowledge graph where citizens can ask questions, verify the quality of answers, and contribute to its improvement.

Dissemination. DeKaloG proposes an original vision of the future of Linked Data. We expect to demonstrate how web preemption, which unlocks accessibility, can lead to web automation, which unlocks the sustainability of linked data. These are “hot” topics in the Semantic Web community. We therefore plan to publish our results in top-level Semantic Web conferences and journals such as WWW, ISWC or ESWC. Since DeKaloG is likely to be an attractive vision of the future of Linked Data, we expect to federate other research groups around this vision at the national and international levels. We plan to advertise (at various levels) the scientific results and tools that will be developed during the DeKaloG project.
First, a public website will be created. In addition to the usual description of the project and related events, it will also be used to publish, in a transparent way, the tools, experiments, evaluations and companion pages associated with our publications.
Of course, we will continue to publish our results in the leading scientific journals and conferences.
In the last year of the project, we propose to organize a workshop on DeKaloG-related topics, in conjunction with a top Semantic Web conference. This will be an opportunity to promote our scientific results and draw the attention of the international community.

- Julien Aimonier-Davat, Hala Skaf-Molli, Pascal Molli, Arnaud Grall, Thomas Minier. Online approximative SPARQL query processing for COUNT-DISTINCT queries with Web Preemption. Semantic Web Journal, 2022. ⟨hal-03563595⟩

- Julien Aimonier-Davat, Hala Skaf-Molli, Pascal Molli. SaGe-Path: Pay-as-you-go SPARQL Property Path Queries Processing using Web Preemption. Demo at Extended Semantic Web Conference (ESWC 2021), Jun 2021 (nominated best demo). ⟨10.1007/978-3-030-77385-4_4⟩. ⟨hal-03277622⟩

- Julien Aimonier-Davat, Hala Skaf-Molli, Pascal Molli. Processing SPARQL Property Path Queries Online with Web Preemption. Extended Semantic Web Conference (ESWC 2021), Jun 2021. ⟨hal-03277623⟩

- Arnaud Grall, Thomas Minier, Hala Skaf-Molli, Pascal Molli. Processing SPARQL Aggregate Queries with Web Preemption. 17th Extended Semantic Web Conference (ESWC 2020), Jun 2020, Heraklion, Greece. ⟨hal-02511819⟩

- Arnaud Grall, Hala Skaf-Molli, Pascal Molli, Matthieu Perrin. Collaborative SPARQL Query Processing for Decentralized Semantic Data. 31st International Conference on Database and Expert Systems Applications (DEXA 2020). ⟨10.1007/978-3-030-59003-1_21⟩. ⟨hal-03154375⟩

- Arnaud Grall, Thomas Minier, Hala Skaf-Molli, Pascal Molli. Traitement des requêtes d’agrégation sur un serveur SPARQL préemptif. 31es Journées francophones d'Ingénierie des Connaissances, Jun 2020, France. ⟨hal-02888207⟩

- Hala Skaf-Molli. Querying Decentralized Knowledge Graphs. 10th International Conference on Data Science Technology and Applications (DATA), Invited Talk, 2021. ⟨hal-03581892⟩

- Julien Aimonier-Davat, Pascal Molli, Hala Skaf-Molli, Thomas Minier. SaGe: A Preemptive SPARQL Server for Online Knowledge Graphs. Technical Report LS2N, Université de Nantes, 2021. ⟨hal-03481686⟩

- Pierre Maillot, Olivier Corby, Catherine Faron, Fabien Gandon, Franck Michel. IndeGx: A Model and a Framework for Indexing Linked Datasets and their Knowledge Graphs with SPARQL-based Test Suits. Journal of Web Semantics, 2022 (submitted).

- Jennie Andersen, Sylvie Cazalens, Philippe Lamarre. Research Report “Requirements and first models”, 2022.

DeKaloG follows the vision of a global decentralized Knowledge Graph (KG) that can be leveraged to answer questions at the scale of the web. For example, “Give me information about companies related to Monsanto” or “What is the number of famous scientists, men and women, per birth year?”. To achieve this vision, DeKaloG promotes three principles and a sustainable approach for implementing them.

Accessibility is the right to execute any query at any time on a KG and get complete answers. Currently, accessibility is challenged by critical availability and scalability problems. To ensure availability and fair access to their KGs, existing KG providers restrict access by reducing the expressivity of queries or by enforcing fair access policies based on time quotas. Consequently, many queries cannot produce complete answers. For instance, asking for all entities in Wikidata will return only a partial set of entities. DeKaloG aims to propose a model that provides fair access policies to KGs without quotas while ensuring complete answers to any query. Such a property is crucial for enabling web automation, i.e. allowing agents or bots to interact with KGs. Preliminary results on web preemption open up such a perspective, but scalability issues remain.

Transparency ensures the right to know who built the KG, how it was built and from which sources. Transparency requires data provenance information and more generally contextual information. Transparency can be defined at many levels from the whole KG down to individual facts within a KG. The trade-off between transparency granularity and query performance is still an open issue. DeKaloG aims to propose models for capturing different levels of transparency, a method to query them efficiently, and especially, techniques to enable web automation of transparency.

Findability is the right to efficiently find the pertinent KGs for a query, i.e. to know which KGs contain facts relevant to the query. DeKaloG aims to propose a sustainable index for achieving the findability principle. The index itself is envisioned as an accessible and transparent KG, indexing accessible and transparent KGs. For the KG index, accessibility means that any query can be executed on the index and get complete results. Transparency means that a KG provider can know what the index knows about their KG, including ranking statistics and how they are computed, and, ideally, the reproducibility of the index. The originality is to interact with the index as a KG and also to build and maintain the index just by querying KGs.

The main idea behind these principles is to consider KGs as first-class citizens and to build a sustainable ecosystem of KGs. The ATF principles target the automation of the web of Knowledge Graphs. Just as Wikipedia’s bots contribute to the quality of Wikipedia, the principles of DeKaloG make it possible to write KG bots that maintain and improve the web of KGs. Enabling an open and sustainable global knowledge graph is essential to ensure access to open knowledge for citizens, organizations, and companies.

Project coordination

Hala Skaf-Molli (Laboratoire des Sciences du Numérique de Nantes)

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.

Partners

Inria Centre de Recherche Inria Sophia Antipolis - Méditerranée
LIRIS UMR 5205 - LABORATOIRE D'INFORMATIQUE EN IMAGE ET SYSTEMES D'INFORMATION
LS2N Laboratoire des Sciences du Numérique de Nantes

ANR funding: 652,615 euros
Beginning and duration of the scientific project: February 2020 - 42 Months
