JCJC SIMI 2 - JCJC - SIMI 2 - Computer science and applications (Science informatique et applications) 2012

Practical algorithms for ontology-based data access – PAGODA

PAGODA

Ontology-based data access is a new paradigm in data management that aims to exploit the semantic knowledge described by an ontology in order to improve query answers. The objective of this project is to propose new, scalable algorithms for ontology-based query answering, as well as new pragmatic methods for handling inconsistent data in a principled way.

Challenges and objectives

Ontology-based data access has many potential applications, but important foundational challenges must be overcome before these techniques can be widely adopted. This project targets the following two scientific challenges:

1. Scalability of query answering algorithms. The efficiency of relational database systems rests on decades of work on algorithms and optimizations for query answering. By contrast, ontology-based data access is a very young topic, and despite important advances, notably the identification of interesting ontology languages of low complexity, much work remains to be done before scalable algorithms are available.

2. Principled handling of inconsistent data. In applications that process large volumes of data, or whose data come from several sources, it is highly likely that the data set is inconsistent with the ontology, which renders standard query answering algorithms useless (since everything follows from a contradiction). Mechanisms for handling inconsistent data in a principled way are therefore indispensable, either by repairing the data to restore consistency or by adopting an alternative semantics that is robust to inconsistencies.

The main objective of this project is to address these major challenges by developing new, scalable algorithms for ontology-based query answering, as well as new pragmatic methods for handling inconsistent data in a principled way.

General approach

To tackle the first scientific challenge, scalability, we will examine how to combine different types of query answering algorithms in order to obtain new algorithms with better properties. In particular, we will consider combining approaches based on backward chaining (query rewriting) with those based on forward chaining (saturation). An in-depth complexity study will make it possible to understand the cost of a particular combination and to choose the most suitable algorithm for a given application. We will also study different types of optimizations for ontology-based query answering algorithms.
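As a minimal sketch of the two families of algorithms named above, the following Python example answers an atomic query over a toy ontology made only of concept inclusions, once by saturation (forward chaining) and once by query rewriting (backward chaining). The ontology, the data, and all function names are assumptions made for illustration; none of them comes from the project itself.

```python
# Hypothetical toy ontology: each pair (sub, sup) reads "every sub is a sup".
INCLUSIONS = [("Femur", "Bone"), ("Bone", "AnatomicalStructure")]

# Hypothetical data: facts of the form (concept, individual).
DATA = {("Femur", "f1"), ("Bone", "b1"), ("Organ", "o1")}


def saturate(data, inclusions):
    """Forward chaining: add entailed facts until nothing new can be derived."""
    facts = set(data)
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            for concept, individual in list(facts):
                if concept == sub and (sup, individual) not in facts:
                    facts.add((sup, individual))
                    changed = True
    return facts


def rewrite(concept, inclusions):
    """Backward chaining: collect every concept that the ontology
    places below `concept` (including `concept` itself)."""
    rewriting = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in inclusions:
            if sup == current and sub not in rewriting:
                rewriting.add(sub)
                frontier.append(sub)
    return rewriting


def answer_by_saturation(concept, data, inclusions):
    """Evaluate the original query over the saturated data."""
    return {ind for c, ind in saturate(data, inclusions) if c == concept}


def answer_by_rewriting(concept, data, inclusions):
    """Evaluate the rewritten query (a union of concepts) over the raw data."""
    concepts = rewrite(concept, inclusions)
    return {ind for c, ind in data if c in concepts}


if __name__ == "__main__":
    print(answer_by_saturation("AnatomicalStructure", DATA, INCLUSIONS))
    print(answer_by_rewriting("AnatomicalStructure", DATA, INCLUSIONS))
```

Both functions return the same answers on this example. The difference lies in where the work happens: saturation enlarges the data before any query is asked, whereas rewriting leaves the data untouched and enlarges the query instead. A combined approach would split the ontology reasoning between the two.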

To handle inconsistent data, two complementary approaches will be explored. The first consists of repairing the data in order to restore consistency, while the second adopts an alternative semantics for query answering that is robust to inconsistencies. For the first approach, the challenge is to provide tools that help the user choose the right repair. For the second approach, the difficulty is to find query answering algorithms under the alternative semantics that perform well enough, either by identifying tractable cases or by designing generic algorithms that perform well on typical cases.
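To make the second approach concrete, here is a small Python sketch of one well-known inconsistency-tolerant semantics, often called AR semantics: a repair is a maximal subset of the data that is consistent with the ontology, and an answer is kept only if it holds in every repair. The sketch also shows the IAR variant, which queries the intersection of all repairs and is a sound approximation of AR. The ontology, the data, and the function names are hypothetical, and the brute-force enumeration of subsets is exponential, so this illustrates the definitions rather than a practical algorithm.

```python
from itertools import combinations

# Hypothetical ontology: concept inclusions plus one disjointness constraint.
INCLUSIONS = [("Femur", "Bone"), ("Tibia", "Bone")]
DISJOINT = [("Femur", "Tibia")]

# Hypothetical data: x1 is asserted to be both a femur and a tibia.
DATA = frozenset({("Femur", "x1"), ("Tibia", "x1"), ("Femur", "x2")})


def saturate(data):
    """Add all facts entailed by the concept inclusions."""
    facts = set(data)
    changed = True
    while changed:
        changed = False
        for sub, sup in INCLUSIONS:
            for concept, individual in list(facts):
                if concept == sub and (sup, individual) not in facts:
                    facts.add((sup, individual))
                    changed = True
    return facts


def is_consistent(data):
    """True if no individual belongs to two disjoint concepts."""
    facts = saturate(data)
    return not any(
        (a, ind) in facts and (b, ind) in facts
        for a, b in DISJOINT
        for _, ind in facts
    )


def repairs(data):
    """Brute force: maximal (for set inclusion) consistent subsets of the data."""
    found = []
    for size in range(len(data), -1, -1):  # largest subsets first
        for subset in combinations(sorted(data), size):
            candidate = frozenset(subset)
            if is_consistent(candidate) and not any(candidate < r for r in found):
                found.append(candidate)
    return found


def answers(concept, data):
    """Standard answers to an atomic query over consistent data."""
    return {ind for c, ind in saturate(data) if c == concept}


def ar_answers(concept, data):
    """AR semantics: answers that hold in every repair."""
    return set.intersection(*(answers(concept, r) for r in repairs(data)))


def iar_answers(concept, data):
    """IAR semantics: answers over the intersection of all repairs."""
    return answers(concept, frozenset.intersection(*repairs(data)))


if __name__ == "__main__":
    print(repairs(DATA))
    print(ar_answers("Bone", DATA), iar_answers("Bone", DATA))
```

On this data there are two repairs, one keeping `Femur(x1)` and one keeping `Tibia(x1)`. Under AR semantics `x1` is still returned for the query `Bone`, because it is a bone in both repairs, while the more cautious IAR semantics returns only `x2`.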

Results

The expected results of our fundamental research will essentially be of two types:

-- new algorithms and optimizations for ontology-based query answering, as well as for the principled handling of inconsistent data;

-- fine-grained complexity results that help to better understand where the difficulties come from and to select an algorithm suited to a given application.

Our results will be disseminated mainly through publications in the top conferences of the field. We will target the major artificial intelligence conferences (IJCAI, AAAI, ECAI), the specialized conference KR, and possibly prestigious conferences in related areas (e.g. PODS and ICDT for databases, or ISWC for the Semantic Web).

The more applied side of the project will produce:

-- an implementation and experimental evaluation of a set of tools for the query answering algorithms developed in the project;

-- a case study examining the contribution of ontology-based data access techniques in a medical application concerning anatomy.

Prospects

Ontology-based data access is widely recognized as an important topic, and there will undoubtedly be many ways to continue PAGODA beyond the four years of the project. At this stage, however, it is impossible to predict which continuations will be the most promising, since this will depend on the results of the project as well as on how the topic develops worldwide.

Scientific productions and patents

Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter:
Ontology-based Data Access: A Study through Disjunctive Datalog, CSP, and MMSNP.
Proceedings of the 32nd International Conference on the Principles of Database Systems (PODS'13).

Meghyn Bienvenu, Carsten Lutz, and Frank Wolter:
First Order-Rewritability of Atomic Queries in Horn Description Logics
To appear in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13).

Meghyn Bienvenu, Magdalena Ortiz, and Mantas Simkus:
Conjunctive Regular Path Queries in Lightweight Description Logics
To appear in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13).

Meghyn Bienvenu, Magdalena Ortiz, Mantas Simkus, and Guohui Xiao:
Tractable Queries for Lightweight Description Logics
To appear in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13).

Meghyn Bienvenu and Riccardo Rosati:
Tractable Approximations of Consistent Query Answering for Robust Ontology-based Data Access
To appear in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13).

Michaël Thomazo:
Compact Rewritings for Existential Rules
To appear in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13).

Mélanie König, Michel Leclère, Marie-Laure Mugnier, Michaël Thomazo:
On the Exploration of the Query Rewriting Space with Existential Rules
To appear in Proceedings of the 7th International Conference on Web Reasoning and Rule Systems (RR'13).

Submission summary

Ontology-based data access (OBDA) is a new paradigm in data management that seeks to exploit the semantic knowledge expressed in ontologies when querying data. Ontologies can improve query answering by enriching the vocabulary of data sources, relating the vocabularies of different data sources during data integration, and palliating data incompleteness by allowing inference of new facts. OBDA has the potential to revolutionize health data management by allowing sophisticated semantic querying of patient data; it is also poised to have a major impact in the life sciences by facilitating the exchange of experimental data among researchers. More generally, the semantically-enriched querying capabilities and seamless data integration which are the hallmarks of the OBDA approach are relevant to practically every application area which currently relies on relational databases, notably, enterprise information systems. However, before OBDA can be widely adopted in applications, some important foundational challenges need to be addressed.

This project is centered on the following two challenges.

- Scalability: Modern-day relational database management systems benefit from decades of research on querying algorithms and optimizations. By contrast, ontology-based data access is a young area of study, and despite important recent advances, including the identification of interesting tractable ontology languages, much work remains to be done in designing scalable OBDA query answering algorithms.

- Handling data inconsistencies: In real-world applications involving large amounts of data or multiple data sources, it is very likely that the data will be inconsistent with the ontology, rendering standard querying algorithms useless (as everything is entailed by a contradiction). Appropriate mechanisms for dealing with inconsistent data are thus crucial to the successful use of OBDA in practice, yet have been little explored thus far.

The primary aim of this project is to help address these challenges by developing novel OBDA query answering algorithms and practical methods for handling inconsistent data.

More precisely, we will explore how different approaches to OBDA query answering can be combined to obtain new algorithms with better properties, and how optimizations can make OBDA querying algorithms more efficient. For the second challenge, two complementary approaches will be considered: repairing the data to restore consistency, and query answering under inconsistency-tolerant semantics. The project will also include a practical component consisting of a case study of OBDA in an anatomy application, as well as a prototype implementation and testing of the new OBDA querying algorithms.

To achieve these ambitious goals, the coordinator has assembled a team that includes the five French researchers with the most experience in OBDA. Three of these researchers have a background in description logics, which form the basis of most ontology languages, including the W3C standards OWL and RDFS. The other two researchers contribute their expertise on rule-based OBDA formalisms. All have excellent academic records, with regular publications in prestigious venues. The project team also includes a PhD student investigating efficient querying algorithms for rule-based OBDA, and a professor of anatomy who is developing ontology-based software for accessing and visualizing patient data, which is the basis of our case study. Altogether, the team composition ensures an excellent balance between young researchers and permanent personnel.

It is worth noting that while OBDA is currently a very active research topic at the international level, it has not yet attracted much attention in France. By bringing together the top French researchers on the topic, some of whom have had little interaction thus far, this project contributes to the formation of a French OBDA community.

Bienvenu Meghyn (Université Paris Sud / Laboratoire de Recherche en Informatique)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

PSUD/LRI Université Paris Sud / Laboratoire de Recherche en Informatique

ANR funding: 260,312 euros
Beginning and duration of the scientific project: December 2012 - 48 Months
