Knowledge Compilation for Data Analysis – KCODA
KCODA - Knowledge COmpilation for Data Analysis
KCODA is an ANR JCJC project which aims to design new algorithms for answering complex aggregation queries on databases using techniques and data structures from knowledge compilation.
Goal
The main goal of KCODA is to understand what optimization and learning problems can be solved efficiently when data is structured, which is the case when the data to be analysed is the answer set of a database query. Many algorithms exploits the structure of the query to solve aggregation problems more efficiently than the naive way of enumerating every answer before aggregating. In KCODA, we aim at explaining these aggregation algorithms as elementary operations on specialized data structure able to represent query answers in a factorized way. We also aim at using this unified framework to discover new tractable aggregation problems, such as solving linear programs whose variables are the answers of a database query.
Our method is based on the fact that one can often represent every answer of a database query in a factorized, yet tractable way. To do so, one can use data structure inspired by the ones used in knowledge compilation. These data structures have originally been used to represent Boolean functions in a way that is more succinct than its truth table. Yet, they still support efficient operations such as model counting. These data structures can be generalized to represent database tables on non-Boolean domains, and appear in the literature under the name of Factorized Databases, introduced by Dan Olteanu. These data structures have how never been systematically studied. We aim at conducting such a study in order to unify and generalize many existing results in data base theory that are spilled over several papers and to propose in-database implementations.
The goal of KCODA is to study how succinct representation of data sets can be used to solve various optimization problems and AI tasks that are relying on big data sets. We propose to use data structures studied in the field of Knowledge Compilation that can succinctly represent data sets by factorizing common part while still allowing efficient analyses of the data they represent. The first goal of KCODA is to study how one can solve optimization and learning problems efficiently on these representations. The second goal of KCODA is to provide better integration of these techniques in Database Management Systems by improving the algorithmic techniques to build such factorized representations to represent the answer set of a database query and by providing in-database encoding of these representations.
Project coordination
Florent Capelli (Centre de Recherche en Informatique, Signal et Automatique de Lille)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
Partnership
CRIStAL Centre de Recherche en Informatique, Signal et Automatique de Lille
Help of the ANR 152,496 euros
Beginning and duration of the scientific project:
March 2021
- 48 Months