Constraint Event-Driven Automated Reasoning – CEDAR
The CEDAR project is the study of, and experimentation with, Knowledge Representation that is an alternative to what has prevailed so far.
The two main challenges to overcome for the Semantic Web to come to pass are (1) scalability and (2) distribution. The problem of scalability is that a well-designed web-oriented Knowledge Base (KB) system must be able to handle larger and larger volumes of knowledge without unbearable degradation of performance. The second challenge (distribution) is just as complex, since such a system must deal efficiently and seamlessly with knowledge spread all over the net under “real-life” conditions.
We believe that a key to a satisfactory handling of both challenges is offered by the Order-Sorted Feature constraint approach.
The objectives of the CEDAR project are: (1) to develop, implement, and test a constraint-based approach to Knowledge Representation (KR) and automated reasoning where all knowledge is expressed in a universal graph-based representation format such as RDF (e.g., Linked Data), in the same manner as all data is represented as tables in the Relational Model; (2) to enable such a constraint-based system to handle time-aware reasoning using multiple knowledge sources for event-driven computing in a distributed KB context where the environment evolves in real time (e.g., intelligent adaptive Quality-of-Service monitoring, maintaining evolving KBs, reconciling distributed KBs). Attaining these objectives will contribute original and innovative results of important potential in an essential area, namely scalable Semantic Web processing over distributed KBs, thanks to a formal basis that differs from that of most similar pursuits and that is tested on challenging benchmarks.
Our overall technical goal is to provide a tangible and testable proof of the viability of the OSF-constraint logic approach to knowledge representation. To do so, we will: (1) show that the approach can be expressed and used computationally on the emerging standard representation format for all Semantic Web knowledge bases (RDF triples), both for expressing and for maintaining structural and temporal constraints; (2) experiment, through testing and simulation, with architectural issues in managing and accessing distributed RDF-based knowledge at scale; (3) use, test, and demonstrate the new OSF-constraint engine on actual RDF-expressed knowledge, using the most efficient architecture as indicated by simulation benchmarks for scalable distributed knowledge processing.
The work proposed in this project must innovate in addressing two essential technological challenges regarding triple-based ontologies: reasoning over knowledge and managing it. (A) The key innovation of the proposed work is that its ontological reasoning technology is to be based on OSF-graph Constraint Logic rather than on Description Logic, which underlies the OWL family of KB representation and reasoning languages used by the majority of extant KB systems. The originality of OSF-graphs is that they map directly to RDF and to formal constraints that may be interpreted as both structural and temporal constraints. (B) The other essential contribution of this project is the management of very large amounts of distributed triple-based knowledge: experimenting, through testing and simulation, with the low-level organization and optimization of the RDF-represented KBs upon which an OSF-constraint processing engine could be used. The project’s measure of success will be demonstrating that the outcome of (A) is fully operational on actual benchmarks thanks to the results of (B).
What we propose can be characterized as a synthesis of prior work in Artificial Intelligence (AI), Knowledge Representation (KR), and Constraint-Logic Programming (CLP) with the latest technology for maintaining and accessing distributed knowledge in the new context of interlinked media. The adoption of RDF as a W3C Semantic Web standard for a universal triple-based idiom expressing graph-format knowledge is another serendipitous and timely justification. Indeed, the formal basis on which the essentials of OSF constraint-solving rely uses exactly the same universal labeled-graph representation. The approach we advocate is to see such graphs as very simple and easily enforceable constraints, essentially for practical reasons: it allows representing and manipulating graph-based objects (e.g., record types) as order-sorted featured objects in a way that is simple, efficient, and practical. This formalism is a formal rendition of the essential informal insights underlying the Semantic Networks of the 1980s and 1990s. Its most interesting aspect is that it offers a direct interpretation of labeled graphs representing structural knowledge as efficiently solvable constraints. Reasoning with large and complex structures is done by interpreting such graphs as conjunctive or disjunctive sets of elementary constraints. Moreover, these elementary graph constraints map naturally into a triple-based representation such as that offered by RDF (proposed by the W3C as the universal format for representing all Semantic Web knowledge on the Internet) and by its heirs, such as RDF Schema, RDFa, Linked Data, and SKOS. In simple terms, OSF technology provides a set of formal and practical tools appropriate for the Semantic Web.
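To make the correspondence between OSF graphs and triples more concrete, here is a minimal sketch in Python. It is not the CEDAR implementation: the `OsfTerm` class, the `to_triples` function, and the use of `rdf:type` as the predicate for sort constraints are all assumptions made for illustration. Each node of an OSF graph carries a sort and a set of features, and flattening it yields one sort constraint per node and one feature constraint per labeled arc.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class OsfTerm:
    """Hypothetical OSF graph node: a variable, its sort, and its features."""
    var: str
    sort: str
    features: dict = field(default_factory=dict)  # feature name -> OsfTerm

def to_triples(term, seen=None):
    """Flatten an OSF graph into (subject, predicate, object) triples.

    Sort constraints become (var, "rdf:type", sort); feature constraints
    become (var, feature, target_var). The `seen` set handles shared and
    cyclic nodes so that each node is emitted only once.
    """
    if seen is None:
        seen = set()
    if term.var in seen:
        return []
    seen.add(term.var)
    triples = [(term.var, "rdf:type", term.sort)]
    for feature, target in term.features.items():
        triples.append((term.var, feature, target.var))
        triples.extend(to_triples(target, seen))
    return triples

# X : person(name => N : string, spouse => Y : person(spouse => X))
x = OsfTerm("X", "person")
y = OsfTerm("Y", "person", {"spouse": x})
x.features = {"name": OsfTerm("N", "string"), "spouse": y}

triples = to_triples(x)
for t in triples:
    print(t)
```

The cyclic `spouse` reference illustrates why a graph rather than a tree is the natural reading: the same variable can be reached along several paths, and the triple form records it only once.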
We are encouraged by the initial results we have obtained. Our immediate prospect is to extend the expressivity of the knowledge that can be represented: the next challenge is to process complex structures, not just taxonomies. This will be done with the addition of features and aggregates so as to enable role-denoting concepts. As a later effort, we shall tackle event-based reasoning using scheduling-constraint techniques. On the experimental track, we must now proceed with more complex Hadoop/MapReduce configurations distributed over a larger number of physical nodes and able to handle incremental querying. The end objective is to consolidate all parts of the project, each part contributing to enhance the rest.
Using very large taxonomies, we compared FaCT++, HermiT, Pellet, TrOWL, RacerPro, and Snorocket with our prototype. The results show that our system is among the best for concept classification and several orders of magnitude more efficient in terms of response time for query answering. This was published at ISWC 2013. We also developed a tool for anyone to verify our claims and made it available on our project's website. We have now undertaken implementation of a full-fledged OSF reasoner over RDF triples. As for the management and querying of RDF triplestores of enormous size, using the Lehigh University Benchmark, we can now generate up to 1.6 billion RDF triples. We have experimented with two triplestore management systems: SHARD (fixing several of its shortcomings) and Jena-HBase. We are now doing the same with RDFPig, Jena TDB, and Virtuoso. We have also undertaken the implementation of our own triplestore.
The CEDAR project is the systematic formal study of, and practical
experimentation with, an approach to Knowledge Representation (KR) that
is an alternative to what has prevailed so far. The two main challenges
to overcome for the coming to pass of the Semantic Web are scalability
and distribution. The problem of scalability is that a well-designed
web-oriented KB system must be able to handle larger and larger volumes
of knowledge without unbearable degradation of performance. Dealing
with the second challenge — distribution — is as complex an issue since
it must deal efficiently and seamlessly with knowledge spread all over
the net under “real-life” conditions (cache faults, handling faulty
connections and time delays, query distribution, etc.).
This project proposes to address both challenges using the Order-Sorted
Feature (OSF) graph-constraint approach. Contrary to most mainstream
approaches to the Semantic Web using Description Logic reasoning (e.g.,
OWL), the OSF graph-constraint formalism uses a proof technique that is
operationally lazy (i.e., it does not do anything that is not needed),
endowed with instant (i.e., 0-cost) “memoing” (viz., proof caching), and
capable of handling very large concept hierarchies using modulated
binary encodings, as well as techniques taking advantage of the specific
structure of the OSF-graphs making up a KB.
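As an illustration of how binary encodings can support reasoning over concept hierarchies, the sketch below uses a plain transitive-closure bit-vector encoding: each sort gets one bit, and its code is the bitwise OR of its own bit with the codes of all its subsorts. This is a simplified, unmodulated variant written for this summary, not the project's actual encoding; the function names and the small taxonomy are hypothetical. With such codes, subsumption becomes a bit-inclusion test and greatest lower bounds are obtained from a bitwise AND.

```python
def encode(parents):
    """Assign a bit code to every sort of an acyclic taxonomy.

    `parents` maps each sort to the list of its direct supersorts.
    The code of a sort is its own bit OR-ed with the codes of its subsorts.
    """
    sorts = list(parents)
    children = {s: [] for s in sorts}
    for s, supers in parents.items():
        for p in supers:
            children[p].append(s)
    own_bit = {s: 1 << i for i, s in enumerate(sorts)}
    code = {}

    def visit(s):
        if s not in code:
            c = own_bit[s]
            for child in children[s]:
                c |= visit(child)
            code[s] = c
        return code[s]

    for s in sorts:
        visit(s)
    return code

def is_subsort(code, a, b):
    """True if sort a is subsumed by sort b (reflexively)."""
    return (code[a] & code[b]) == code[a]

def glb(code, a, b):
    """Maximal common subsorts of a and b, read off the AND of their codes."""
    meet = code[a] & code[b]
    below = [s for s, c in code.items() if (c & meet) == c]
    return sorted(s for s in below
                  if not any(s != t and is_subsort(code, s, t) for t in below))

# Hypothetical taxonomy used only for this example.
parents = {
    "thing": [],
    "person": ["thing"],
    "student": ["person"],
    "employee": ["person"],
    "intern": ["student", "employee"],
}
code = encode(parents)
print(is_subsort(code, "intern", "person"))    # True
print(is_subsort(code, "student", "employee")) # False
print(glb(code, "student", "employee"))        # ['intern']
```

The encoding is computed once per taxonomy; afterwards each subsumption check costs a single AND and a comparison, whatever the depth of the hierarchy.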
The CEDAR project may be summarized as putting all the above claims to
the test on very large RDF-based KBs.
Project coordinator
Mr. Hassan AIT-KACI (Laboratoire d'Informatique en Image et Systèmes d'Information) – hassanaitkaci@gmail.com
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partner
LIRIS Laboratoire d'Informatique en Image et Systèmes d'Information
ANR funding: 500,760 euros
Beginning and duration of the scientific project: January 2013 - 24 months