MDCA - Programme "Masse de Données - Connaissances Ambiantes"

analysis, monitoring and optimization of Web documents and services – DocFlow

Submission summary

Context and motivation

Since the 1960s, the database community has developed the science and technology needed to manage data in central repositories. From the early days, much effort has been devoted to extending these techniques to the management of distributed data, and in particular to its integration. However, the Web revolution is setting new standards, primarily because of the high heterogeneity and autonomy of data sources, the increasing complexity and richness of data, and the scale of the Web and the diversity of interaction among its users.

On the other hand, the increasingly global economy calls for tighter integration of global enterprises and OEM-supplier chains. At the same time, these enterprises and chains are becoming more and more widely distributed, and OEMs are constantly seeking the best suppliers. Such distributed workflow activities must rely on a lightweight infrastructure that is nevertheless capable of providing predictable, safe, and secure workflow execution. Recently, standard languages for service workflow have been proposed, such as IBM's Web Services Flow Language and Microsoft's XLang, which converged into the BPEL4WS proposal, followed by the WSCDL proposal for choreographies. A recent overview of existing work can be found in. The implementation of orchestration and choreography description languages raises a number of difficulties related to efficiency, clean semantics, and reproducibility of executions, which are impairing their industrial acceptance.

A serious shortcoming of approaches to Web Service orchestration and choreography is that they mostly abstract data away. Symmetrically, current approaches to Web data management, typically based on XML and XQuery, rely on overly simplistic forms of control. We believe that the time has come for a convergence of sophistication in control and richness in data, for workflow and data management over the Web.
We believe that active Peer-to-Peer XML-based documents provide the basis for an adequate infrastructure for this. The overall objective of this project is thus to propose such an infrastructure and study its mathematical foundations.

Novelty, high-level objectives and key expected results

Ensuring convergence of data and workflow management, with a focus on Web information management.
Defining an infrastructure of active Peer-to-Peer documents able to perform stateful distributed activities.
Providing Web-compliant alternatives to existing distributed database technology that use no locking mechanism.
Developing a technology for Web services orchestrations and choreographies, based on the central notion of document.
Developing models and approaches to handle performance, monitoring, and other Quality of Service aspects for our infrastructure of active Peer-to-Peer documents.
Developing novel techniques to strengthen security aspects of Web Services technology that are widely recognized as weak.
Establishing all of the above on a formally sound basis.

Related work

The DocFlow project relates to several research areas and draws on background from various communities. We briefly review these.

Distributed systems, P2P and distributed query optimization

In the context of distributed data management, distributed query processing has been studied since the early days of databases, in particular in the context of mediator systems and P2P environments.

Peer-to-peer

This term refers to a class of systems and applications that perform a function using distributed resources, with no centralized control and a dynamically evolving set of peers. Together, peers may provide computing power, as in SETI@home, or storage space, as in Napster or KaZaA. Distributed hash tables are an example of a popular P2P technique. Peer computing is gaining momentum as a large-scale resource sharing paradigm by promoting direct exchange between equal peers.
In this project, we propose a system where interactions between peers are at the core of the data model, through the use of service calls.

XML documents with embedded Web services calls

Service calls in semi-structured data have been considered in the context of Lore and Lorel. Other systems have recently proposed languages based on XML or other documents with embedded calls to Web services. AXML is more powerful, as it provides means of controlling and enriching the use of Web service calls for data and workflow management purposes in a distributed setting. AXML is also a continuation of the work on ActiveViews; the main difference is that AXML promotes peer-to-peer relationships rather than interactions via a central repository. The activation of service calls is also closely related to the use of triggers in relational databases, or rules in active databases. Active rules were recently adapted to the XML/XQuery context, and a recent work considered firing Web service calls. AXML goes beyond these by promoting the exchange of AXML data.

Data integration systems

These typically consist of data sources, which provide information, and of mediators or warehouses, which integrate it with respect to an integration schema. AXML takes a hybrid path between mediator systems (where the integration is virtual) and warehouses (where all data is materialized). Mappings between data sources are captured in AXML by service calls embedded in the data.

Service composition and workflow

The integration and composition of Web services has recently been an active field of research. Standard languages for service workflow have been proposed, such as BPEL and the WSCDL proposal for choreographies. In a recent overview of existing work on service composition, services are modeled as communicating Mealy machines together with input/output signatures on messages (given by XML Schema types).
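
To make the idea of a document with embedded service calls more concrete, here is a minimal Python sketch of the "hybrid" behaviour mentioned under data integration: a call left in the document keeps the data virtual, and replacing the call by its result materializes it. Everything named here is an assumption made for illustration: the `<sc>` element, its `service` attribute, the `SERVICES` registry and the `materialize` function are hypothetical and do not reproduce actual AXML syntax or any AXML API, and a local function stands in for a remote Web service.

```python
import xml.etree.ElementTree as ET

# Hypothetical active document: <sc service="..."> marks an embedded call.
# These names are illustrative only, not real AXML syntax.
DOC = """
<newspaper>
  <title>The Sun</title>
  <weather><sc service="forecast">Paris</sc></weather>
</newspaper>
"""

# Stand-in for remote Web services: local functions returning an XML string.
SERVICES = {
    "forecast": lambda city: f"<temp city='{city}'>15</temp>",
}


def materialize(root):
    """Replace each <sc> call found under root by the XML its service returns."""
    # Take a snapshot of the elements first, so the tree is not mutated
    # while it is being iterated. Calls inside returned results are not
    # expanded in this sketch.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "sc":
                continue
            service = SERVICES[child.get("service")]
            result = ET.fromstring(service(child.text.strip()))
            result.tail = child.tail  # preserve surrounding whitespace
            parent.remove(child)
            parent.insert(index, result)
    return root


root = materialize(ET.fromstring(DOC))
print(ET.tostring(root, encoding="unicode"))
```

In this sketch the document before the call to `materialize` plays the role of a virtual (mediator-style) view, and the document afterwards that of a materialized (warehouse-style) copy; a peer could choose per call which of the two to exchange.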
Mobile code

Mobile code refers to programs that use mobility as a mechanism to adapt to resource changes; cf. the Join-Calculus and the Sumatra language. In our case, peer-to-peer architectures and asynchronous communication are used, and active documents are exchanged, but our active documents are more restricted than general code.

Distributed monitoring of networked systems

Attention has been paid to dealing with large distributed systems that cannot be monitored as a whole because of their size. Some work deviates from the above by explicitly handling the concurrency available in large distributed systems; unfolding and similar techniques are used in combination with modular algorithms, resulting in a supervision architecture that is itself distributed.

Project coordination

Anca MUSCHOLL (Université)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE - (INRIA Saclay)

ANR funding: 489,122 euros
Beginning and duration of the scientific project: - 36 Months
