Contextual Analysis and Adaptive Search – CAAS
Information Retrieval Systems (IRS) aim to retrieve information that meets a user’s need as expressed in a query. Retrieving information relevant to a query is a two-step process: offline, the system indexes the documents; then, at query time, it computes the similarity between the user’s query and the document representations (indexing terms) to retrieve the most similar documents. Current IRS, e.g. web search engines, are general-purpose search tools that apply the same mechanisms and the same data processing and matching methods whatever the context of the search, the user, the type of information need, or the intended use of the information.
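The two-step process described above can be illustrated with a minimal sketch. It assumes TF-IDF weighting and cosine similarity, which are common choices but are not stated in this summary; the function names (`build_index`, `search`) and the whitespace tokenizer are hypothetical and serve illustration only.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Offline step: turn each document into a TF-IDF weighted term vector."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return idf, vectors

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, idf, vectors, k=3):
    """Query-time step: rank documents by similarity to the query."""
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [i for _, i in scored[:k]]
```

Note that this sketch ranks documents identically for every user and every query, which is precisely the context-independent behaviour the project questions.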
The assumption of the CAAS project is that context can improve the performance of IRS by making certain elements of the retrieval process explicit. Context here refers to tacit or explicit knowledge about the users’ intentions, the users’ environment and the system itself.
The fundamental scientific issues are:
. Mastering the variety of contexts: To address this issue, we will define models that represent the various aspects of context in IR. We will also study the variety of processing methods and how well they fit the variety of contexts.
. Learning the contexts: Modelling context is not an end in itself. The system must be able to decide which techniques are most appropriate for a given context, i.e. to adapt the IR methods to the context.
. Recognizing a context: when a context occurs, the system has to identify it among the learnt contexts in order to decide which method to apply.
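As a rough illustration of the last two issues, the sketch below learns one centroid per context from labelled feature vectors, recognizes a new situation as the nearest centroid, and looks up the method to apply. Everything in it is an assumption made for illustration: the nearest-centroid approach, the context labels, the two-dimensional features and the `METHOD_FOR_CONTEXT` mapping are hypothetical and do not come from the project.

```python
import math

def centroid(vectors):
    # Component-wise mean of a list of equal-length feature vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def learn_contexts(examples):
    """Learn one centroid per context from labelled feature vectors."""
    return {label: centroid(vs) for label, vs in examples.items()}

def recognize(features, centroids):
    """Return the learnt context whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical mapping from a recognized context to the IR method to apply.
METHOD_FOR_CONTEXT = {
    "short_ambiguous": "query_expansion",
    "long_specific": "plain_ranking",
}

def choose_method(features, centroids):
    return METHOD_FOR_CONTEXT[recognize(features, centroids)]
```

For instance, with features such as (query length, number of proper nouns), calling `learn_contexts` on a few labelled queries and then `choose_method` on a new query returns the name of the method associated with the closest learnt context.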
To tackle these challenges, CAAS will consider the various aspects that may affect the IR process, first as independently as possible, then taking their cross-effects into account. We will focus on the following contextual elements:
. the users’ expectations and queries
. the documents
. the system components
For each of them, we will consider various collections and characterise them, then analyse them in depth with the aim of extracting models and behaviours. Once each contextual element has been analysed, we will consider the cross-effects. For example, one result could be that query reformulation using relevance feedback is useful when the query contains proper nouns.
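The hypothetical finding in this example could translate into a rule of the following kind. This is only a sketch under strong assumptions: proper nouns are detected with a crude capitalization heuristic (a capitalized word that is not the first word of the query, so a proper noun in first position is missed), and relevance feedback is reduced to appending the most frequent new terms from documents judged relevant. All function names are hypothetical.

```python
from collections import Counter

def has_proper_noun(query):
    # Crude heuristic (assumption): a capitalized token after the first word.
    tokens = query.split()
    return any(t[:1].isupper() for t in tokens[1:])

def expand_with_feedback(query, feedback_docs, n_terms=2):
    """Append the most frequent terms of the feedback documents to the query."""
    seen = set(query.lower().split())
    counts = Counter(t for d in feedback_docs
                     for t in d.lower().split() if t not in seen)
    # Sort by decreasing frequency, then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda p: (-p[1], p[0]))
    extra = [t for t, _ in ranked[:n_terms]]
    return query + " " + " ".join(extra) if extra else query

def reformulate(query, feedback_docs):
    # Apply relevance feedback only in the context where it is assumed to help.
    if has_proper_noun(query):
        return expand_with_feedback(query, feedback_docs)
    return query
```

A query without a detected proper noun is returned unchanged, so the cost of reformulation is paid only in the context where it is expected to be useful.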
We will consider both benchmark collections from international programmes and more realistic collections from companies. CAAS also aims to develop modules based on our findings. These modules will be integrated into IR platforms so that they can be reused as components of complete IR systems. Because analysis and modelling are the core of the project, the partners are all academic. However, companies are closely involved: a major web search engine and a smaller company will provide us with query logs. Companies will also be involved in the dissemination activities: we will contact various companies to present our findings and will offer either to customise the developed modules for them or to transfer the technologies. For example, one application is to suggest ads to be associated with users’ queries on a web site.
To tackle these challenges, the consortium brings together two computer science institutes, both specialised in IR but with complementary skills. LIA (Laboratoire Informatique Avignon) works on question answering, while IRIT (Institut de Recherche en Informatique de Toulouse) specialises in ad hoc retrieval and novelty detection. IRIT works closely with IMT (Institut de Mathématique de Toulouse) and, for this project, with its Statistique et Probabilité group. Although IMT does not appear as a formal partner, it will contribute to the project. CLLE (Cognition, Langues, Langage, Ergonomie) is a partner in this project for its linguistic skills and past work in IR and natural language processing.
Project coordination
Josiane MOTHE (UNIVERSITE TOULOUSE III [PAUL SABATIER])
The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its content.
Partnership
CLLE CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE - DELEGATION REGIONALE MIDI-PYRENEES
IRIT UNIVERSITE TOULOUSE III [PAUL SABATIER]
LIA UNIVERSITE D'AVIGNON ET DES PAYS DE VAUCLUSE
ANR funding: 438,568 euros
Beginning and duration of the scientific project:
- 42 Months