CE23 - Data, Knowledge, Big Data, Multimedia Content, Artificial Intelligence

SEarch-oriented ConverSAtIonal systeMS – SESAMS

Search-oriented conversational systems

The project envisions a novel paradigm in IR in which the user can interact with the search engine in natural language through the intermediary of a conversational system. We refer to this as search-oriented conversational systems. There are several important challenges underlying this novel paradigm: 1) understanding the user's information need; 2) designing a proactive system; and, 3) evaluating this novel paradigm.

Learning from users' interaction

Search-oriented conversational systems are characterized by a heterogeneous context involving: 1) implicit feedback with respect to the search engine and/or 2) natural language information needs expressed through the conversational system. This complex framework gives rise to novel challenges:
- Understanding and contextualizing the information need for document ranking. Information needs can be expressed in natural language in many different ways, which makes them hard to understand. This problem could be tackled, for instance, with deep neural translation models (e.g., encoder-decoder approaches) by casting the understanding problem as a translation task. In addition, the understanding process should also cope with the heterogeneous context according to the retrieval objective. One may, for instance, combine translation models with reinforcement learning techniques that jointly learn the representation of the context and the documents.
- Engaging the search-oriented conversational system in proactive interactions by anticipating users' needs and requesting feedback from users (e.g., asking for users' preferences over reformulated queries or retrieved documents). This challenge relates to interactions in a dynamic setting in which the search-oriented conversational system stimulates users' feedback to enhance retrieval effectiveness. This would, for instance, lead to adaptive reinforcement learning methods driven by an IR-guided policy.

The objective of the SESAMS project is to propose interaction-guided machine learning (ML) methods for conversational search systems that strengthen the "human in the loop" within machine learning approaches.

Given a query-based IR system treated as a black box, we have shown that natural language input is less suitable for classic IR systems than keyword queries. We then proposed a model that translates users' information needs expressed in natural language into keyword-based queries. This model combines two components: a translation model, enhanced with reinforcement learning techniques guided by the final objective (the IR task). The model was designed during a summer internship in 2018 (after the project submission and before its start).
At the beginning of the project, we revisited this model during another summer internship (2019) to better analyze its components and propose alternative methods. Among other things, we analyzed the semantic encoding model and reduced the overconfidence of the reinforcement learning component. Experiments have shown, as expected, that self-attention encoding can enhance the effectiveness of the model.
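
As a rough illustration of the idea of learning a query formulation guided by the IR objective, the sketch below uses REINFORCE with average precision as the reward. It is not the project's model: the neural translation component is replaced here by a much simpler per-word keep/drop policy, and the toy corpus, the relevance judgments, the term-overlap `search` function and all names are hypothetical assumptions made for this example.

```python
import math
import random

# Hypothetical toy corpus and relevance judgments, for illustration only.
DOCS = {
    "d1": "neural ranking models for document retrieval",
    "d2": "recipes for a quick dinner",
    "d3": "i would like to find a good hotel",
    "d4": "reinforcement learning for neural retrieval",
}
RELEVANT = {"d1", "d4"}


def search(query_terms):
    """Stand-in for the black-box IR system: rank documents by term overlap."""
    terms = set(query_terms)
    scored = [(len(terms & set(text.split())), doc) for doc, text in DOCS.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc for score, doc in ranked if score > 0]


def average_precision(ranking, relevant):
    """Average precision of a ranking, used here as the reward."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reward(query_terms):
    """Reward of a candidate keyword query: the IR metric of its results."""
    return average_precision(search(query_terms), RELEVANT)


def train(words, steps=500, lr=0.5, seed=0):
    """REINFORCE over one independent keep/drop decision per word."""
    rng = random.Random(seed)
    logits = [0.0] * len(words)
    baseline = 0.0  # running average of rewards, reduces gradient variance
    for _ in range(steps):
        probs = [1 / (1 + math.exp(-z)) for z in logits]
        mask = [1 if rng.random() < p else 0 for p in probs]
        r = reward([w for w, m in zip(words, mask) if m])
        advantage = r - baseline
        # d/d(logit) of log Bernoulli(mask | sigmoid(logit)) is (mask - prob).
        logits = [z + lr * advantage * (m - p)
                  for z, m, p in zip(logits, mask, probs)]
        baseline += 0.1 * (r - baseline)
    return [1 / (1 + math.exp(-z)) for z in logits]


if __name__ == "__main__":
    nl_query = "i would like to find neural models for retrieval".split()
    print("reward of the raw request:", reward(nl_query))
    print("reward of a keyword query:", reward(["neural", "retrieval"]))
    for word, p in zip(nl_query, train(nl_query)):
        print(f"{word}: keep probability {p:.2f}")
```

In this toy setting the conversational filler words match an irrelevant document, so the raw request scores lower than the keyword query; the policy is only ever told the final retrieval metric, never which words are "good".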

Moreover, Laure Soulier participated in the Dagstuhl Seminar on conversational search in November 2019 (https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19461), where she was involved in several working groups.

She is investigating emerging domains that could serve as a basis for the project:
- Continual learning/lifelong learning, with a paper under submission at an international conference.
- Conditioned/controlled language generation, with a paper accepted at INLG 2020. This work aims at controlling language generation through reinforcement learning models. Natural language generation underpins conversational systems and, in this setting, is conditioned on the search context.
- She has also started a survey of existing conversational system prototypes to support future experiments; the development of a prototype is under discussion (master's internship in 2020).
- We had planned a research visit by Jian-Yun Nie to Paris for spring 2020, which was unfortunately canceled because of the health situation. The visit has been postponed to spring 2021.

- We have recruited another Ph.D. student who will hopefully start in January 2021 (the administrative process is in progress). He will focus on reinforcement learning models for conversational IR (axis 2) and will contribute to axis 3 through the evaluation of the proposed models.
- We have recruited a postdoc starting in January 2021. He will pursue the ongoing work on axis 1 and integrate lifelong learning strategies. He will also contribute to axis 3 (evaluation of axis 1).

Sharon L. Oviatt, Laure Soulier: Conversational Search for Learning Technologies. CoRR abs/2001.02912 (2020)

Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari: PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation. INLG (2020)

Recent advances in Artificial Intelligence (AI), and more particularly in deep learning, have opened up tremendous prospects for designing intelligent systems in which a conversation between a human and a computer is no longer an illusion. One radical change is the ability of systems to reason like humans in tasks requiring more semantic understanding, offering more natural ways for users to request actions or information. This perspective may revolutionize the way users access information. Indeed, until now, in a traditional information retrieval (IR) research setting, the user's information need has been represented by a set of keywords, and the returned documents are mainly determined by whether they contain these keywords.
The SESAMS project envisions a novel paradigm in IR in which the user can interact with the search engine in natural language through the intermediary of a conversational system. We refer to this as search-oriented conversational systems. This novel paradigm raises numerous challenges that we will address in this project: 1) understanding the user's information need by leveraging both interactions in natural language and users' implicit feedback; 2) designing a proactive system that anticipates users' actions and users' search intent by directly soliciting the user; and 3) evaluating this novel paradigm by designing new theoretical evaluation frameworks for search-oriented conversational systems and building adapted large-scale datasets that would enable the proposed models to be learned and evaluated.

On the methodological aspects, SESAMS focuses on deep learning methods which have shown their effectiveness for reasoning over semantic-based applications, and more particularly deep reinforcement learning models which are particularly suitable for leveraging users' interactions. Deep learning models are known to be data hungry for training. In practical search contexts, we can only have a limited amount of training data. So, an important problem we address is learning with a limited amount of data. In summary, the project will introduce a new paradigm in information retrieval and novel deep learning methods augmented by interactions.

SESAMS will be developed at LIP6 under the supervision of Laure Soulier (MCF) who is specialized in Information Retrieval (in particular, interactive IR) and Representation Learning. She will collaborate with specialists with complementary skills: Ludovic Denoyer from LIP6 (reinforcement learning and deep neural networks), Vincent Guigue from LIP6 (representation learning and natural language processing), Philippe Preux from CRIStAL/Inria Lille (reinforcement learning and deep neural networks), and Jian-Yun Nie from DIRO/Montreal University (information retrieval and deep learning).
We plan to recruit one Ph.D. student and one postdoctoral researcher to investigate the presented research axes, and two master students for working on exploratory aspects (end-to-end models) or evaluation platforms that we plan to release to the community.

Project coordination

Laure Soulier (Laboratoire d'informatique de Paris 6)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.

Partner

Université de Montréal / Département d’informatique et recherche opérationnelle
LIP6 Laboratoire d'informatique de Paris 6
CRIStAL Centre de Recherche en Informatique, Signal et Automatique de Lille

ANR funding: 223,560 euros
Beginning and duration of the scientific project: - 48 Months
