Today's overabundance of textual data makes it difficult to find the correct answer to a user query. In this context, Question Generation (the ability to automatically generate questions from a document) is rapidly gaining traction as a key technology. This project aims to investigate the task of passage-level abstractive question generation: we will focus on how to generate questions whose answer is distributed across the text and where the words that make up the question are not necessarily present in that text. We will develop machine reading approaches that take into account the structure of the document (both typographical and rhetorical), and generate complex questions that inherently rely on this structure. In order to train our models, we will construct relevant datasets and annotations with limited supervision. Our models will be evaluated both intrinsically and by integrating them into a conversational agent as an in-vivo testbed. By developing more advanced models of question generation that go beyond the current state of the art, we aim to improve numerous industrial and societal applications: the kind of complex question generation we target would be a substantial asset for the automatic construction and enrichment of knowledge bases used within conversational agents, decision support systems, and technical question answering systems.
Mr. Tim Van De Cruys (Institut de Recherche en Informatique de Toulouse)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
UMR 7503 - LORIA Laboratoire lorrain de recherche en informatique et ses applications (LORIA)
SYNAPSE DEVELOPPEMENT
IRIT Institut de Recherche en Informatique de Toulouse
ANR funding: 436,642 euros
Beginning and duration of the scientific project: September 2019, 48 months