It is becoming increasingly realistic to exploit transcriptions of spoken data for tasks that require comprehension of what is said in a conversation. SUMM-RE will combine expertise in theories of discourse interpretation with recent developments in distant supervision to improve the automatic production of meeting summaries and minutes from spoken data.
State-of-the-art approaches to abstractive summarization treat discourse as a mere linear sequence of utterances. SUMM-RE posits that by exploiting information about discourse relations, and the rich structures that relations between utterances determine, we can significantly improve models for abstractive summarization. A major hurdle to developing more sophisticated models of discourse structure for spoken, multiparty conversation is the lack of appropriate training data. SUMM-RE will address this problem in two ways. First, it will create a new corpus of meeting-like interactions in French. Second, it will label this corpus, along with a large corpus of meeting-like interactions in English, for discourse structure. The annotation approach will extend recent developments in distant supervision to develop labelling functions that can automatically label large amounts of data. This approach harnesses linguistic expertise while keeping manual annotation to a minimum.
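To make the idea of labelling functions concrete, here is a minimal Python sketch. It is not the project's actual method: the three heuristics, the relation names (`QA_pair`, `Acknowledgement`, `Contrast`) and the `weak_label` helper are all hypothetical, and the votes are combined by simple majority, whereas distant-supervision systems typically learn how much to trust each function.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Illustrative assumptions only: these heuristics and relation names are
# hypothetical and are not taken from the SUMM-RE project.
ABSTAIN = None
Pair = Tuple[str, str]  # (earlier utterance, later utterance)
LabelFn = Callable[[Pair], Optional[str]]


def lf_question_answer(pair: Pair) -> Optional[str]:
    """Vote QA_pair when a question is followed by a non-question."""
    first, second = pair
    if first.strip().endswith("?") and not second.strip().endswith("?"):
        return "QA_pair"
    return ABSTAIN


def lf_acknowledgement(pair: Pair) -> Optional[str]:
    """Vote Acknowledgement when the reply is a short agreement token."""
    _, second = pair
    words = second.lower().strip(" .!").split()
    if words and len(words) <= 3 and words[0] in {"ok", "okay", "right", "sure"}:
        return "Acknowledgement"
    return ABSTAIN


def lf_contrast(pair: Pair) -> Optional[str]:
    """Vote Contrast when the reply opens with a contrastive marker."""
    _, second = pair
    if second.lower().lstrip().startswith(("but ", "however")):
        return "Contrast"
    return ABSTAIN


LFS: List[LabelFn] = [lf_question_answer, lf_acknowledgement, lf_contrast]


def weak_label(pair: Pair, lfs: List[LabelFn]) -> Optional[str]:
    """Combine the non-abstaining votes by simple majority.

    Ties go to the label that was voted first. Returns ABSTAIN when no
    labelling function fires.
    """
    votes = [vote for lf in lfs if (vote := lf(pair)) is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    examples = [
        ("Shall we move the deadline?", "Yes, to Friday."),
        ("We could ship next week.", "Okay."),
        ("The demo went well.", "But the audio was poor."),
        ("Hello everyone.", "Let's begin."),
    ]
    for pair in examples:
        print(pair, "->", weak_label(pair, LFS))
```

Each function encodes one piece of linguistic intuition and abstains when it does not apply, so a handful of such rules can label a large corpus without anyone annotating individual utterance pairs by hand.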
The automatically annotated data will be used to improve algorithms for both short topic summaries and more detailed meeting minutes. These algorithms will in turn be integrated into the semi-automatic summarization tool of the lead partner, LINAGORA, to significantly improve the output for its users. All project results (corpus and algorithms) will be released under an open-source license as part of LINAGORA's LinTo/Conversation Manager offer.
Ms. Julie Hunter (LINAGORA GRAND SUD OUEST)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
LPL Laboratoire Parole et Langage
IRIT Institut de Recherche en Informatique de Toulouse
LINA LINAGORA GRAND SUD OUEST
LIX Laboratoire d'Informatique de l'Ecole Polytechnique
ANR funding: 669,891 euros
Start date and duration of the scientific project: December 2020, 42 months