Disfluent Utterances, Exclamations and Laughter in Dialogue – DUEL

Disfluency, Exclamations and Laughter in Dialogue (DUEL)

Although disfluent speech is pervasive in spoken conversation, disfluencies have received little attention within formal theories of grammar. In recent years considerable evidence has accumulated that disfluencies carry useful information that guides language users' actions and their evaluations of their interlocutors' states of mind. DUEL will also tackle another significant semantic phenomenon neglected by formal grammarians, namely laughter.

DUEL aims to show how disfluencies, laughter, and exclamations across a number of languages can be formally analyzed, and to design dialogue systems that can robustly handle these phenomena.

Although disfluencies, exclamations and laughter occur frequently in spoken conversation, they have received little attention both within formal theories of grammar, where they are widely perceived as phenomena outside its range, and within practical dialogue modelling, where they are perceived as distractions to be filtered out. Key empirical insights into disfluencies (or self-repairs, in an alternative terminology) came from work in psycholinguistics showing, for instance, that disfluent utterances, rather than being noise, actually aid comprehension and that their effects linger after correction. Laughter has been studied within Conversation Analysis since the late 1970s, and this work is the source of many insights about where laughter can occur, how it is elicited, and the occasional need to avoid responding with laughter. But generally the assumption has been that laughter lacks meaning akin to what words and phrases possess and that it does not contribute to the compositional construction of meaning. There is very little higher-level modelling of the use and function of laughter in dialogue.

The objectives of DUEL were: (a) to acquire, through an empirical program, comparative French, German, and Chinese data on disfluency, exclamations, and laughter; (b) to use these data to inform a theoretical, grammar-based model of the role of these phenomena in influencing dialogue meaning; (c) to use the theoretical model to inform a computational implementation of a spoken dialogue system that can handle these phenomena.

Work was divided into three main areas. Area E (Empirical Basis): this work area provided focussed empirical support for the project; spontaneous interaction data was collected and annotated for the relevant phenomena. Area T (Theory Construction): in this area, an existing theory of dialogue semantics, KoS, was extended to cover disfluencies, laughter, and interjections; this required the construction and integration of a new model of word-by-word meaning construction (‘incremental semantics’). Area C (Computational Modelling): this work area was devoted to implementing tools for disfluency, incremental semantics, and laughter processing for a spoken dialogue system. The processing approach was informed by the data collected in Area E, which was used where possible to evaluate the computational model and/or to learn its parameters; it also attempted to stay as close as possible to the theory built in Area T.

Novel evidence of the rule-based nature of disfluencies was obtained through cross-linguistic studies of editing phrases (e.g., ‘I mean’ (Eng), ‘(en)fin’ (Fr), ‘ja’ (Ger)) and of self-addressed questions (‘What’s the word?’, ‘comment dire?’). New generalizations about the placement of laughter in relation to speech were obtained, and it was demonstrated that laughter can and needs to be integrated into the meaning construction process of an utterance. State-of-the-art algorithms for detecting disfluency and for meaning construction as a word-by-word process were developed.

The project has on the whole attained its objectives. A substantial multilingual, multimodal corpus with disfluency, exclamation and laughter annotations was compiled and is freely available to the research community. New empirical generalizations concerning these phenomena were discovered. Formal analyses of the phenomena were developed; the very possibility of such analyses for laughter was unclear before this project (see the discussion of our work on the popular Dutch blog Sargasso and the reference to our work in Schlenker et al. (2017), a debate about the extent of formal semantic analyses). Substantial progress was achieved in the incremental computational processing of disfluency, semantic composition, and grounding, all key components of any spoken dialogue system that can understand and generate utterances containing the phenomena at issue. Both the theoretical and the computational aspects of the project have led to a reconsideration of long-standing assumptions concerning the nature of grammar and semantic composition, as discussed in several high-impact publications.

In part on the strength of their contributions to the project, both postdocs gained permanent posts: Hough (Bielefeld) as a lecturer at QMUL, London, and Tian as a research scientist at Amazon. Ginzburg received a 5-year senior fellowship from the Institut Universitaire de France.

The resources developed in the project, together with the theoretical and computational tools, open up several prospects: scaling up the perspective to consider other non-verbal social signals such as smiling, frowning, and winking, as well as their manifestation in text as emojis; developing spoken dialogue systems and embodied computational agents that can understand and generate utterances containing the phenomena at issue; and testing the neural reality of the classifications proposed in our work. We have already initiated collaborations on these fronts with colleagues in Paris, Gothenburg, and London.

The project compiled a corpus of natural, face-to-face, loosely task-directed conversations in French, German, and Mandarin Chinese. The conversations in the three languages were recorded using near-identical technical setups. The corpus includes audio, video and body tracking data, and is transcribed and annotated for disfluency, laughter and exclamations. Publications include four articles in prominent international journals, a book chapter, and over twenty papers at leading conferences in computational linguistics, theoretical linguistics, and psycholinguistics.

Although disfluent speech is pervasive in spoken conversation, disfluencies have received little attention within formal theories of grammar, where they are widely perceived as meaning-free production errors. The majority of work on disfluent language has come from psycholinguistic models of speech production and comprehension and from structural approaches designed to improve performance in speech applications. In recent years considerable evidence has accumulated that disfluencies, far from being meaningless noise, carry useful information that guides language users' actions and their evaluations of their interlocutors' states of mind. Moreover, they exhibit rule-like regularities on all levels (including phonology, syntax, and semantics). In DUEL we aim to show how disfluent speech across a number of languages (including French, German, English, and Chinese) can be analyzed in a precise way on the basis of formal grammatical tools, using this theory to guide the design of dialogue systems that can deal head on with disfluent speech, exploiting the information therein rather than filtering it away. DUEL will also tackle another phenomenon that has not hitherto received attention from formal grammarians, namely laughter. Empirical studies in recent years have shown that laughter occurs relatively frequently in conversation and does not typically involve ‘humorous’ utterances. It plays a significant semantic role, e.g. in indicating that an utterance is not to be taken seriously or in enabling a socially delicate utterance to be made without causing offence. Our aim is to develop precise analyses of how laughter is integrated in the emergence of meaning, precise enough to enable the implementation of dialogue systems that understand and respond to laughter. The tools developed in DUEL to analyze disfluency and laughter will enable a variety of other dialogical phenomena that have been somewhat marginal to be analyzed, e.g. exclamations, tag questions, and corrective particles such as ‘No’. Both theory and implementation in DUEL will draw on carefully collected parallel data in French, German, and Chinese, as well as a small number of experimental studies. This will enable subtle cross-linguistic and cross-cultural differences to be described, as well as deeper commonalities to be hypothesized.

Project coordination

Jonathan GINZBURG (Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus)

CLILLAC-ARP EA 3967 Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus
Universität Bielefeld Fakultät für Linguistik und Literaturwissenschaft

ANR funding: 177,955 euros
Beginning and duration of the scientific project: March 2014 - 36 Months

