DS08 - Innovative, inclusive and adaptive societies

Analysis and automatic processing of discourse – TALAD

Submission summary

This project investigates how Discourse Analysis (DA) can benefit from the integration of Natural Language Processing (NLP) techniques to reinforce its methodological tools and to conduct deeper and wider studies. Its main goal is to develop and/or adapt NLP solutions (in particular Named Entity Recognition and Disambiguation, Coreference Resolution and Entity Linking) specifically dedicated to DA, in order to reach more complex descriptors and deeper discursive levels, going beyond the bag-of-words model that now dominates lexicographic-based DA approaches. We expect this methodological leap to have a deep impact on both tooled and untooled DA. In return, DA will provide NLP with complex phenomena and research questions that should promote further advances in the domain. The impact will be evaluated on the linguistic phenomenon of nomination, notably of persons, places, events and, more generally, of the entities that define public space. Nomination is often used in public discourse as a means of re-categorisation. Depending on the enunciative position, it contributes to defining the referents, to “colouring” their perception, and to building possible associations (which may lead to confusion) that in turn have an impact on public debate. A good example is the recent usage, notably in French politics and media, of terms such as “migrants / immigrants / réfugiés / demandeurs d’asile / candidats à l'asile / déracinés …” (migrants, immigrants, refugees, asylum seekers, asylum applicants, uprooted people) or of phrases such as “réfugiés et migrants / réfugiés climatiques / migrants économiques / migrants mineurs / migrants désespérés” (refugees and migrants, climate refugees, economic migrants, underage migrants, desperate migrants). Comprehensive corpus research will allow scholars to observe how nominations spread, to identify how new axiological and polarity values become associated with them, and to track how they evolve and shift in meaning.
The project has theoretical and methodological stakes, as well as societal and political ones.
In order to transpose the research questions related to nomination from DA to NLP, we need to produce an ontology of well-defined concepts. Discourse analysis has built a solid conceptual apparatus, but the interpretation of its notions varies across schools and currents. This project will gather discourse analysts around the issue of nomination and, in collaboration with NLP specialists, create an annotated corpus that aims to structure current research practices into an operational annotation model. Adopting an ontological approach will enable us to master the terminological complexity within discourse analysis, to clarify its objects of study, and to stabilise the conceptual apparatus so that theoretical discussions can start from a common background. The annotation scheme defined on this basis will lead to the distribution of the ontology and of an annotated research corpus that answers the needs of the DA community. This corpus will be freely available and distributed together with the documentation arising from the project's reflections; it will constitute both a novel scientific advance in the domain and an opportunity for the community to take it up and continue the discussion thus initiated. The corpus will also be the locus of methodological and technological interaction between the DA and NLP communities, which will facilitate the dissemination of the results in both fields and make their added value clear to both.

Julien Longhi (AGORA - EA7392)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

LI Laboratoire d'informatique
Praxiling Praxiling
RETICULAR PROJECT
ERTIM EQUIPE DE RECHERCHE : TEXTES, INFORMATIQUE, MULTILINGUISME
AGORA - EA7392

ANR funding: 368,536 euros
Beginning and duration of the scientific project: September 2017 - 48 Months
