DS0705 -

Language register transformation using linguistic pattern extraction – TREMoLo

TREMoLo: Language register transformation using linguistic pattern extraction

The TREMoLo project studies the use of language registers and seeks to develop automatic methods for transforming texts from one register to another.

General objectives

The objectives of the TREMoLo project are:
1. To study and characterize the use of language registers in written texts.
2. To develop automatic methods to transform texts from their original register to another.
3. To lay foundations for generalizing the approach to other stylistic components of natural language.

These objectives relate mainly to natural language processing in computer science and to the study of language variation in linguistics and sociolinguistics.

The overall method of the project relies on two main steps:
1. Describing register-specific texts with linguistic features and extracting discriminant sequential patterns from them.
2. Automatically generating paraphrases of texts so that they fit the patterns of a target register.
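
The first step can be illustrated with a minimal sketch. It is not the project's actual algorithm: it assumes each sentence has already been reduced to a sequence of linguistic feature tags, it only considers contiguous patterns (sequential pattern mining in general also allows gaps), and it scores a pattern by the ratio of its support in the target register to its support in the other register. The tag names (`NEG_NE`, `NEG_PAS`, ...), the toy corpora, and all function names and thresholds are hypothetical.

```python
from collections import Counter


def contiguous_patterns(sequence, max_len=3):
    """Return the set of contiguous sub-sequences of length 1..max_len."""
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(sequence) - n + 1):
            found.add(tuple(sequence[i:i + n]))
    return found


def support(corpus, max_len=3):
    """Map each pattern to the fraction of sequences that contain it."""
    counts = Counter()
    for sequence in corpus:
        counts.update(contiguous_patterns(sequence, max_len))
    return {pattern: count / len(corpus) for pattern, count in counts.items()}


def discriminant_patterns(target, other, max_len=3,
                          min_support=0.5, min_ratio=2.0, smoothing=0.01):
    """Keep patterns frequent in `target` and much rarer in `other`.

    The score is support(target) / (support(other) + smoothing); the
    smoothing term (an arbitrary choice here) avoids division by zero.
    """
    target_support = support(target, max_len)
    other_support = support(other, max_len)
    scored = []
    for pattern, s in target_support.items():
        if s < min_support:
            continue
        ratio = s / (other_support.get(pattern, 0.0) + smoothing)
        if ratio >= min_ratio:
            scored.append((pattern, s, ratio))
    # Highest ratio first; ties broken by the pattern itself for stability.
    return sorted(scored, key=lambda item: (-item[2], item[0]))


# Hypothetical toy corpora: formal French keeps the negation particle "ne",
# informal French often drops it.
formal = [
    ["PRON", "NEG_NE", "VERB", "NEG_PAS"],
    ["PRON", "NEG_NE", "VERB", "NEG_PAS", "NOUN"],
]
informal = [
    ["PRON", "VERB", "NEG_PAS"],
    ["PRON", "VERB", "NEG_PAS", "NOUN"],
]

for pattern, s, ratio in discriminant_patterns(formal, informal):
    print(" ".join(pattern), f"support={s:.2f}", f"ratio={ratio:.1f}")
```

On this toy data, every pattern retained for the formal register contains the `NEG_NE` tag, which is the kind of register-specific cue the second step would then try to reproduce when paraphrasing towards that register.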


The research is exploratory: it aims to produce fundamental knowledge in linguistics, with longer-term extensions to other types of stylistic variation. Application domains of this work are human-machine interaction and assistance with language mastery.


Linguistic registers are known to have a strong influence on the expressivity conveyed by utterances. However, their study in natural language processing (NLP) is still marginal. To address this gap, the TREMoLo project focuses on their analysis and automatic manipulation, with particular attention to French. Besides its originality, this research will complement the widespread work on textual information extraction in NLP.

The main objectives of the project are to study linguistic registers per se, and to develop methods for the automatic transformation of linguistic registers across texts, i.e., translating a text from one register to another. This work will rely on the extraction of register-specific linguistic patterns and their integration into an automatic paraphrase generation process. These objectives are enabled by the strong and complementary skills of the consortium members.

The project takes an exploratory research perspective, where the goal is to produce fundamental knowledge for style-specific pattern extraction and automatic natural language generation. Linguistic registers are a well-suited case study for this long-term objective.

The project is part of the growing interest in stylistics in NLP, a domain in which the number of potential applications is increasing. For instance, stylistics can contribute to authorship authentication, access to information, human-machine dialogue or interaction, and language learning. The project's societal impact thus naturally lies in these domains, opening possibilities for automatic text modulation according to a specific goal or audience. Scientific advances mainly lie in the joint use of data mining and statistical NLP approaches, along with the discovery of new linguistic and sociolinguistic findings. All these aspects give the project high potential for industrial exploitation.

Project coordination

Gwénolé Lecorvé (Institut de recherche en informatique et systèmes aléatoires)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

IRISA Institut de recherche en informatique et systèmes aléatoires

ANR funding: 268,274 euros
Start and duration of the scientific project: September 2017 - 42 months
