CE28 - Cognition, Behaviour, Language

Predicting speakers' usages in oral French: quantitative, experimental and comparative approaches to syntactic alternation – PULCO

PULCO: Predicting Speaker Usages in Oral French

The project's ambition is to explain speakers' oral usages and some of the production mechanisms underlying them. To this end, we will develop predictive models of some of these usages in order to gain an overall picture of the phenomena, and we will experimentally collect controlled oral productions to understand how particular factors intervene and interact in speech production.

Stakes and Objectives

One of the main aims of current work in quantitative and experimental syntax is to identify factors with predictive power on speakers' choices when faced with syntactic alternations, i.e. cases where the speaker can choose between two or more syntactic structures to express equivalent meanings. Drawing on both quantitative corpus methods (e.g. Bresnan et al. 2007) and experimental methods (e.g. Bresnan & Ford 2010), the study and modeling of predictive factors has been developed for a wide range of languages and phenomena. Over the past ten years, such phenomena have also been modeled for French, for example the position of the attributive adjective (une agréable soirée / une soirée agréable; Thuilier et al. 2012), the order of verb complements (donner un verre de lait au chat / donner au chat un verre de lait; Thuilier 2012) and the alternation between active and passive voice (un enfant a trouvé le chat / le chat a été trouvé par un enfant; da Cunha & Abeillé 2020). To date, however, studies have focused mainly on written French, owing to the availability of large, richly annotated resources. The time has come to include oral French data in this field of study. On the quantitative side, this is now feasible thanks to the recent online release of the Corpus d'Etude pour le Français Contemporain (CEFC, ANR Orfeo) (Debaisieux & Benzitoun 2020). On the experimental side, the sentence recall paradigm complements the study of attested usage, in that it allows oral production to be studied under controlled conditions, as Thuilier et al. (2021) have done for French.
Our project proposes to intensify research in quantitative and experimental syntax on spoken language, and to enrich these two strands with a comparative dimension, since comparing varieties of the same language through the study of syntactic alternations tells us how human language works (cf. Bresnan & Ford 2010, Szmrecsanyi et al. 2017). We aim to apply this threefold approach to syntactic alternations in the verbal domain: the order of verb complements, the active/passive alternation, the anticausative alternation (Paul ferme la porte / la porte se ferme), and verb subcategorization alternations, as in toucher une question / toucher à une question (Huyghe & Corminboeuf 2018).

The project is based on a three-part methodology: quantitative, experimental and comparative.

Quantitative component:
We envisage a supervised quantitative approach based on clean, carefully documented and richly annotated data, in order to obtain models that are interpretable for syntactic theory and whose results can be cross-checked against the experimental results. Of the 10 million words in the CEFC, almost 4 million are oral transcriptions (Debaisieux & Benzitoun 2020). As far as possible, we aim to work on unplanned speech. In all cases, the CEFC metadata will make it possible to take the communicative situation into account, and thus to estimate how planned the utterances are. The size and annotation layers of the CEFC oral sub-corpus (sentence segmentation, POS tags, lemmas, dependency parses) are well suited to our approach.
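The summary does not name the model family. Binary logistic regression is the usual choice in the alternation literature cited above (Bresnan et al. 2007 used it for the dative alternation), so the sketch below assumes it. The two predictors (length difference, animacy of the PP), the coding of the response and the toy rows are all hypothetical illustrations, not the project's variables or data; the point is only to show how each fitted coefficient reads as a change in log-odds of one structure over the other, which is what "interpretable" means here.

```python
import math
from typing import List, Sequence, Tuple

# One hypothetical corpus token of the verb-complement order alternation,
# e.g. "donner un verre de lait au chat" vs "donner au chat un verre de lait".
# Features (assumed for illustration, not the project's actual predictors):
#   x[0] = length of the direct object minus length of the PP, in words
#   x[1] = 1 if the PP complement is animate, else 0
# Response: 1 = direct object before PP, 0 = PP before direct object.
Row = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of 'direct object first' under weights w = [intercept, w1, ...]."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(rows: Sequence[Row], lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit a binary logistic regression by batch gradient descent on the log-loss.

    Each returned coefficient is the change in log-odds of 'direct object
    first' per unit increase of the corresponding feature.
    """
    k = len(rows[0][0])
    w = [0.0] * (k + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in rows:
            err = predict_proba(w, x) - y  # derivative of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Toy rows invented for illustration only; they are not CEFC data or results.
TOY_ROWS: List[Row] = [
    ([-3.0, 0.0], 1), ([-2.0, 1.0], 1), ([-1.0, 0.0], 1), ([0.0, 0.0], 1),
    ([0.0, 1.0], 0), ([1.0, 1.0], 0), ([2.0, 0.0], 0), ([3.0, 1.0], 0),
]

if __name__ == "__main__":
    weights = fit_logistic(TOY_ROWS)
    print("intercept, length effect, animacy effect:", [round(v, 2) for v in weights])
```

Real corpus studies of this kind typically fit mixed-effects logistic regression (e.g. with random effects for verbs or speakers) in a statistics package rather than hand-written gradient descent; the sketch only makes the meaning of a coefficient concrete.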

Experimental component:
In order to study speakers' oral production in a controlled way, we propose to use the sentence recall paradigm (Potter & Lombardi 1990; Lombardi & Potter 1992). We will set up a first experimental protocol, inspired by Tanaka et al. (2011) and Thuilier et al. (2021), in which each participant hears a list of oral stimuli (sentences) and must recall each one after a distractor task (simple mental arithmetic), prompted by an oral cue provided for that sentence. The aim is to identify deviations from the stimulus sentences, which may indicate the effect of certain factors on the structure chosen and hence on the underlying production mechanisms. In particular, we will use this protocol to test accessibility effects and their interaction with the prototypicality of verb arguments.
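The summary does not say how deviations from the stimulus will be coded. One simple option, assumed here, is to annotate the order of the two complements in each heard and recalled sentence and compute, per condition, the proportion of usable recalls in which the order shifted. The `Trial` record, the condition labels and the order labels (DO = direct object, PP = prepositional complement) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Trial:
    """One sentence recall trial (all field values are hypothetical labels)."""
    condition: str                 # e.g. "accessible-PP" vs "new-PP"
    stimulus_order: str            # order heard, e.g. "DO-PP" or "PP-DO"
    recalled_order: Optional[str]  # order produced; None if the recall is unusable


def shift_rates(trials: Iterable[Trial]) -> Dict[str, float]:
    """Proportion of usable recalls per condition whose order differs from the stimulus."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # condition -> [shifts, usable]
    for t in trials:
        if t.recalled_order is None:
            continue  # e.g. sentence forgotten or target structure not produced
        counts[t.condition][1] += 1
        if t.recalled_order != t.stimulus_order:
            counts[t.condition][0] += 1
    return {cond: shifts / usable for cond, (shifts, usable) in counts.items()}


if __name__ == "__main__":
    demo = [
        Trial("accessible-PP", "DO-PP", "PP-DO"),
        Trial("accessible-PP", "DO-PP", "DO-PP"),
        Trial("new-PP", "DO-PP", "DO-PP"),
        Trial("new-PP", "DO-PP", None),
    ]
    print(shift_rates(demo))
```

Raw proportions are only a descriptive first step; an actual analysis would more likely model shifts trial by trial, with participant and item effects.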

Comparative component:
The harmonized formats and annotations of the CEFC will allow us to compare usage quantitatively across regions (Paris/CFPP2000, Belgium/Valibel and Switzerland/OFROM), whereas existing studies have each dealt with a single variety (see Corminboeuf et al. 2020 for OFROM; Liang et al. 2021 for CFPP2000). The limited size of some regional corpora and the absence of certain varieties will also lead us to develop an experimental approach to regional variation. In Dagnac & Thuilier (2020), we studied the alternation between marked and unmarked direct objects (je la connais, à Mélanie / je la connais, Mélanie) in South-Western spoken French, using a protocol for collecting acceptability judgments. The success of this study shows that regional varieties and their specificities can be studied experimentally, and paves the way for a sentence recall protocol adapted to the study of regional varieties.

The CEFC corpus will be used to develop robust models capable of predicting speakers' actual oral choices, and to compare these choices across varieties of French (France, Belgium and Switzerland). In addition, we will develop oral sentence recall protocols to better understand two dimensions at play in spoken production: 1) the interface between syntax and prosody; 2) the interface between syntax and semantics, by studying the role of the verb, the accessibility of its arguments and their prototypicality. Finally, we will apply quantitative and experimental methods to the comparison of French varieties.

The results will be:
- tables of manually sorted, richly annotated syntactic alternation data;
- prediction models for the choice of oral syntactic structures;
- sentence recall protocols adapted to the study of oral syntactic production, to the exploration of the syntax-prosody interface in production, and to the comparison of the syntax of French varieties;
- experimental results relating to these protocols.


Bîlbîie, Faghiri & Thuilier (Eds.) (2021). Syntaxe expérimentale. Langages 223. Armand Colin. www.cairn.info/revue-langages-2021-3.htm

Thuilier, Grant, Crabbé & Abeillé (2021). Word order in French: the role of animacy. Glossa: a journal of general linguistics 6(1). doi.org/10.5334/gjgl.1155

Thuilier & Faghiri (2019). Canonicité des arguments verbaux, caractère animé et ordre linéaire. Journée d'études Les constructions verbales (non) canoniques : de la réalisation argumentale à la structure propositionnelle, Fribourg.

One of the major goals of the quantitative and experimental study of syntax is to identify factors with predictive power on speakers' choices in the face of syntactic alternations, i.e. cases where the speaker can choose between two syntactic structures to express identical meanings. Written French has been studied from this perspective for about ten years (e.g. Thuilier et al. 2012, da Cunha & Abeillé 2020). The time has now come to include oral French in this field. The quantitative part has become feasible thanks to the very recent release of the Corpus d'Etude pour le Français Contemporain (CEFC, ANR Orféo). On the experimental side, the sentence recall paradigm complements the study of attested data, in that it allows oral production to be investigated under controlled conditions, as Thuilier et al. (2021) have done for French. We enrich these two components by adding a comparative part, considering that highly targeted studies of syntactic alternations tell us how human language functions when we compare languages and language varieties. The project aims to apply this threefold approach to syntactic alternations in the verbal domain: verb complement ordering, the active/passive alternation and alternations of verb subcategorization frames. We will focus on two dimensions at stake in oral production: 1) the syntax/prosody interface in terms of word grouping; 2) the syntax/semantics interface, by studying the key role of the verb, the accessibility of its arguments and their prototypicality. The goal of the project is to build predictive models of some syntactic usages of oral French, taking geographical variation into account, and to provide a better understanding of oral French syntax at its interfaces with semantics and prosody, drawing on typological comparison.

Project coordination

Juliette THUILIER (Université Toulouse - Jean Jaurès)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.


CLLE Université Toulouse - Jean Jaurès

ANR funding: 276,895 euros
Beginning and duration of the scientific project: February 2023 - 48 months
