FRAL - Programme franco-allemand en Sciences humaines et sociales

The transition from Latin to French: constitution and analysis of a Latin-French digital corpus – PaLaFra

Filling a gap in our knowledge of the history of French: the transition from Latin to French (PaLaFra)

How can we follow the transition from one language to another through a continuous and partially hidden linguistic evolution?

While there is no gap in the written record of sources preserved since Antiquity, the transition from Latin to French is still poorly understood. One of the main reasons is that languages change continuously. Different language registers, corresponding to the situation, the region, the social origin of speakers and so on, coexist in the same space. It is not easy to determine the precise moment when two registers or dialects separate into two distinct languages. Furthermore, Latin texts perpetuated for centuries old patterns inherited from ancient culture, obscuring the growing divergence between the Latin of the litterati and that of the illiterati. At the same time, nascent French did not have enough prestige to be used as a written language. There is therefore a linguistic gap between written Latin and the vernacular language contemporaneous with the first French texts, and the intermediate stages of its formation are poorly documented. The PaLaFra project proposes resources and a new methodology to fill this gap in the scholarship. Its purpose is twofold: to build a new corpus of texts and to develop a sustained collaboration between Latinists and French specialists through shared use of the corpus.

The PaLaFra corpus is the result of a collaboration between Latinists (mainly in Germany) and historical linguists of French (mainly in France). It consists of Latin texts from the Monumenta Germaniæ Historica, French texts from the Base de français médiéval, and parallel translated texts. The texts are chosen, formatted, described and organized to allow a comparative Latin-French study. Digital tools and procedures have been developed to annotate the data with linguistic information (grammatical categories, etc.), and specialised software, the TXM platform, gives access to analytical tools. A team of 47 researchers from 11 countries is in charge of the linguistic analysis based on the corpus. The main focus is on (morpho-)syntactic changes occurring during the transition from Late Latin to French. Sociolinguistic and textual variation is also taken into account. This research will be published in a handbook in which each chapter is written by a tandem consisting of a Latinist and a specialist in medieval French. The book will cover all major topics in the domain of morpho-syntax and will serve as a framework for future research on the transition from Late Latin to French. It will also give visibility to the PaLaFra corpus as a tool for research on language evolution.

The PaLaFra project has built a set of digital resources useful for linguistic research. These resources comprise a corpus of texts as well as lexicons, lists of grammatical categories, etc. Thanks to its bilingual composition, its size (about 1,400,000 words), the variety of its texts and the richness of its linguistic annotations, the corpus is a unique tool for studying the transition from Latin to French. It is accessible to all kinds of users through the TXM web platform, after a simple, free registration on the Base de français médiéval portal (http://txm.bfm-corpus.org).

New fields of research have been explored by the partners, in particular the study of coreference chains (successive mentions of the same referent in a text) from Latin to French. Since the definite and indefinite articles constitute one of the major innovations of the Romance languages, the paradigmatic reconfiguration of Latin pronouns and adjectives and the transformations in the composition of the chains play a central role in the linguistic transition. These phenomena combine with syntactic and semantic evolution, which is of primary importance for the typological changes marking the transition from Latin to French. Annotating and studying the coreference chains would make it possible to build on the current corpus while adding a new pragmatic-discursive perspective.

The main result of the project has been the production of a handbook covering (morpho-)syntactic aspects of the linguistic transition from Latin to French. This publication and the related corpus establish a new field of investigation and will serve as a framework for researchers and students interested in the history of Latin, the Romance languages and French. The findings of the project should also be of use to those interested in linguistic change and fragmentation. A first collection of studies has already been published under the title Latin tardif, français ancien : continuités et ruptures (eds. A. Carlier & C. Guillot-Barbance, 2018).

It is well known that the comparative grammar of Romance languages does not allow us to trace the way back to Late Latin. There is a kind of "no man's land" between Late Latin and the language stage which can be reached by means of the method of reconstruction. As has been shown by Banniard, this gap is conceptual rather than chronological, because Late Latin and the vernacular coexist in the same communicative space, and while they begin as two varieties on the same linguistic continuum - the proportion of conservative and innovative features varying according to the register (e.g. sermo altus, stylus simplex) - this continuity is later disrupted as the vernacular becomes identifiable as an autonomous linguistic system.
This project will contribute to our understanding of the relationship between Late Latin and one of the Romance languages, French, in the complex evolution from a lingua mixta to diglossic variants. It will initiate a collaboration between the community of Latin linguists and researchers in French historical linguistics.

The three partners in this project will have the following roles:
(i) The German team will provide general expertise on Late Latin as well as a methodology developed to reconstruct Romance innovations. On the basis of the Monumenta Germaniae Historica, it will set up a Late Latin corpus, which will be digitized, lemmatized and morphosyntactically annotated. It will also enrich this corpus with pragmatic-discursive annotations and mark-up.
(ii) The Lyon team will bring its strong expertise in the field of digital corpora for the medieval period. It will enrich the Medieval French Corpus (BFM), in particular by introducing lemmatization. It will also combine the Late Latin corpus and the Medieval French corpus into one Latin-French corpus, which will be freely accessible for research and maintained in the long term. Moreover, the team will develop a corpus of aligned translations pairing Late Latin texts with the corresponding Old French texts, which will be a valuable tool for studying the relationship between Late Latin and Old French in a precise manner.
(iii) The task of the Lille partner is to coordinate linguistic research on the bilingual corpus in order to describe and analyze the relationship between Late Latin and Old French. It will produce a multi-author reference book that will serve as a framework for future research on the issue. Researchers using the Latin-French corpus will also evaluate the user-friendliness of the corpus and the appropriateness of the choice of texts and annotations, giving feedback to the German and Lyon teams in order to optimize the development of the corpus.

Project coordination

Céline GUILLOT-BARBANCE (Interactions, corpus, apprentissages, représentations)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

STL Lille Savoirs, Textes, Langage
Institut fuer Romanistik, Uni Regensburg Université de Regensburg, Institut fuer Romanistik
ICAR Lyon Interactions, corpus, apprentissages, représentations

ANR funding: 227,619 euros
Beginning and duration of the scientific project: September 2014 - 36 Months
