DS10 - Défi des autres savoirs 2017

Towards Innovative Ways of Visualising, Exploring, and Linking Resources for Medieval Latin – VELUM

Submission summary

The project is a first step towards an innovative digital environment for the study of the language and culture of medieval Europe. Medieval civilization can be investigated only through the traces that have survived to our time. The best sources of our knowledge are the texts, preserved in huge quantity and variety. Written mainly in Medieval Latin, within a social context that had nothing in common with either antiquity or our own times, they have not benefited from recent advances in computational linguistics and digital humanities in general.

To remedy this situation, we will first build a large and balanced corpus of Medieval Latin texts composed between 500 and 1500 AD all across Europe. Apart from wide geographical and temporal coverage, the corpus will also reflect the variety of genres practised in the Middle Ages, as well as the functional richness of medieval textual culture. To enable automatic processing, the texts will be annotated with part-of-speech, lemma, time, and place labels. The compilation and annotation of the corpus, albeit extremely important, will be only the first step of the project. Secondly, a corpus search engine will be built with the help of the CQPweb software. Users will be able to query the texts and benefit from their rich linguistic annotation through a user-friendly interface. Thirdly, the project aims to develop a set of efficient statistical analysis and data visualisation tools that researchers can embed in their own workflows. Written mostly in R, these scripts, programs, and wrapper functions will allow for advanced study of Medieval Latin vocabulary and will be applicable to other languages as well.
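As an illustration only, the kind of token-level annotation described above (part of speech, lemma, time, and place) and a simple diachronic count over it could be sketched as follows. This is a minimal sketch and not the project's implementation: the `Token` record, the function names, the tag labels, and the four sample tokens are all hypothetical and invented for the example, and Python is used here although the project's own tools are described as written mostly in R.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token (hypothetical record layout, not the project's schema)."""
    form: str   # word form as it appears in the text
    pos: str    # part-of-speech label (illustrative tagset)
    lemma: str  # dictionary headword
    year: int   # approximate year of composition of the source text
    place: str  # place of composition of the source text


def century(year: int) -> int:
    """Map a year AD to its century number, e.g. 1210 -> 13."""
    return (year - 1) // 100 + 1


def lemma_counts_by_century(tokens, pos=None):
    """Count lemma occurrences per century, optionally keeping one POS only."""
    counts = {}
    for tok in tokens:
        if pos is not None and tok.pos != pos:
            continue
        counts.setdefault(century(tok.year), Counter())[tok.lemma] += 1
    return counts


# Toy data invented for illustration; not drawn from any real corpus.
SAMPLE = [
    Token("regis", "NOUN", "rex", 850, "Tours"),
    Token("reges", "NOUN", "rex", 1210, "Paris"),
    Token("dedit", "VERB", "do", 1210, "Paris"),
    Token("rex", "NOUN", "rex", 1250, "Kraków"),
]

result = lemma_counts_by_century(SAMPLE, pos="NOUN")
```

Grouping by lemma rather than by word form is what lets inflected forms such as "regis" and "reges" be counted together, which is why lemma annotation matters for vocabulary studies of this kind.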

The project will take advantage of the outstanding documentary and digital infrastructure of the IRHT-CNRS, including its library of circa 120,000 publications and a pool of IT specialists who will provide support at every stage of the project’s workflow. The contributors to the project are practising computational linguists, lexicographers, and historians, who will work in close collaboration. A young or early-career scholar is expected to be recruited during the project.

Both the texts and the tools will be made freely available to the scientific community through the project’s website and public code repositories. This mode of dissemination should not only facilitate research but is also expected to influence current practices in historical and philological research by promoting automatic, “distant reading” approaches to ancient texts.

Bruno Bon (Institut de recherche et d'histoire des textes)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

IRHT Institut de recherche et d'histoire des textes

ANR funding: 256,122 euros
Beginning and duration of the scientific project: February 2018 - 48 Months
