Application "The language of the country" – YAR - Yezh Ar vRo
The YAR pilot research project is a collaboration between civil society actors and academic researchers in linguistics and computer science (natural language processing, ergonomics). The project contributes to the development of technologies for language preservation. It is based in Brittany, France, and is being developed together with speakers of Breton, a Celtic language that UNESCO considers to be in serious danger of extinction. The Dastum association (which has been collecting, safeguarding and disseminating Breton oral heritage since 1972) and the scientific cluster (CNRS and three universities) are mobilizing three teachers' associations and coordinating the project with the support of the Bretagne Numérique endowment fund.
YAR will provide (i) a geolocated smartphone application for collecting speech from the public, and (ii) a participatory transcription web platform, integrating sound and text, for use in educational contexts. The sound material collected through the application will feed the transcription platform, which is designed with and for teaching teams. The resulting transcribed oral corpus will support the development and refinement of automatic speech recognition (ASR) for Breton.
Efforts to preserve linguistic diversity by deploying natural language processing tools depend on the effective appropriation of those tools by speaker communities. Our fundamental research hypothesis is that this collective appropriation depends on involving users (speakers, learners and teachers) early in the design of these technologies. The Breton ecosystem provides a suitable context for testing this hypothesis. The technical solutions envisaged meet needs expressed by civil society: the lack of teaching aids that include sound; the lack of a transcribed sound corpus, which prevents the development of speech recognition tools for locating content in existing sound files (titling, indexing), automatically subtitling audiovisual productions, or dictating SMS messages; and learners' difficulty in finding opportunities to use the language socially. The geolocation function is adapted to the local context (the rural character of the territory, seasonal or permanent population movements). It provides a means of connection and exchange that brings together the various publics present in the ecosystem.
The software developed is open source and is intended to be reusable for augmented data collection in any other language, minority or otherwise.
Project coordination
Mélanie Jouitteau (CENTRE DE RECHERCHE SUR LA LANGUE ET LES TEXTES BASQUES)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
IKER CENTRE DE RECHERCHE SUR LA LANGUE ET LES TEXTES BASQUES
Dastum
Bretagne Numérique
ANR funding: 99,993 euros
Start and duration of the scientific project:
- 18 months