Blanc SHS 2 - Humanities and social sciences: Human development and cognition, language and communication 2010

African lexicons: reference corpus, quantitative studies – RefLex

Submission summary

The RefLex project aims to test a set of fundamental hypotheses about the structure and evolution of African languages that are often mentioned in the literature, but whose validity has never been demonstrated empirically. These hypotheses share one peculiarity: they can only be tested with a quantitative approach, which in turn presupposes comprehensive documentation. The more than 2,200 languages spoken in Africa are characterized by great typological diversity, but at every level of linguistic analysis they also display common characteristics that cut across linguistic phyla and areas. So far, it has never been possible to conduct an in-depth study of these characteristics (e.g., logophoric pronouns, labiovelar consonants), mainly because of a lack of available data on the majority of African languages. RefLex addresses this problem by fully exploiting the existing lexical documentation, which is in fact much larger than the grammatical documentation and yet often ignored, especially in typological studies. One of the goals of RefLex is to make this scattered and hard-to-find lexical documentation available to interested researchers. The lexical corpus of African languages, which will be available online to the whole scientific community at the end of the project, will give immediate access to a considerable wealth of data (at least 2,000,000 lexical units for more than 1,000 languages). This corpus will allow dramatic progress in several domains: typology, phylogeny, lexical semantics, lexical spread and areal linguistics. RefLex will be the largest online comparative database worldwide.
Moreover, the database will differ from existing databases in two crucial respects: (i) it will offer direct online access to the original documents on which the digital data are based, which makes this corpus a true reference corpus, allowing correction, checking, argued feedback, replication and even falsification; (ii) users will be able to set up and enrich a “library” of computational tools for the scientific use of the data, which will also be unified to facilitate research, retrieval and comparison.
The RefLex project thus belongs to the emerging field of quantitative approaches to complex linguistic issues. It is one of the very few projects based on data from a variety of languages, and the only one to enable easy manipulation of, and experimentation with, the data itself.
This project, a cooperation between researchers from the LLACAN and DDL research units of the CNRS, brings together some 20 researchers: well-known specialists in several African language families, comparativists and typologists.

Project coordination

SEGERER Guillaume (Langage, Langues et Cultures d'Afrique Noire)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

DDL CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE - DELEGATION REGIONALE RHONE-AUVERGNE
LLACAN Langage, Langues et Cultures d'Afrique Noire

ANR funding: 250,000 euros
Beginning and duration of the scientific project: - 48 Months
