Automatic Extraction of Geolinguistic Atlas Content and Spatial Analysis: application to Dialectology – ECLATS
The ECLATS project deals with the enhancement and processing of ancient maps, a historical and cultural heritage widely recognized as a very important source of information, but one that is not easy to use. We focus on the Linguistic Atlas of France (ALF), built between 1902 and 1910. This cartographic heritage provides first-rate data for dialectological research.
The Linguistic Atlas of France: a cartographic heritage to be exploited
Dialectology studies the linguistic features of languages with a strong oral tradition, such as local dialects. These features are diverse: phonetic, morphosyntactic, lexical, semantic or prosodic. They are also context-dependent: they evolve according to (geographical) space, time, and socio-cultural environment. To study local dialects, corpora of phonetic data have been transcribed into linguistic atlases. These atlases consist of a set of maps on which the phonetic forms collected at several geographical points are recorded for a given lexical entry. Made up of 1900 maps covering 639 survey points, the ALF is a corpus of 1,214,100 reliable lexical data items, noted in a homogeneous way, collected from a single questionnaire, with details about places, dates, and circumstances.
While the theoretical approaches used to construct linguistic atlases are structured, reliable and homogeneous, the processing and phonetic analysis of the data, as well as the elaboration of interpretative maps, are still performed manually, using unstandardized approaches. Moreover, software solutions are lacking in geolinguistics: the digitization of old atlases is not systematic; data are extracted from old paper maps by hand, which is very time-consuming; the use of Geographic Information Systems and spatial analysis methods is underdeveloped; and lexical or interpretative maps are still hand-drawn. This delays both the processing and the diffusion of data, limits cartographic production capacity, and impedes efficient exploitation of geographical knowledge by researchers in dialectology.
The ECLATS project is a multidisciplinary project that gathers researchers in geomatics, computer science and geolinguistics. Its overall objective is the design and development of innovative methods and tools for extracting and analysing the semantic and geographical data contained in collections of old cartographic documents used by researchers in dialectology. More precisely, four goals will be pursued:
– The design and definition of new models and standard format(s) for geolinguistic data fostering interoperability between geolinguistic software.
– The development of storage methods for digitized maps in order to improve their use and dissemination, and the development of new and innovative digitization and character and symbol recognition techniques for the automatic extraction of old map contents. Such techniques will be tested on the ALF but should be applicable to any old map document.
– The definition of a methodology, implemented in a software suite dedicated to experts in geolinguistics, with advanced geovisualization and spatial analysis functionalities for geolinguistic data processing. Such an (r)evolution is crucial in geolinguistics for upgrading and optimizing both the production and the analysis of new interpretative atlases.
– Safeguarding and disseminating a unique linguistic heritage, and promoting a collaborative approach that encourages the sharing and dissemination of geolinguistic data.
Partners of the project are research groups in Computer Science specialized in geomatics (LIG, Grenoble), in digitization of ancient documents (LIRIS, Lyon), in automatic content extraction (LIRIS, Lyon and L3i, La Rochelle) and a research team in dialectology (GIPSA-lab, Grenoble).
CartoDialect: a web-mapping application for consulting and exploring the maps of the Linguistic Atlas of France (cartodialect.imag.fr/cartoDialect/).
Spatial interpolation algorithms for qualitative data and a cartographic tool for designing interpretative geolinguistic maps.
DialectoLOD: a web-mapping application for exploring dialectological data (phonetic, morphosyntactic, lexical and semantic data from the ALF), ritamitsouko.imag.fr/dialectoLOD-1.0
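To give an idea of what spatial interpolation of qualitative data can look like, here is a minimal sketch in Python. It is not the algorithm developed in the project, which this summary does not describe: it assumes a simple k-nearest-neighbour majority vote, and the coordinates, form labels and function names are hypothetical placeholders rather than ALF data.

```python
from collections import Counter
from math import hypot

# Hypothetical survey points: (x, y, attested form). Illustrative values only,
# not taken from the ALF.
SURVEY_POINTS = [
    (0.0, 0.0, "form_a"),
    (1.0, 0.0, "form_a"),
    (0.0, 1.0, "form_a"),
    (5.0, 5.0, "form_b"),
    (6.0, 5.0, "form_b"),
    (5.0, 6.0, "form_b"),
]


def interpolate_form(x, y, points, k=3):
    """Return the most frequent form among the k survey points nearest to (x, y)."""
    nearest = sorted(points, key=lambda p: hypot(p[0] - x, p[1] - y))[:k]
    votes = Counter(form for _, _, form in nearest)
    return votes.most_common(1)[0][0]


def interpolate_grid(points, xs, ys, k=3):
    """Assign a form to every cell of a regular grid (one row per y value)."""
    return [[interpolate_form(x, y, points, k) for x in xs] for y in ys]


if __name__ == "__main__":
    # Estimate the form at a location where no survey was carried out.
    print(interpolate_form(0.5, 0.5, SURVEY_POINTS))
```

Because the variable is categorical, the estimate is a vote between neighbouring forms rather than an average of values; an interpretative map could then colour each grid cell by the winning form.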
Project in progress
The ECLATS project deals with the enhancement and processing of ancient maps, a historical and cultural heritage widely recognized as a very important source of information, but one that is not easy to use. We focus on the Linguistic Atlas of France (ALF), built between 1902 and 1910. This cartographic heritage provides first-rate data for dialectological research.
Dialectology studies the linguistic features of languages with a strong oral tradition, such as local dialects. These features are diverse: phonetic, morphosyntactic, lexical, semantic or prosodic. They are also context-dependent: they evolve according to (geographical) space, time, and socio-cultural environment. To study local dialects, corpora of phonetic data have been transcribed into linguistic atlases. These atlases consist of a set of maps on which, for a given notion, the linguistic forms collected at several geographical points are recorded. Made up of 1900 maps covering 639 survey points, the ALF is a corpus of 1,214,100 reliable lexical data items, noted in a homogeneous way, collected from a single questionnaire, with details about places, dates and circumstances.
While the theoretical approaches used to construct linguistic atlases are structured, reliable and homogeneous, the processing and phonetic analysis of the data, as well as the elaboration of interpretative maps, are still done manually, using unstandardized approaches. Moreover, software solutions are missing in geolinguistics: the digitization of old atlases is not systematic; data are extracted from old paper maps by hand, which is very time-consuming; the use of Geographic Information Systems and spatial analysis methods is underdeveloped; and lexical or interpretative maps are still hand-drawn. This situation delays both the processing and the diffusion of data, and it limits cartographic production capacity by impeding efficient exploitation of geographical knowledge by researchers in dialectology.
The ECLATS project is a multidisciplinary project lasting 48 months. It gathers researchers in geomatics, computer science and geolinguistics. Its overall purpose is the design and development of innovative methods and tools for extracting and analysing the linguistic features and geographical data contained in collections of old cartographic documents used by researchers in dialectology. More precisely, four goals will be pursued:
– The design and definition of new models and standard format(s) for geolinguistic data fostering interoperability between geolinguistic software.
– The development of storage methods for digitized maps in order to improve their use and dissemination, and the development of new and innovative digitization and phonetic character recognition techniques for the automatic extraction of old map contents. Such techniques will be tested on the ALF but should be applicable to any old map document.
– The definition of a methodology, implemented in a software suite dedicated to experts in geolinguistics, with advanced geovisualization and spatial analysis functionalities for geolinguistic data processing. Such an (r)evolution is crucial in geolinguistics for upgrading and optimizing both the production and the analysis of new interpretative atlases.
– Safeguarding a unique linguistic heritage and promoting a collaborative approach that encourages the sharing and dissemination of geolinguistic data.
Partners of the project are research groups in Computer Science specialized in geomatics (LIG, Grenoble), in digitization of ancient documents (LIRIS, Lyon), in automatic content extraction (LIRIS and L3i) and in geolinguistics and dialectology (GIPSA-lab, Grenoble).
Project coordination
Paule-Annick Davoine (Laboratoire d'Informatique de Grenoble)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partners
LIG Laboratoire d'Informatique de Grenoble
GIPSA-Lab Laboratoire Grenoble Images Parole Signal Automatique
L3I Laboratoire Informatique, Image et Interaction
INSA Lyon - LIRIS Institut National des Sciences Appliquées de Lyon - Laboratoire d'Informatique en Image et Systèmes d'Information
ANR funding: 529,433 euros
Beginning and duration of the scientific project:
September 2015 - 48 months