Development of analysis tools for the automatic indexing and enrichment of sound archives, in line with the needs and objectives of ethnomusicologists, ethnolinguists and archivists.
The oral archives of the CNRS-Musée de l’Homme form a unique collection of recordings of great historical importance, containing more than 5,000 hours. A powerful, collaborative web platform will ensure its management and ease its consultation thanks to tools for audio content analysis.
The challenge is to make this intangible world heritage accessible and scientifically exploitable. Thanks to innovative technologies recently developed in the audiovisual sector, especially in sound processing, the project can position itself at an international level, showcasing this cultural heritage to a wide audience in a concrete way while demonstrating the technological expertise of our CNRS teams. Moreover, in the current European context of data sharing, the distribution of our cultural heritage through innovative technologies is particularly relevant.
The countries of origin of these recordings (in Africa, Europe, Asia, America ...) also have legitimate expectations of a « return » of their ancestral and often endangered heritage. They rely heavily on technology transfer for the exploitation and promotion of this intangible heritage, which represents their history and cultural identity and is therefore an important issue for the people concerned.
The output of the ongoing digitization process, which is lengthy and expensive, must be promoted and made easily accessible in order to benefit the greatest number: French and foreign researchers, students, musicians, music lovers, cultural minorities, etc. The computer tools developed by our specialists will make these documents more visible and attractive while meeting the requirements of the scientific community. The multidisciplinary nature of the project offers far more varied possibilities for document analysis than the tools used to date. These tools improve indexing during archiving and facilitate access for a wide audience, since they are available directly online on the web audio platform.
The project is the fruit of many discussions between all partners from different fields aiming at identifying the main challenges. The methodology used is as follows:
1. Development of signal processing and pattern recognition tools for structuring and indexing a sound document or a collection of documents. First, relevant sound categories are defined and acoustically characterized. Then, the segments of interest are identified by the specialists. Different degrees of structuring are then coordinated: for example, musical areas should be extracted before the musical instrument is identified.
2. Integration of these tools into the Telemeta platform. This work has two aspects: the IT integration of the tools, including the definition of formats and programming specifications, and the definition and implementation of the user interface. This interface is designed so that users can configure and schedule computation batches from their standard web browser.
3. Validation and evaluation of all tools and interfaces through scenarios. This task is crucial because the tools and indexes are meaningless unless they meet the needs of users and unless the defined and detected sound event categories refer to cultural realities specific to the analysed recordings. Scenarios have to be defined based on the user profiles, the proposed analysis tools have to be validated at each stage of the development, and researchers specialized in the relevant geocultural area need to be involved in this validation. A solution for sharing the test results should also be provided, given that the interpretation of a given document may depend on the user.
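The coordination of structuring levels described in step 1 (coarse categories first, finer analysis only where it applies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the names `Segment`, `Stage`, `run_pipeline` and the toy detectors are hypothetical, and the detectors simply look up hand-written labels instead of analysing audio.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Segment:
    """A time span of a recording with the labels assigned so far."""
    start: float
    end: float
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Stage:
    """One structuring level; it only runs on segments matching `requires`."""
    name: str
    detector: Callable[[Segment], str]
    requires: Dict[str, str] = field(default_factory=dict)


def run_pipeline(segments: List[Segment], stages: List[Stage]) -> List[Segment]:
    """Apply the stages in order, skipping segments that fail a stage's precondition."""
    for stage in stages:
        for seg in segments:
            if all(seg.labels.get(k) == v for k, v in stage.requires.items()):
                seg.labels[stage.name] = stage.detector(seg)
    return segments


# Hypothetical stand-ins for real detectors: hand-written labels keyed by start time.
toy_content = {0.0: "speech", 12.5: "music", 40.0: "music"}
toy_instrument = {12.5: "drum", 40.0: "flute"}

stages = [
    Stage("content", lambda seg: toy_content[seg.start]),
    # Instrument identification is only attempted on segments labelled as music.
    Stage("instrument", lambda seg: toy_instrument[seg.start],
          requires={"content": "music"}),
]

segments = [Segment(0.0, 12.5), Segment(12.5, 40.0), Segment(40.0, 55.0)]
result = run_pipeline(segments, stages)
for seg in result:
    print(seg.start, seg.end, seg.labels)
```

Expressing the precondition as data (`requires`) rather than hard-coding it keeps the ordering constraint visible: the speech segment never reaches the instrument stage.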
In ethnomusicology, categories of vocal enunciation have been identified along a continuum between singing and speaking. They have then been analysed acoustically in order to provide an objective description of the sound.
Most of the indexing tools provided through the project have been packaged together in a plugin library (GPL licence). This library, named Timeside-Diadems, is an optional plugin module of the Timeside framework, which is the audio processing engine of the Telemeta framework. It is freely usable and publicly available on the GitHub web platform.
The tools address the identification of specific technical sounds, of the presence of the human voice and of the instrumental type, the rhythmic similarity between different items, and the distinction between song and speech as well as between monody and polyphony.
Several corpora were studied:
• the African music corpus of the MNHN, which was partly digitized during the project: 30 magnetic tapes and 40 DAT cassettes were digitized and 800 items were integrated.
• the CNRS-Musée de l’Homme corpus: the tools made it possible to index several collections digitized by the BNF (3300 items).
• the LESC corpus of Mayan ritual speech (Mexico, Guatemala): about 50 hours on various media have been digitized.
• development of new questions in anthropology by the confrontation with unusual tools and methods: use of sound archives for the comparative study of musical heritage, definition of scientific or endogenous categories through open-access digital platforms, etc.
• raising awareness among the scientific community of anthropologists of the new approach to the multimodal nature of voice and speech, through publications and presentations at conferences of the worldwide ethnomusicology network, the ICTM (International Council for Traditional Music).
The sound indexing tools developed within this project were presented in conference communications and journal publications (ICASSP, ISMIR ...).
See previous sections.
The DIADEMS project produced 33 deliverables and a large amount of multidisciplinary work: 11 of its 34 publications are multi-partner. More than 20 communications presented the project to the different scientific communities of the consortium.
The Laboratoire d’Ethnologie et de Sociologie Comparative (LESC), which includes the Centre de Recherche en Ethnomusicologie (CREM) and the Centre d’Enseignement et de Recherche en Ethnologie Amérindienne (EREA), and the Laboratoire d’Eco-anthropologie du Muséum National d’Histoire Naturelle (MNHN) both need to index the audio archives they manage while keeping track of their contents, which is a long, tedious and expensive task.
During the CNRS interdisciplinary summer school (Science et Voix 2010), a common interest arose among acousticians, ethnomusicologists and computer scientists: advanced audio analysis tools, developed by indexing specialists (acousticians and computer scientists), now exist and can make content access and indexing easier.
The context of this project is the indexing of, and improved access to, the LESC audio archives: the CREM data and the EREA data on the Maya « singing/speaking » distinction, as well as the MNHN data (traditional African music). Since 2007, as no open-source application existed for accessing the audio data recorded by researchers, the CREM-LESC, the LAM and the sound archives of the MMSH have been designing an innovative and collaborative tool that answers professional needs (linked to the temporal span of the documents) while being adapted to researchers' requirements. With financial support from the CNRS Très Grand Equipement (TGE) ADONIS and the Ministry of Culture, the Telemeta platform, developed by PARISSON, has been online since May 2011: archives.crem-cnrs.fr .
Basic signal analysis tools are already available on this platform. It is, however, essential to have a set of advanced and innovative tools for the automatic or semi-automatic indexing of this audio data, which sometimes includes long recordings of quite heterogeneous content and quality.
The aim of the DIADEMS project is to supply some of these tools and to integrate them into Telemeta, while taking user needs into account. This implies that the scientific objectives of the partners are complementary:
• For the technology providers, IRIT, LIMSI, LaBRI and LAM, the aim is twofold:
o To provide existing technologies, such as speech and music detection and speaker segmentation. These tools aim at extracting homogeneous segments of interest for the users. These systems have been regularly tested during numerous (inter)national evaluation campaigns, in increasingly difficult contexts. However, none of these campaigns covered as much diversity as exists in the audio archives studied in this project. This heterogeneity is linked to the recording conditions, to the kind of documents, and to their geographical origin. The challenge for all of these « state-of-the-art » systems is therefore to adapt them to the users' needs.
o To propose new tools for exploring the contents of the homogeneous segments. Research on the singing/speaking voice opposition, the singing voice, singing turns and musical similarity is not yet complete. A genuine research study on defining relevant features and how to take them into account still has to be carried out. Being able to interact with musicologists and ethnomusicologists is a major advantage in this context.
• For ethnomusicologists and musicologists, the aims are different, depending on the usage:
o For documentalists, the aim is to learn to use the tools and to contribute their practical knowledge in order to adapt them to their indexing needs. An important exchange must take place between the tool provider, the integrator and the user. The focus must be put on the visualisation of the processing results, which should provide a useful aid for indexing.
o For ethnomusicologists and musicologists, the aim goes beyond the indexing capabilities of the tools. There should therefore be exchanges with the technology providers to define which information retrieval tools are the most relevant.
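As an illustration of what « homogeneous segments » means for the detection tools listed above, the sketch below merges consecutive analysis frames that share a label into time-stamped segments. It is a hedged, minimal example: the function name `frames_to_segments`, the `hop_s` parameter and the hand-written frame labels are assumptions made for illustration, not part of the project's tools.

```python
from itertools import groupby
from typing import List, Tuple


def frames_to_segments(frame_labels: List[str], hop_s: float) -> List[Tuple[float, float, str]]:
    """Merge runs of identical frame labels into (start_s, end_s, label) segments.

    `hop_s` is the assumed duration, in seconds, covered by one frame.
    """
    segments = []
    index = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((index * hop_s, (index + length) * hop_s, label))
        index += length
    return segments


# Hand-written frame-level decisions standing in for a real speech/music detector.
labels = ["speech"] * 3 + ["music"] * 2 + ["speech"]
segments = frames_to_segments(labels, hop_s=0.5)
for start, end, label in segments:
    print(f"{start:.1f}s - {end:.1f}s: {label}")
```

A real system would typically smooth the frame decisions before merging, so that isolated misclassified frames do not fragment the segments.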
Monsieur Julien PINQUIER (Institut de Recherche en Informatique de Toulouse) – firstname.lastname@example.org
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.
EAE Museum National d'Histoire Naturelle
LAM LAM - Institut Jean le Rond d'Alembert
LIMSI CNRS Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
LaBRI Laboratoire Bordelais de Recherche en Informatique
LESC Laboratoire d’Ethnologie et de Sociologie Comparative
IRIT Institut de Recherche en Informatique de Toulouse
ANR funding: 656,678 euros
Beginning and duration of the scientific project: December 2012 - 36 Months