CONTINT - Digital content and interactions 2013

Editing and Rendering for the next generation of 3D sound – EDISON 3D


The EDISON 3D project aims at supporting the development of 3D audio content and technology by facilitating the adoption of upcoming 3D audio formats.

The industry is shifting from channel-based formats (audio channels assigned to loudspeakers intended to be located at defined positions, e.g. stereo or 5.1) towards an object-based description (an audio scene described as sources with associated positions, independently of loudspeaker positions), which changes both production and rendering techniques.
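The channel-based/object-based distinction can be pictured with a minimal data-structure sketch. This is purely illustrative: every class and field name below is an assumption for this example, not part of any 3D audio standard.

```python
from dataclasses import dataclass, field

@dataclass
class ChannelBed:
    """Channel-based: one audio buffer per loudspeaker of a fixed layout."""
    layout: str     # e.g. "stereo" or "5.1"
    channels: dict  # channel name -> list of samples

@dataclass
class AudioObject:
    """Object-based: a source with its own position, independent of loudspeakers."""
    name: str
    samples: list
    azimuth_deg: float            # horizontal angle of the source
    elevation_deg: float = 0.0    # apparent height: the extra '3D' dimension
    distance_m: float = 1.0

@dataclass
class ObjectScene:
    """An audio scene as a collection of positioned sources."""
    objects: list = field(default_factory=list)

scene = ObjectScene()
scene.objects.append(AudioObject("voice", [0.0] * 4, azimuth_deg=-30.0))
scene.objects.append(AudioObject("drone", [0.0] * 4, azimuth_deg=90.0,
                                 elevation_deg=45.0))
print(len(scene.objects))  # 2
```

An object-based scene like this can be rendered on any loudspeaker setup, whereas a `ChannelBed` only plays back correctly on the layout it was mixed for.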

New tools and rendering methods for 3D audio

In recent years, the term 3D has been used increasingly in many areas: video games, stereoscopic imaging, etc. 3D images have been accepted in cinema theaters but have hardly reached the domestic environment (technologies not yet fully mature, wearing glasses at home difficult to envisage, etc.).

3D technologies in sound rendering are also developing, with the overall purpose of providing the user with an increased sensation of space. However, sound does not address the same third dimension as image. Whereas image rendering focuses on depth along the axis of the screen (behind and in front of it), the extension of sound rendering mostly concerns the apparent height of the sources. Existing sound formats manage (i.e. code and reproduce) the azimuth and distance of sources in the horizontal plane, with some restrictions (size of the listening area, limited depth behind the loudspeakers).

Ongoing initiatives in standardization bodies (MPEG, ITU) and from major players in the movie industry (Dolby, Barco, etc.) suggest that 3D audio content for consumer audio and audiovisual applications will emerge in the coming years. However, there is no consensus yet on content creation tools, formats, or distribution methods for 3D sound. The EDISON 3D project aims at increasing knowledge and proposing technical tools that will help actors in the audio industry be as independent as possible from the various upcoming formats and methods. Our objective is to better understand the relevant physical and perceptual factors in 3D audio production (e.g. recording, transformation, rendering) and to propose technical tools that can adapt to any of the upcoming formats and methods.

Audio solutions developed in the EDISON 3D project will help respond to the new and growing need for 3D audio content production and rendering at home.

The EDISON 3D project addresses five critical points:
1 Technological watch. Throughout the project, new and upcoming 3D audio coding formats will be analyzed to ensure compatibility of the developments with emerging standards.
2 Adaptation of existing input formats. There is a proliferation of input formats (mono, stereo, 5.1, binaural, Higher Order Ambisonics, etc.) originating from new or existing contents. We propose a neutral description of 3D sound consisting of sound objects with associated spatial positions. By means of recent methods for source analysis and separation, and through strong interaction with the user, we will develop an adaptation tool able to convert existing or upcoming audio input formats to and from this description.
3 New tools for space-time authoring in simple 3D audio production. The new generation of human-computer interfaces (gestural interaction, multitouch, 3D gestures "in the air", tangible interaction) and data visualization techniques open new possibilities. An intuitive, efficient and handy 3D audio production interface will be designed in close contact with users. In contrast with existing solutions, it will allow sound engineers to perform simple authoring of the 3D sound field.
4 Consumer-focused 3D audio rendering solutions. We aim at offering a 3D sound rendering device that can easily be integrated "at home" with excellent rendering quality. Such sound systems, built from a limited number of loudspeaker blocks, lower the positioning and wiring constraints that delayed the adoption of the 5.1 format and might again work against any comparable upcoming format.
5 An ongoing evaluation of the quality of the work. Developments will be evaluated throughout the project with user tests (sound rendering quality, usability, ergonomics, etc.) in an iterative process, in order to steer the technical tasks with user feedback.
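As a toy illustration of rendering the neutral object description on a channel-based layout, the classic constant-power (sine/cosine) pan law maps a source azimuth to gains for a stereo loudspeaker pair. The function name and the ±30° default width are assumptions for this sketch; real renderers (e.g. VBAP or Wave Field Synthesis) are considerably more elaborate.

```python
import math

def constant_power_pan(azimuth_deg, width_deg=60.0):
    """Map a source azimuth within a stereo pair spanning +/- width_deg/2
    to (left, right) gains with the constant-power pan law, so that the
    total radiated power gL**2 + gR**2 is always 1."""
    half = width_deg / 2.0
    # Clamp the azimuth to the span of the loudspeaker pair.
    az = max(-half, min(half, azimuth_deg))
    # Map [-half, +half] linearly onto [0, pi/2].
    theta = (az + half) / width_deg * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

# A centered source feeds both loudspeakers equally (about 0.707 each);
# a source at the left loudspeaker (-30 degrees) feeds only the left one.
print(constant_power_pan(0.0))    # both gains ~ 0.707
print(constant_power_pan(-30.0))  # (1.0, 0.0)
```

Converting an object to a 5.1 bed, or back from channels to objects (point 2 of the list above), generalizes this idea to more loudspeakers and to analysis in the inverse direction.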

1 Active participation in meetings at the EBU and in the MPEG-H 3D standardization committee (B-Com has joined the EDISON 3D consortium as a new associated partner). B-Com and Sonic Emotion are compiling a state-of-the-art report on mastering formats for the creation of audiovisual content. We compare the formats in an interpretation chart against the needs of the project.
2 Exploitation of fusion approaches for source separation: the combination of models of different orders leads to a significant improvement in separation quality [1]. An alternative strategy uses neural networks to realize a so-called "local fusion". The fusion is modulated according to an objective criterion dedicated to 3D rendering [2].
3 Definition of interaction challenges for editing and manipulating 3D curves and for navigating within 3D scenes [3]. Definition of interaction needs from interviews with Radio France and Sonic Emotion Labs. Ongoing development of a prototype combining 3D gestures for 3D curve editing with head-movement-based navigation.
Development of a 3D control and visualization interface for DJs.
4 Analysis of the radiation of a loudspeaker mounted on a parallelepipedic box: comparison of numerical, analytical and measurement approaches for a soundbar prototype from Sonic Emotion [5].
Study of the invariants of auditory elevation perception, creation of a simplified model, and perceptual evaluation of perceived elevation as a function of the model's complexity [4].
5 Hardware testing, installation and exploitation of a WFS (Wave Field Synthesis) sound reinforcement system based on Sonic Emotion Wave1 in Studio 105 at Maison de la Radio. Installation of test studios in Studio 155 at Maison de la Radio and at the University of Western Brittany for sharing experience on object-oriented productions in 3D.
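The fusion approaches for source separation in point 2 build on non-negative matrix factorization (NMF) models [1]. As background, here is a minimal sketch of the core NMF building block: factoring a non-negative "spectrogram" V as W·H with the standard Lee-Seung multiplicative updates for the Euclidean cost. This illustrates only the shared NMF foundation, not the fusion methods themselves; it uses plain Python lists to stay self-contained.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, eps=1e-9, seed=0):
    """Factor a non-negative F x T matrix V as W (F x rank) times
    H (rank x T) using Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    # Strictly positive random initialization keeps the updates well-defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][t] * num[k][t] / (den[k][t] + eps) for t in range(T)]
             for k in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
             for f in range(F)]
    return W, H

def error(V, W, H):
    """Squared Euclidean reconstruction error ||V - W H||^2."""
    R = matmul(W, H)
    return sum((V[i][j] - R[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

# A toy 3 x 4 'spectrogram' built from two non-negative patterns:
# a rank-2 NMF should reconstruct it almost exactly.
V = matmul([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]],
           [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 2.0]])
W, H = nmf(V, rank=2)
print(error(V, W, H))  # near zero
```

In separation applications, each group of columns of W models the spectral patterns of one source, and masking the mixture with the corresponding part of W·H recovers that source.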

Definition of a master format for 3D audio production, independent of the final encoding format. Synchronization with ongoing standardization initiatives.
PhD of Thibaut Jacob, Télécom ParisTech IC2: interaction and visualization of spatio-temporal signals, applied to 3D sound.
Development of interfaces and control techniques for spatial characteristics of sound scenes: control plugins for object oriented production, 3D visualization interfaces, 3D trajectory monitoring and editing.
PhD of Simon Leglaive, Télécom ParisTech AAO: underdetermined sound source separation under reverberant conditions. Development of methods providing high-quality separation for spatial remixing or for increasing the channel/object count to improve the listening experience.
PhD of Vincent Roggerone, Ecole Polytechnique, LMS: characterization and control of the sound field radiated by a set of loudspeakers for three-dimensional reproduction. Application to the reproduction of elevation using soundbars.
Ongoing recruitment of a postdoctoral researcher at the University of Western Brittany: perceptual studies on auditory scene analysis and spatial unmasking, benefits of audio-visual coherence, rendering and perception of elevation.
Sound reinforcement of various musical genres using the WFS system in Studio 105, sharing the experience of sound engineers and providing continuous feedback on the developed interfaces. Tests of object-oriented production in Studio 155 and interaction with all partners of the project.

[1] X. Jaureguiberry, E. Vincent, and G. Richard. Multiple-order non-negative matrix factorization for speech enhancement. In Proc. of Interspeech, 2014.
[2] X. Jaureguiberry, E. Vincent, and G. Richard. Variational Bayesian model averaging for audio source separation. In Proc. of IEEE Statistical Signal Processing Workshop (SSP), pages 33–36, 2014.
[3] T. Jacob, G. Bailly, E. Lecolinet, R. Foulon, and E. Corteel. Un Espace de Caractérisation pour l'Édition de Courbes à Trois Dimensions. In Proc. of ACM IHM'14, 2014.
[4] L. Pérotin. Etude des invariants de la perception auditive de l'élévation et création d'un modèle simplifié de filtrage équivalent. Rapport de stage de l'Ecole Polytechnique, réalisé à Sonic Emotion Labs, 2014.
[5] Y. Li. Analyse du rayonnement d'un haut-parleur monté sur une boîte parallélépipédique : comparaisons entre les approches numérique et analytique. Rapport de stage de l'université du Maine, réalisé à l'Ecole Polytechnique, 2014.


Project coordination

Etienne CORTEEL (sonic emotion labs)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

se labs sonic emotion labs
IMT/TPT Institut Mines Telecom / Télécom ParisTech
LMS Ecole Polytechnique, Laboratoire de Mécanique des Solides, UMR 7649
UBO Université de Bretagne Occidentale, Lab-Sticc UMR 6285 (Pôle CID équipe IHSEV)
RF Radio France, Direction de la Production et des antennes
LMS Délégation régionale IDF SUD

ANR grant: 1,236,042 euros
Beginning and duration of the scientific project: October 2013 - 42 months
