CE23 - Données, Connaissances, Big data, Contenus multimédias, Intelligence Artificielle

Learning to understand audio scenes – LEAUDS

Submission summary

Machines can now watch and interpret images, recognize speech and
music genres, yet they are hardly capable of understanding ambient
audio scenes, e.g., the sounds occuring in a kitchen in the morning or
the sounds occuring on a road nearby a vehicle. Today’s research
dealing with audio scene understanding is mostly limited to the
detection and recognition of audio events and audio context
classes. While such tasks are useful, the ultimate goal of audio scene
understanding goes far beyond the assignment of labels to sound
classes. Instead, it is to develop machines that fully understand
audio input. LEAUDS will make a leap towards this goal by achieving
breakthroughs in three intertwined directions that are essential for
next-generation audio understanding systems: detection of thousands of
audio events from little annotated data, robustness to "out-of-the-lab"
applications, and language-based description of audio
scenes. Accordingly, LEAUDS will develop machine learning algorithms
able to learn from few weakly-labeled audio recordings and to discover
novel audio events. Robustness to real-world conditions is a key
challenge for applications beyond academic laboratories. LEAUDS will
address this issue through the prism of attention models, audio
enhancement and domain adaptation. Developing tools for manipulating
and composing sound events is the cornerstone for producing
higher-level semantic interpretations. This third challenge will be
addressed by learning models able to transform a sequence or graph of
audio events into a sentence. These scientific breakthroughs will be
implemented in a prototype smart-home security sensor featuring
audio understanding. While LEAUDS addresses issues related to audio
perception and machine learning, its outcomes are expected to
significantly impact domains such as home security, home care, urban
sensing, and context awareness for mobile devices.
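The third challenge, turning a sequence of detected audio events into a sentence, can be illustrated in a deliberately simplified form. The sketch below is a toy template-based captioner with a hypothetical function name and an arbitrary confidence threshold; the project itself targets learned sequence- and graph-to-sentence models, not templates:

```python
# Toy sketch: map a time-ordered list of detected audio events to a
# one-sentence scene description. Illustrative only -- LEAUDS proposes
# learned models, not this kind of hand-written template.

def describe_scene(events):
    """events: list of (event_label, confidence) pairs, in time order."""
    # Keep only confidently detected events (0.5 is an arbitrary threshold).
    kept = [label for label, conf in events if conf >= 0.5]
    if not kept:
        return "No salient audio events were detected."
    if len(kept) == 1:
        return f"A {kept[0]} sound was detected."
    head, last = ", ".join(kept[:-1]), kept[-1]
    return f"Sounds of {head} and then {last} were detected."

print(describe_scene([("running water", 0.9),
                      ("cutlery", 0.7),
                      ("kettle whistling", 0.8)]))
# → Sounds of running water, cutlery and then kettle whistling were detected.
```

Replacing the templates with a trained sequence-to-sentence model is what allows descriptions to generalize beyond hand-coded phrasings.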

Project coordination

Gilles Gasso (Laboratoire d'Informatique, du Traitement de l'information et des Systèmes)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.

Partners

NETATMO
LITIS Laboratoire d'Informatique, du Traitement de l'information et des Systèmes
Inria Centre de Recherche Inria Nancy - Grand Est

ANR funding: 546,518 euros
Beginning and duration of the scientific project: - 42 Months

Useful links

Explore our database of funded projects


The ANR makes its datasets on funded projects available online.
