CE23 - Artificial intelligence and data science 2023

Deep generative and inference models for weakly-supervised speech enhancement – DEGREASE

Submission summary

Remote human interaction and human-machine interaction require reliable speech-processing technologies that can work in unconstrained real-world acoustic conditions. Speech recordings are inevitably contaminated by interfering sound sources and by reverberation. Whether for human or artificial listening, speech enhancement algorithms are necessary to improve speech quality and intelligibility. The vast majority of current algorithms rely on deep neural networks trained in a supervised manner, using a dataset of noisy speech signals labeled with the corresponding clean-speech reference signals. Given the impossibility of acquiring such data in real conditions, datasets are artificially generated by creating synthetic mixtures of isolated speech and noise signals. However, the performance of supervised algorithms drops drastically when these synthetic data differ from the real conditions of use. The current trend is to create larger and larger synthetic datasets, in the unrealistic hope of covering all possible acoustic conditions. In contrast, the DEGREASE project proposes a weakly-supervised learning framework with the aim of developing more flexible, robust and ecologically-valid algorithms that can be trained on real unlabeled data and that are able to adapt to new acoustic conditions. At the crossroads of audio signal processing, probabilistic graphical modeling, and deep learning, we propose a deep generative learning methodological framework for multi-microphone speech signals, which, combined with amortized variational inference techniques, will allow models to be trained efficiently in a weakly-supervised manner.

Simon Leglaive (Institut d'Electronique et des Technologies du numéRique (IETR))

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

EMOBOT
Universität Hamburg
GIPSA-lab Grenoble Images Parole Signal Automatique
IETR Institut d'Electronique et des Technologies du numéRique
LTCI Laboratoire Traitement et Communication de l'Information
Centre Inria de l’Université Grenoble Alpes

ANR funding: 274,151 euros
Start and duration of the scientific project: March 2024, 42 months
