Robust and Efficient Deep Learning based Audiovisual Speech Enhancement – REAVISE
Speech enhancement is a fundamental problem in signal processing that aims to improve the quality and intelligibility of a speech signal recorded in a noisy environment. This is of paramount practical importance, for example in automatic speech recognition systems and hearing assistive devices. While human speech perception involves both audio and visual modalities (lip movements), the majority of speech enhancement algorithms exploit only the audio modality. Audiovisual speech enhancement (AVSE) aims at incorporating the complementary information provided by the visual modality, which is less affected by acoustic noise, to further improve the performance of speech enhancement, especially in challenging acoustic environments. AVSE methods fall into two categories, supervised and unsupervised, depending on whether or not a parallel corpus of clean and noisy audiovisual speech is used for training.
Supervised AVSE approaches rely on noise-aware training and need diverse acoustic and visual noise instances to generalize well. As a result, they lead to complex networks with a large number of parameters. They also lack a systematic way to handle acoustic and visual noise. Unsupervised AVSE approaches, on the other hand, are based on noise-independent training, leading to more compact models with more potential for generalization and robust learning. Nevertheless, they have been significantly less explored than supervised AVSE approaches.
In this context, the general objective of REAVISE is to make a leap towards developing a unified AVSE framework that recovers an intelligible, high-quality speech signal with low computational power and independently of the noise environment. This will be achieved by bridging the gap between the supervised and unsupervised AVSE approaches, benefiting from the best of both worlds. To this end, the proposed methodology combines recent advances in statistical machine learning, numerical optimization, and state-of-the-art deep learning techniques.
Project coordination
Mostafa SADEGHI (Centre de Recherche Inria Nancy - Grand Est)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
INRIA Centre de Recherche Inria Nancy - Grand Est
ANR funding: 290,636 euros
Beginning and duration of the scientific project:
March 2023 - 42 months