Vocal Effort from Recognition to Synthesis – VERS
This project addresses the problems that changes in vocal effort pose for the processing of voice and speech characteristics. Effort is a major source of variability in speech production (Liénard, 1999) but remains difficult to define (for example, as a function of distance or of glottal airflow). One of its main outcomes is a change in Voice Strength (VS), defined by Liénard (2019) as the C-weighted sound pressure level (SPL) produced by a speaker, measured in free field at one meter in front of the speaker’s mouth. The vocal output during speech production results from complex interactions between subglottal, glottal, and supraglottal adjustments. Titze & Sundberg (1992) have shown that the effort required to produce a given voice strength is speaker-dependent, and continual adjustments of muscular settings are necessary to produce any voice, whether soft or loud. Unfortunately, the SPL is lost in most recordings because of uncalibrated recording chains and inadequate microphones (Švec, 2018). Listeners can nevertheless estimate the original voice strength from spectral characteristics. This project aims to build an estimator of voice strength for corpora that were not calibrated (typically broadcast recordings or datasets of spontaneous speech) but for which effort is a dominant dimension in explaining voice characteristics. A robust voice strength estimate, together with the ability to modify, through speech synthesis methods, the acoustic characteristics linked with voice strength in a given signal, would be an important step forward for the linguistic understanding of face-to-face communication and for speech processing technologies.
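To make the definition of voice strength concrete, the sketch below shows how an SPL value could be computed from a calibrated recording, which is exactly what uncalibrated corpora do not allow. It is illustrative only and is not the project's method: the function names are hypothetical, the input is assumed to be sound pressure in pascals, the C-weighting filter is omitted (the level is unweighted), and the correction to one meter assumes ideal free-field spherical spreading.

```python
import math

P_REF = 20e-6  # standard reference pressure for SPL in air: 20 micropascals


def spl_db(pressure_pa):
    """Unweighted SPL (dB re 20 uPa) of calibrated pressure samples in pascals.

    Assumption: no C-weighting is applied, so this only approximates
    the C-weighted level used in the voice strength definition.
    """
    rms = math.sqrt(sum(p * p for p in pressure_pa) / len(pressure_pa))
    return 20.0 * math.log10(rms / P_REF)


def spl_at_one_meter(spl_measured_db, distance_m):
    """Refer a level measured at distance_m to 1 m from the mouth.

    Assumption: free field, point-like source, so the level drops by
    20*log10(distance ratio), i.e. about 6 dB per doubling of distance.
    """
    return spl_measured_db + 20.0 * math.log10(distance_m / 1.0)


# Synthetic example: one second of a 1 kHz tone with an RMS pressure of 1 Pa.
fs = 16000
amplitude = math.sqrt(2.0)  # peak amplitude giving 1 Pa RMS for a sine
tone = [amplitude * math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs)]

level = spl_db(tone)                          # about 94 dB SPL
level_1m = spl_at_one_meter(level, 0.3)       # same level if it had been measured at 0.3 m
```

Without the pascal calibration assumed here, the sample values of a recording are known only up to an arbitrary gain, so the absolute level cannot be recovered this way and has to be estimated from other cues such as spectral characteristics.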
Project coordination
Marc EVRARD (Laboratoire Interdisciplinaire des Sciences du Numérique)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
LISN Laboratoire Interdisciplinaire des Sciences du Numérique
GIPSA-lab Grenoble Images Parole Signal Automatique
Service de la Recherche
ANR funding: 244,994 euros
Beginning and duration of the scientific project: December 2023, 48 months