DS0707 - Human-machine interactions, connected objects, digital content, big data and knowledge

Phonetic Articulatory Synthesis – ArtSpeech

Submission summary

The objective is to synthesize speech from text via the numerical simulation of the human speech production processes, i.e. the articulatory, aerodynamic and acoustic aspects.

Corpus-based approaches have come to dominate text-to-speech synthesis. They exploit speech databases of very good acoustic quality that cover a large number of expressions and phonetic contexts. This is sufficient to produce intelligible speech. However, these approaches face almost insurmountable obstacles as soon as parameters intimately related to the physical process of speech production have to be modified. By contrast, an approach that rests on the simulation of the physical speech production process makes explicit use of source parameters, of the anatomy and geometry of the vocal tract, and of a temporal supervision strategy. It thus offers direct control over the nature of the synthetic speech.

The project is organized in 5 work packages:
1. Aerodynamic and acoustic simulations so as to produce a speech acoustic signal from the knowledge of the cross-sectional area at any point of all the cavities of the vocal tract,
2. Source and coordination scenarios so as to coordinate sources together with the temporal evolution of the vocal tract, which is crucial for the production of consonants in order to ensure their identification by human listeners,
3. Supervision of the temporal evolution of the vocal tract geometry so as to anticipate the production of upcoming sounds and generate realistic articulatory gestures,
4. Acquisition of speech production data essential to know the vocal fold activation, aerodynamic parameters, and the geometrical shape of the vocal tract (via MRI at a high sampling rate),
5. General architecture to incorporate the different levels and synthesize an acoustic signal from the text.
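
As a rough illustration of the link between vocal tract geometry and acoustics that work package 1 relies on, the sketch below computes the resonances of the simplest possible geometry: a uniform, lossless tube closed at the glottis and open at the lips. It is a textbook approximation, not the simulation developed in the project; the function name, the speed of sound and the tract length are assumptions chosen for the example.

```python
# Illustrative only: resonances of a uniform lossless tube, closed at the
# glottis and open at the lips (quarter-wavelength resonator).
SPEED_OF_SOUND_M_S = 340.0  # assumed value for the speed of sound in air
TRACT_LENGTH_M = 0.17       # assumed typical adult vocal tract length


def uniform_tube_resonances(length_m, n_resonances=3, c=SPEED_OF_SOUND_M_S):
    """Return the first resonances in Hz, using F_n = (2n - 1) * c / (4 * L)."""
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_resonances + 1)]


formants = uniform_tube_resonances(TRACT_LENGTH_M)
print([round(f) for f in formants])
```

With these assumed values the resonances fall near 500, 1500 and 2500 Hz, close to the formants of a neutral vowel. A realistic simulation replaces the uniform tube with the measured area function of every cavity and adds the aerodynamic sources.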

The development of realistic simulations of the speech production processes will be a key asset for understanding the respective contributions of anatomical characteristics, coordination capabilities, and control of the vocal folds to the resulting speech signal. The scope of this project goes far beyond the understanding of speech production phenomena: it concerns phonetics, motor control and, within the domain of automatic speech processing, at least text-to-speech synthesis.

There are a number of applications. They concern situations for which standard text-to-speech synthesis is not well suited, such as foreign language learning or language acquisition. This project also opens new perspectives in the domain of expressive speech synthesis, and thus within the framework of conversational agents. In the medical field, applications include MRI acquisition protocols with a high sampling rate, applicable to organs that deform quickly over time, the study of speech production pathologies, and the evaluation of the impact of surgery on the vocal folds or vocal tract.

We firmly believe that ArtSpeech will achieve major scientific and technical advances, and will demonstrate the value of the physical approach both for opening new research perspectives and for developing highly innovative applications in the domain of speech production in the broadest sense.

The consortium consists of four remarkably complementary research teams with leading international theoretical and practical experience in the domains of:
• aerodynamic and acoustic simulation of speech production, and modeling of the source and the geometry of the vocal tract,
• magnetic resonance imaging and other acquisition techniques of speech production data.

Project coordination

Yves Laprie (Laboratoire Lorrain de Recherche en Informatique et ses applications - UMR 7503)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

LORIA Laboratoire Lorrain de Recherche en Informatique et ses applications - UMR 7503
Gipsa-lab Grenoble Images Parole Signal Automatique - UMR 5216
LPP Laboratoire de phonétique et phonologie - UMR 7018
IADI Imagerie Adaptative Diagnostique et Interventionnelle - INSERM U947

ANR funding: 500,116 euros
Beginning and duration of the scientific project: September 2015 - 42 months
