CE28 - Cognition, comportements, langage 2025

Gestural Speech Perception: learning to perceive speech by integrating gestural priors with deep learning – GeSPer

Submission summary

The overall objective of this trans-disciplinary project is to improve both our understanding of how humans learn to perceive speech and our ability to build machines that can perceive speech. This will be achieved in three steps.
First, we will leverage modern machine learning techniques to carefully link hypotheses about infants' learning mechanisms to systematic predictions regarding observable developmental trajectories of phonetic perception in different language environments.
This will enable us, for the first time, to draw conclusions about infants' learning mechanisms from the empirical record on such developmental trajectories. Furthermore, if the available empirical record does not suffice to decide between some of the learning mechanisms considered, we will leverage our approach to identify new decisive experiments. We hypothesize that this approach can uncover the mechanisms of early phonetic learning in finer detail than has previously been possible.
Second, motivated by recent results establishing the activity of a complex language network in very young infants, which recruits ‘motor’ areas and is involved in speech perception before the emergence of speech-like vocalizations, we also aim to introduce and evaluate a novel theoretical account of speech perception development. According to this account, infants leverage the gestural nature of speech (the fact that it results from a sequence of co-articulated biological motion gestures of a small number of relatively slow articulators) as an effective inductive bias for purely perceptual learning. To evaluate this account, we will include it among the learning mechanisms considered in the first step. This will require a computationally efficient implementation of the hypothesized mechanism to carry out the necessary simulations, which we will develop. We hypothesize that including such a ‘gestural’ inductive bias in learning mechanisms will significantly improve the fit to observed developmental trajectories of phonetic perception.
Finally, we will test whether including such a ‘gestural’ inductive bias in state-of-the-art unsupervised representation learning algorithms also benefits machine speech perception, including in terms of improved data-efficiency, robustness and transparency.
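To make the idea of a ‘gestural’ inductive bias more concrete, a minimal toy sketch is given below: a slowness penalty on latent trajectories (a crude stand-in for the relatively slow, co-articulated motion of speech articulators) is added to a simple linear reconstruction objective. The encoder, decoder, and all names here are hypothetical illustrations chosen for brevity; they are not the project's actual models or algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "speech" input: 100 frames of 13-dimensional acoustic features
# (e.g. MFCC-like); purely synthetic data for illustration.
X = rng.normal(size=(100, 13))

# Hypothetical linear encoder mapping each frame to an 8-dim latent space.
W = rng.normal(size=(13, 8)) * 0.1

def encode(X, W):
    """Project acoustic frames into the latent space."""
    return X @ W

def slowness_penalty(Z):
    """Mean squared frame-to-frame change of the latent trajectory.

    Small when latents evolve slowly over time, loosely mirroring the
    slow movement of articulators that the gestural account appeals to.
    """
    return np.mean((Z[1:] - Z[:-1]) ** 2)

def total_loss(X, W, lam=1.0):
    """Reconstruction loss plus a weighted 'gestural' slowness prior."""
    Z = encode(X, W)
    recon = Z @ np.linalg.pinv(W)          # crude linear decoder
    recon_loss = np.mean((X - recon) ** 2)
    return recon_loss + lam * slowness_penalty(Z)
```

With `lam=0` the objective reduces to plain reconstruction; increasing `lam` trades reconstruction accuracy for temporally smoother latent trajectories, which is one simple way an articulatory prior could bias purely perceptual representation learning.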

Project coordination

Thomas Schatz (UNIVERSITÉ AIX-MARSEILLE)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its content.

Partnership

LIS UNIVERSITÉ AIX-MARSEILLE
University of Maryland, College Park
IRIT UNIVERSITÉ DE TOULOUSE EPE

ANR grant: 375,231 euros
Beginning and duration of the scientific project: January 2026 - 48 months

Useful links

Explore our database of funded projects

The ANR makes its datasets on funded projects publicly available.
