CE23 - Artificial Intelligence

End-To-end Evolutive Neural network for Speaker Recognition – ExTENSoR

Submission summary

ExTENSoR proposes fundamental research that aims to explore the potential of end-to-end and automatically learned / evolutive artificial neural networks for the automatic processing and classification of speech signals. ExTENSoR will investigate their use as an alternative to hand-crafted features and network topologies that characterise the current state of the art in many fields of speech processing. ExTENSoR also aims to bring new insights into what information in speech signals is being used by networks, with either automatic or hand-crafted topologies, in order to arrive at the scores or decisions they produce.
ExTENSoR focuses on fundamental research into automatically learned, end-to-end networks for the processing of speech signals in a generic sense. It also includes an evaluation component in the form of applied research on automatic speaker recognition and anti-spoofing, two fields of speech processing research showing burgeoning interest in end-to-end, evolutive learning.

Nowadays, almost all state-of-the-art approaches to automatic speaker verification (ASV) employ deep neural networks and similarly inspired techniques in at least one component of what is today an increasingly complex ASV architecture. Of course, this view of progress in the field over the last three decades is a greatly simplified one that neglects many intervening steps and some other major milestones. What is clear, though, even from this simplified view, is that today’s state-of-the-art approaches to ASV encode less and less of what we humans think we already know. Today’s algorithms are less and less founded upon our understanding of, for example, speech production, speech perception, what makes the most informative features, what information is most discriminative of different speakers, how this information should be captured, represented or modelled, and how decisions should be made in some optimal sense. Instead, an increasing number of components of a state-of-the-art approach to ASV are being replaced with components learned automatically.

Other aspects of today’s neural network solutions include their tremendous complexity and the black-box optimisation paradigm, which have left researchers somewhat in the dark concerning their behaviour or, more precisely, without a feeling for how or why they arrive at the scores or decisions that they produce.

While the hierarchical nature of deep neural networks offers some potential to investigate the higher-level concepts being learned, the complexity is such that there is often a risk of overfitting. The size of models (in terms of connections) is sometimes such that very deep neural networks can learn precisely the input-output relationships in a given dataset, relationships that do not generalise easily to unseen data. It is the applicants’ hypothesis that reducing network size holds the key to tackling the age-old problem of overfitting while, at the same time, offering a better chance to investigate what neural network solutions are actually learning; the behaviour of networks with orders of magnitude fewer nodes and connections is inherently easier to interpret.

ExTENSoR's objectives are thus to:

- enable real end-to-end processing of speech by jointly optimising the processing chain from the raw signal to the decision / score;
- evaluate the longer-term potential of using end-to-end and automatically learned / evolving artificial neural networks for speech processing tasks;
- investigate whether the resulting networks offer a better chance to bring new insights into what information in speech signals is being used in order to derive the decisions or scores they produce;
- demonstrate, given the expertise of the two partners involved, successful end-to-end (E2E), evolutive techniques for automatic speaker recognition and anti-spoofing.

Anthony Larcher (LABORATOIRE D'INFORMATIQUE DE L'UNIVERSITE DU MANS (LIUM))

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

EURECOM EURECOM
LIUM LABORATOIRE D'INFORMATIQUE DE L'UNIVERSITE DU MANS (LIUM)

ANR funding: 333,718 euros
Beginning and duration of the scientific project: December 2019 - 24 Months
