End-to-End Neural Approaches for Speech Translation – ON-TRAC
ON-TRAC: Translate speech without transcribing it
The ON-TRAC project proposes to radically change the architectures currently used in speech translation by exploring end-to-end neural approaches.
Performing this task with a single deep neural network makes it possible to optimize performance better than with a cascade system, which must first transcribe the speech automatically and then translate that transcription.
With ON-TRAC, it becomes possible to translate without transcribing the source language.
Stakes and objectives
The ON-TRAC project aims to explore emerging deep learning technologies in order to design, implement, test and disseminate a new approach to automatic speech translation that removes the constraint described above, namely the need to transcribe the speech in the source language. As a consequence, the development of automatic translation systems for oral dialects, and for speech in general, would be greatly accelerated and its cost greatly reduced, offering greater responsiveness and easier access to new tools for the students, state departments and businesses involved.
The technologies developed in the ON-TRAC project will be tested on three language pairs, with written French as the systematic target language. The first language pair studied will be spoken English into written French, chosen for simplicity and because English is sufficiently well mastered by all project participants to analyze the system outputs and understand the phenomena that appear during translation. Pashto will be the source language of the second pair. This choice is motivated by the fact that processing an oral dialect falls within the stated objectives of the project, and by a minimized collection cost, since the consortium already has around a hundred hours of audio recordings in Pashto, with their textual translations in French (as well as their transcription in Pashto). Finally, the source language of the third pair will be Tamacheq, an oral dialect spoken by the Tuaregs in several areas of interest for intelligence and security (Sahel, Niger, Mali, Burkina Faso, Libya ...).
The methodology followed in this project is standard for exploring a new approach based on machine learning algorithms. Relying on deep neural network architectures that have proven themselves in other tasks, we will propose new formalisms for direct translation from spoken to written language. We will implement and experiment with them, aiming for a level of quality sufficient for use in operational situations. This is why we will integrate these new technologies into demonstrators that the industrial partner of the project will be able to disseminate. We will prepare the training data by collecting what is needed to optimize the neural networks for the targeted translation task. In order to minimize collection costs, we will use and enrich two existing corpora. A third corpus will be collected from scratch, under real conditions, and will concern an oral dialect reported to be of very high interest to intelligence and security services. The risks of the project have been identified, and a fallback solution is planned for each of them.
The ON-TRAC project is not yet finished but some results are already significant.
Thus, the performance of these new purely neural, single-model systems matches and sometimes exceeds that of classic cascade systems, which require the development of two distinct modules: an automatic speech recognition model and a translation module.
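The difference between the two architectures can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's implementation: the type aliases, function names and toy models below are all hypothetical stand-ins for trained neural networks.

```python
from typing import Callable, Sequence

# Hypothetical type aliases; none of these names come from the ON-TRAC systems.
Audio = Sequence[float]                     # raw waveform samples
ASRModel = Callable[[Audio], str]           # speech -> source-language transcript
MTModel = Callable[[str], str]              # source text -> target text
SpeechTranslator = Callable[[Audio], str]   # speech -> target text


def cascade_translate(audio: Audio, asr: ASRModel, mt: MTModel) -> str:
    """Cascade: transcribe first, then translate the transcript."""
    transcript = asr(audio)  # a recognition error here is passed on to translation
    return mt(transcript)


def end_to_end_translate(audio: Audio, model: SpeechTranslator) -> str:
    """End-to-end: a single model maps speech directly to target text."""
    return model(audio)


# Toy stand-ins so the sketch runs; real systems would be trained networks.
def toy_asr(audio: Audio) -> str:
    return "hello"


def toy_mt(text: str) -> str:
    return {"hello": "bonjour"}.get(text, "<unk>")


def toy_st(audio: Audio) -> str:
    return "bonjour"
```

In the cascade, two separately developed modules are chained and the intermediate transcript is mandatory. In the end-to-end case there is no intermediate text, so the single model can be optimized directly for the translation objective.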
In addition, the ON-TRAC project showed the very positive impact of using continuous speech representations, computed by neural models in a self-supervised manner, when processing language pairs with few digital resources.
It is highly probable that the performance of the technology developed in the ON-TRAC project will exceed that of current state-of-the-art systems.
It will not only be possible to improve speech translation for language pairs well endowed with training data, but also to accelerate the deployment of translation systems aimed at processing spoken dialects.
Nguyen, H., Tomashenko, N., Boito, M. Z., Caubrière, A., Bougares, F., Rouvier, M., ... & Estève, Y. (2019). ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task. IWSLT 2019
Nguyen, H., Bougares, F., Tomashenko, N., & Estève, Y. (2020). Investigating self-supervised pre-training for end-to-end speech translation. Interspeech 2020
Elbayad, M., Nguyen, H., Bougares, F., Tomashenko, N., Caubrière, A., Lecouteux, B., ... & Besacier, L. (2020). ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020. IWSLT 2020
Tomashenko, N., Raymond, C., Caubrière, A., De Mori, R., & Estève, Y. (2020). Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems. ICASSP 2020, May 2020, Barcelona, Spain
Caubrière, A., Ghannay, S., Tomashenko, N., De Mori, R., Laurent, A., Morin, E., & Estève, Y. (2020). Error analysis applied to end-to-end spoken language understanding. ICASSP 2020, May 2020, Barcelona, Spain
Dinarelli, M., Kapoor, N., Jabaian, B., & Besacier, L. (2020). A data efficient end-to-end spoken language understanding architecture. ICASSP 2020, May 2020, Barcelona, Spain
The ON-TRAC project proposes to radically change the architectures currently used in speech translation. It is based on end-to-end neural models for machine translation and focuses on lightweight, portable speech translation applications that Airbus is developing for security operations in theaters of operation.
Beyond the study of end-to-end approaches for language pairs with large amounts of training data, ON-TRAC will study the development of models for low-resource oral languages and dialects.
An end-to-end approach to speech translation, as we envision it, would allow us to rethink how data are collected for the development of a speech translation system.
Indeed, with this approach, a transcription of the source language becomes unnecessary: the cost of producing the data needed to train a speech translation system is therefore greatly reduced, and the development of such a system for new languages (including those without a writing system) would be easier and faster.
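The effect on data collection can be illustrated with a small sketch. It assumes a hypothetical corpus layout in which each recording has a French translation and only some recordings have a source-language transcript; the class and function names are illustrative and are not taken from the project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """One training example (hypothetical layout, for illustration only)."""
    audio_path: str
    translation_fr: str
    transcript: Optional[str] = None  # None when no source transcription exists


def usable_for_cascade(corpus: List[Utterance]) -> List[Utterance]:
    # Training the recognition and translation modules of a cascade both
    # require the source-language transcript.
    return [u for u in corpus if u.transcript is not None]


def usable_for_end_to_end(corpus: List[Utterance]) -> List[Utterance]:
    # An end-to-end model only needs (audio, translation) pairs.
    return list(corpus)
```

Under this assumption, recordings of a language without a writing system could never feed a cascade system, whereas every translated recording remains usable for end-to-end training.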
Since the project targets portable translation applications, ON-TRAC is also interested in studying the computation time and memory footprint required for neural speech translation.
ON-TRAC will process three distinct language pairs of increasing difficulty and increasing operational, security and defense interest (English-French, Pashto-French, Tamacheq-French).
The ON-TRAC project is part of Axis 4 "Data, Knowledge, Big Data, Multimedia Content, Artificial Intelligence" of Challenge 7 "Information and Communication Society" of the 2018 Action Plan of the ANR.
Through its main scientific theme, speech translation with end-to-end neural approaches, it clearly falls within the themes "Data to knowledge" and "Processing of multimedia content".
The technologies developed in the ON-TRAC project will be tested on three language pairs, with written French as the systematic target language.
The first language pair studied will be spoken English to written French, chosen for simplicity and because English is sufficiently well mastered by all project participants to analyze the system outputs and understand the phenomena that appear during translation.
Pashto will be the source language of the second language pair. This choice is motivated by the fact that processing an oral dialect is part of the project's stated objectives, and by a minimized collection cost, since the consortium already has about 100 hours of audio recordings in Pashto, with their textual translations in French (as well as their transcription in Pashto).
Finally, the source language of the third pair will be Tamacheq, an oral dialect spoken by the Tuaregs in several areas of interest for intelligence and security (Sahel, Niger, Mali, Burkina Faso, Libya ...). As such, it is of great interest, and this interest has already been expressed by the state services concerned.
Project coordination
Yannick ESTÈVE (Laboratoire Informatique d’Avignon)
The project coordinator is the author of this summary and is responsible for its content. The ANR declines any responsibility for its contents.
Partners
LIUM LABORATOIRE D'INFORMATIQUE DE L'UNIVERSITE DU MANS (LIUM)
UGA Université Grenoble Alpes
ADS AIRBUS DEFENCE AND SPACE SAS
LIA Laboratoire Informatique d’Avignon
ANR funding: 599,998 euros
Beginning and duration of the scientific project:
- 36 Months