DS0707 - Interactions of the physical world, humans and the digital world

Advanced quality methods for post-edition of machine translation – KEHATH

Submission summary

The translation community has seen a major change over the last five years: machine translation has become good enough that translators now find it advantageous to post-edit it rather than translate from scratch. This is due to recent progress in statistical machine translation, that is, the training of a translation engine on a corpus of existing translations. Current methods for enhancing machine translation (MT) systems from human post-edition (PE) of raw outputs are somewhat efficient yet rather basic: the post-edited output is added to the training corpus, and the translation model and language model are re-trained, with no clear view of how much has been improved and how much remains to be improved. In this approach, only the final PE result is used; no other user feedback on the raw MT quality is provided, such as the cognitive processes of the post-editor or a log of the post-edition actions performed. The KEHATH project intends to address these issues in two ways:
Firstly, we will leverage advanced machine learning (ML) techniques in the MT+PE loop. Our goal is to boost the impact of PE, that is, to reach the same performance with less PE or better performance with the same amount of PE. In other words, we want to improve machine translation learning curves. For this purpose, active learning and reinforcement learning techniques will be proposed and evaluated. In the industrial context of KEHATH, we will have to face challenges such as the heterogeneity of MT systems (statistical and/or rule-based) and the scalability of ML algorithms for improving a domain-specific MT system.
Secondly, quality prediction (QP) on MT outputs is crucial for translation project managers. Over the years, we have developed a number of confidence estimation and error detection techniques in the laboratory, and we will implement and evaluate them in real-world conditions. A shared concern will be to work on continuous domain-specific data flows to improve both MT and the performance of indicators for quality prediction.
The overall goal of the KEHATH project is straightforward: gain additional machine translation performance as fast as possible in each and every new industrial translation project, so that post-edition time and cost are drastically reduced. Basic research is the best way to reach this goal, for an industrial impact that is powerful and immediate.

Project coordination

François Brown De Colstoun (Lingua et Machina)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

LIG Laboratoire d'Informatique de Grenoble
LIFL UNIVERSITE LILLE I
L&M Lingua et Machina

ANR funding: 498,844 euros
Beginning and duration of the scientific project: September 2014 - 42 Months
