Data creation

NMT 135 Recovering Low-Frequency Words in Non-Autoregressive Neural MT

Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic

Non-Autoregressive Translation (NAT), in which the target words are generated independently, is attracting a lot of interest because of its efficiency. However, the assumption that target words are independent of each other leads to errors which affect translation quality. In this post we take a look at a paper by Ding et al. (2021) which confirms...

Read More
NMT 94 Unsupervised Parallel Sentence Extraction with Parallel Segment Detection Helps Machine Translation

Author: Dr. Chao-Hong Liu, Machine Translation Scientist @ Iconic

Curating corpora of high-quality sentence pairs is a fundamental task in building Machine Translation (MT) systems. This resource can be obtained from Translation Memory (TM) systems, where human translations are recorded. However, in most cases we don't have TM databases but only comparable corpora, e.g. news articles covering the same story in different languages. In this post,...
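As a rough illustration of the task, below is a minimal Python sketch that scores cross-lingual sentence similarity to pick out likely translation pairs from comparable corpora. It assumes the sentence-transformers library and the public LaBSE model; it is not the parallel segment detection method discussed in the post, and the example sentences and the 0.8 threshold are purely illustrative.

# Minimal sketch of mining candidate sentence pairs from comparable corpora
# (e.g. news articles covering the same story in two languages) by scoring
# cross-lingual sentence similarity. This illustrates the general task only;
# it is NOT the parallel segment detection method described in the post.
# Assumes the sentence-transformers library and the public LaBSE checkpoint;
# the example sentences and the 0.8 threshold are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")

english_sentences = [
    "The summit ended without an agreement.",
    "Local elections will be held next month.",
]
spanish_sentences = [
    "La cumbre terminó sin acuerdo.",
    "El festival de cine comienza el viernes.",
]

# With normalised embeddings, cosine similarity is a simple dot product.
en_emb = model.encode(english_sentences, normalize_embeddings=True)
es_emb = model.encode(spanish_sentences, normalize_embeddings=True)
similarity = np.dot(en_emb, es_emb.T)

# Keep the best-matching Spanish sentence for each English sentence,
# provided it clears a similarity threshold.
for i, en in enumerate(english_sentences):
    j = int(np.argmax(similarity[i]))
    if similarity[i, j] > 0.8:
        print(f"{en}\t{spanish_sentences[j]}\t{similarity[i, j]:.2f}")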

Read More
Issue #28 – Hybrid Unsupervised Machine Translation

Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

In Issue #11 of this series, we first looked at the topic of unsupervised machine translation - training an engine without any parallel data. Since then, it has gone from a promising concept to one that can produce effective systems performing close to the level of fully supervised engines (trained with parallel data). The...

Read More
Issue #21 – Revisiting Data Filtering for Neural MT

Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

The Neural MT Weekly is back for 2019 after a short break over the holidays! 2018 was a very exciting year for machine translation, as documented over the first 20 articles in this series. What was striking was the pace of development, even in the 6 months since we started publishing these articles. This was...

Read More
Issue #16 – Revisiting Synthetic Training Data for Neural MT

Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

In a previous guest post in this series, Prof. Andy Way explained how to create training data for Neural MT through back-translation. This technique involves translating monolingual data in the target language into the source language to obtain a parallel corpus of "synthetic" source and "authentic" target data - so-called back-translation. Andy reported interesting findings whereby,...
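As a rough illustration of the technique, below is a minimal Python sketch of back-translation using the Hugging Face transformers library and a public MarianMT German-to-English model; the model choice, batch size and example sentences are illustrative, not the setup from Andy's post.

# Minimal back-translation sketch: create "synthetic" source / "authentic" target
# pairs for an English->German engine by translating monolingual German text
# back into English with an off-the-shelf German->English model.
# Assumes the Hugging Face transformers library and the public
# Helsinki-NLP/opus-mt-de-en checkpoint; batch size and sentences are illustrative.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-de-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def back_translate(target_sentences, batch_size=16):
    """Translate target-language (German) sentences into the source language (English)."""
    synthetic_source = []
    for i in range(0, len(target_sentences), batch_size):
        batch = target_sentences[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs)
        synthetic_source.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return synthetic_source

# Monolingual German text provides the "authentic" target side.
monolingual_de = [
    "Maschinelle Übersetzung hat sich in den letzten Jahren stark verbessert.",
    "Synthetische Daten können die Qualität neuronaler Systeme erhöhen.",
]
synthetic_en = back_translate(monolingual_de)

# Each (synthetic English, authentic German) pair can now be added to the
# parallel training corpus of the English->German system.
for src, tgt in zip(synthetic_en, monolingual_de):
    print(f"{src}\t{tgt}")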

Read More
Issue #5 – Creating Training Data for Neural MT

Author: Prof. Andy Way, Deputy Director, ADAPT Research Centre

This week, we have a guest post from Prof. Andy Way of the ADAPT Research Centre in Dublin. Andy leads a world-class team of researchers at ADAPT who are working at the very forefront of Neural MT. The post expands on the topic of training data - originally presented as one of the "6 Challenges in...

Read More