Low-Resourced Languages

Issue #16: Revisiting Synthetic Training Data for Neural MT

Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

In a previous guest post in this series, Prof. Andy Way explained how to create training data for Neural MT through back-translation. This technique involves translating monolingual data in the target language into the source language to obtain a parallel corpus of "synthetic" source sentences paired with "authentic" target sentences - hence the name "back-translation". Andy reported interesting findings whereby,...
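
To make the idea concrete, here is a minimal sketch of the back-translation step. The translate_target_to_source function is a hypothetical stand-in for any trained target-to-source MT engine, not a real API:

```python
# A minimal back-translation sketch. translate_target_to_source is a hypothetical
# placeholder for a target->source MT engine; it is not a real library call.

def translate_target_to_source(sentence: str) -> str:
    """Placeholder for a trained target->source MT system (assumption)."""
    return f"<synthetic source for: {sentence}>"

# Authentic monolingual data in the target language (e.g. English).
monolingual_target = [
    "The weather is nice today.",
    "Neural MT engines need large training corpora.",
]

# Back-translate each target sentence to create a synthetic source side,
# pairing it with the authentic target sentence.
synthetic_parallel = [(translate_target_to_source(t), t) for t in monolingual_target]

# These synthetic pairs are then mixed with the authentic parallel corpus
# before training the final source->target engine.
for src, tgt in synthetic_parallel:
    print(f"{src}\t{tgt}")
```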

Issue #6: Zero-Shot Neural MT

Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic

As we covered in last week’s post, training a neural MT engine requires a lot of data: typically millions of sentences in both languages, aligned at the sentence level, i.e. every sentence in the source (e.g. Spanish) has a corresponding sentence in the target (e.g. English). During a typical training run, the system looks at these bilingual...
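
As a rough illustration of what sentence-aligned bilingual data looks like, here is a small Python sketch with made-up sentence pairs; in practice the sentences would come from two aligned files where line i of the source file corresponds to line i of the target file:

```python
# A minimal sketch of a sentence-aligned parallel corpus (illustrative data only).

source_sentences = [  # e.g. Spanish
    "El tiempo es agradable hoy.",
    "Necesitamos millones de frases para entrenar el sistema.",
]
target_sentences = [  # e.g. English
    "The weather is nice today.",
    "We need millions of sentences to train the system.",
]

# zip() pairs sentence i of the source with sentence i of the target:
# exactly the bilingual sentence pairs an NMT engine is trained on.
parallel_corpus = list(zip(source_sentences, target_sentences))

for src, tgt in parallel_corpus:
    print(f"{src} ||| {tgt}")
```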

Issue #5: Creating Training Data for Neural MT

Author: Prof. Andy Way, Deputy Director, ADAPT Research Centre

This week, we have a guest post from Prof. Andy Way of the ADAPT Research Centre in Dublin. Andy leads a world-class team of researchers at ADAPT who are working at the very forefront of Neural MT. The post expands on the topic of training data - originally presented as one of the "6 Challenges in...
