Knowledge

NMT 97 Target Conditioned Sampling Optimising Data Selection for Multilingual Neural Machine Translation

Author: Dr. Chao-Hong Liu, Machine Translation Scientist @ Iconic

Introduction

Neural machine translation (NMT) is known to be particularly tricky for low-resource languages. It is therefore not surprising that researchers are actively investigating how to improve the performance of NMT systems for these languages, and many approaches are currently being explored. In issue #88 of our blog we reviewed a method to use...

Using annotations for machine translating Named Entities

Author: Dr. Carla Parra Escartín, Global Program Manager @ Iconic

Introduction

Getting the translation of named entities right is not a trivial task, and Machine Translation (MT) has traditionally struggled with it. If a named entity is wrongly translated, the human eye will quickly spot it, and more often than not, those mistranslations will make people burst into laughter as machines can, seemingly, be very creative. To a...

NMT 95 Constrained Parameter Initialisation for Deep Transformers in Neural MT

Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic

Introduction

As the Transformer model is the state of the art in Neural MT, researchers have tried to build wider (with higher-dimensional vectors) and deeper (with more layers) Transformer networks. Wider networks are more costly in terms of training and generation time, so they are not the best option in production environments. However, adding encoder layers...

NMT 94 Unsupervised Parallel Sentence Extraction with Parallel Segment Detection Helps Machine Translation

Author: Dr. Chao-Hong Liu, Machine Translation Scientist @ Iconic

Introduction

Curating corpora of quality sentence pairs is a fundamental task in building Machine Translation (MT) systems. Such pairs can be obtained from Translation Memory (TM) systems, where human translations are recorded. However, in most cases we don’t have TM databases but only comparable corpora, e.g. news articles on the same story in different languages. In this post,...
