Terminology / Vocabulary coverage

Author: Dr. Jingyi Han, Machine Translation Scientist @ Language Weaver

Introduction: Nowadays, Byte Pair Encoding (BPE) has become one of the most commonly used tokenization strategies due to its universality and effectiveness in handling rare words. Although many previous works show that subword models with embedding layers generally achieve more stable and competitive results in neural machine translation (NMT), character-based (see issue #60) and byte-based subword...
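As a quick illustration of the idea behind BPE (a generic sketch, not the implementation discussed in the post), the Python snippet below learns merge operations from a toy vocabulary by repeatedly merging the most frequent adjacent symbol pair; the toy word counts and the number of merges are assumptions made for the example.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the pair into a single symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters plus an end-of-word marker.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

merges = []
for _ in range(10):                      # number of merges is a hyperparameter
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)     # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    merges.append(best)

print(merges)   # learned merge operations, e.g. ('e', 's'), ('es', 't'), ...
```

Frequent character sequences such as "est" end up as single vocabulary entries, while rare words remain decomposable into smaller known pieces.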

Issue #135: Recovering Low-Frequency Words in Non-Autoregressive Neural MT

Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic

Introduction: Non-Autoregressive Translation (NAT), in which the target words are generated independently, is attracting a lot of interest because of its efficiency. However, the assumption that target words are independent of each other leads to errors which affect translation quality. In this post we take a look at a paper by Ding et al. (2021) which confirms...
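To make the independence assumption concrete, here is a small, hypothetical sketch (not taken from Ding et al.): each target position gets its own distribution, predicted in parallel, and decoding takes the argmax at every position independently, so the output can mix tokens from two incompatible translations.

```python
import numpy as np

# Toy vocabulary and two equally valid translations of the same source:
# "thank you" vs "thanks". A NAT model spreads probability over both, per position.
vocab = ["thank", "thanks", "you", "<eos>"]

# Hypothetical per-position distributions predicted in parallel (rows sum to 1).
probs = np.array([
    [0.45, 0.55, 0.00, 0.00],   # position 1: slight preference for "thanks"
    [0.10, 0.05, 0.55, 0.30],   # position 2: slight preference for "you"
])

# Non-autoregressive decoding: argmax at each position, taken independently.
output = [vocab[i] for i in probs.argmax(axis=1)]
print(output)   # ['thanks', 'you'] -- incoherently mixes the two translations
```

An autoregressive decoder would have conditioned position 2 on the word already emitted at position 1 and avoided the mismatch, which is exactly the kind of error NAT research tries to recover from.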

Using annotations for machine translating Named Entities

Author: Dr. Carla Parra Escartín, Global Program Manager @ Iconic

Introduction: Getting the translation of named entities right is not a trivial task, and Machine Translation (MT) has traditionally struggled with it. If a named entity is wrongly translated, the human eye will quickly spot it, and more often than not, those mistranslations will make people burst into laughter, as machines can, seemingly, be very creative. To a...
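Annotation-based handling of named entities is often implemented with placeholders: entities are masked before translation and restored afterwards. The sketch below is a generic, hypothetical illustration of that workflow; the tag format, the entity list, and the translate_fn stub are assumptions for the example, not the exact method described in the post.

```python
def mask_entities(text, entities):
    """Replace known named entities with numbered placeholder tags."""
    mapping = {}
    for i, ent in enumerate(entities):
        tag = f"__NE{i}__"
        text = text.replace(ent, tag)
        mapping[tag] = ent
    return text, mapping

def unmask_entities(translation, mapping):
    """Restore the original (or separately translated) entities."""
    for tag, ent in mapping.items():
        translation = translation.replace(tag, ent)
    return translation

# Hypothetical usage with a stub translation function.
def translate_fn(text):
    return text  # stand-in for a real MT call that leaves the tags untouched

source = "Iconic was founded in Dublin."
masked, mapping = mask_entities(source, ["Iconic", "Dublin"])
translated = translate_fn(masked)        # e.g. "__NE0__ fue fundada en __NE1__."
print(unmask_entities(translated, mapping))
```

The appeal of this kind of scheme is that the entity surface forms never pass through the MT system, so they cannot be "creatively" mistranslated; the cost is that the placeholders must survive translation intact.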
