Terminology Tag

Author: Dr. Jingyi Han, Machine Translation Scientist @ Language Weaver Introduction Nowadays, Byte Pair Encoding (BPE) has become one of the most commonly used tokenization strategies due to its universality and effectiveness in handling rare words. Although many previous works show that subword models with embedding layers in general achieve more stable and competitive results in neural machine translation (NMT), character-based (see issue #60) and Byte-based subword...

Read More

Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

In many commercial MT use cases, being able to use custom terminology is a key requirement in terms of accuracy of the translation. The ability to guarantee the translation of specific input words and phrases is conveniently handled in Statistical MT (SMT) frameworks such as Moses. Because SMT is performed as a sequence of distinct...

Read More