terminology/vocabulary coverage Tag

Author: Dr. Jingyi Han, Machine Translation Scientist @ Language Weaver

Introduction

Byte Pair Encoding (BPE) has become one of the most commonly used tokenization strategies due to its universality and effectiveness in handling rare words. Although many previous works show that subword models with embedding layers generally achieve more stable and competitive results in neural machine translation (NMT), character-based (see issue #60) and byte-based subword...
