Issue #91 – Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation
Author: Dr. Chao-Hong Liu, Machine Translation Scientist @ Iconic
Unsupervised Machine Translation (MT) is the technology that we use to train MT engines when parallel data is not used, at least not directly. We have discussed some interesting approaches in several previous posts for unsupervised MT (Issues #11 and #28) and some related topics (Issues #6, #25 and #66). Training MT engines requires the existence of parallel data, and these ideas are focused on how to do that with little or no parallel data. In this post, we review a two-step approach proposed by Pourdamghani et al. (2019), called translating “translationese”.
Translating with Translationese
The idea of the two-step approach is simple, whereby a dictionary is used to turn an input sentence into a “translationese”, a pseudo translation, and then an MT engine is trained with parallel data where the source side is also converted to translationese. For the dictionary, rather than using existing ones, Pourdamghani et al. (2019) decided to use automatically built dictionaries, following the approach proposed by Lample et al. (2018). A word-by-word translation is then performed using the automatically built dictionaries to turn input sentences into translationese.
The MT engine itself is a transformer model trained with parallel data which came from a diverse set of high-resource language pairs, where the source sides of these language pairs are replaced by translationese (from word-by-word translation via automatically built dictionaries). Therefore, there is only one model to translate all the translationese “sentences” into the target language.
Table 1. Translation results (in terms of BLEU scores) on Czech (cs), Spanish (es), Finnish (fi), Dutch (nl), Bulgarian (bg), Danish (da), Indonesian (id), Polish (pl), Portuguese (pt), and Catalan (ca). Excerpted from Pourdamghani et al. (2019).
Experiments and Results
The table shows the BLEU scores of the translationese and the resulting translation on ten language pairs. Pourdamghani et al. (2019) also show that the MT engine in general performed better on high-resource language pairs, e.g. French-to-English, and Russian-to-English. The benefits the approach brought about are:
1) using pseudo-translation (translationese) to minimize the complexity MT engines need to learn from the source language, and
2) training a single model to translate from translationese (which are automatically turned from various source languages) to the target.
Therefore, the MT engine itself can focus on learning to translate from translationese (a “kind” of target language) to the target language.
In this post we briefly reviewed a two-step approach to unsupervised MT proposed by Pourdamghani et al. (2019). The idea is simple: 1) turn the source sentence into pseudo-translation (translationese) using automatically built dictionaries, and 2) train an MT model to translate from translationese into the target language. At the moment, the results of the approach are better overall across many languages (14 language pairs) compared to other approaches. This might be due to the MT model being trained using parallel corpora from various language pairs, therefore the trained model is more robust across languages. However, for experiments on French-to-English and German-to-English, Lample et al. (2018) outperformed this approach. Adding more linguistic information in the building of dictionaries, or improving the mechanism to do pseudo-translation could potentially improve the performance for specific languages. The approach could also benefit from the methods developed for automatic post-editing (APE) for MT.