Language Challenge #2 – German


Sprechen Sie Deutsch?

Quick Facts

  • Official language status in 7 European countries – Germany, Austria, Switzerland, Belgium, Luxembourg, Liechtenstein, and Italy (South Tyrol).
  • German is the most widely spoken native language in the EU.
  • Obviously enough, it’s a member of the Germanic family of languages, with origins in Northern Europe.
  • The Goethe-Institut, a body of the German government named after famous writer Johann Wolfgang von Goethe, promotes the study of the language across the world.
  • Famous quote originating in German: Das Einzige, was mich beim Lernen stört, ist meine Bildung. (“The only thing that interferes with my learning is my education.”) – Einstein
Characteristics

    German uses the same alphabet as English, plus three vowels with umlauts (ä, ö, ü) and the consonant Eszett (ß).

    For simple statements, German follows subject-verb-object word order. However, in subordinate clauses the finite verb occurs in the final position, and verbs in the infinitive occur after the object. In other cases, word order can be relatively free, because case endings rather than position mark the role each word plays in the sentence.

    German is a highly inflected language. Nouns are inflected for 4 cases (nominative, accusative, dative, and genitive), 3 genders (masculine, feminine, and neuter), and 2 numbers (singular and plural). In total, there are 8 inflectional endings and 12 different forms of definite and indefinite articles. All German nouns are capitalized, and nouns can be joined (compounded) to create new nouns, e.g. Weltmeister (“world champion”).

    Like English, German has strong and weak verbs inflected for person, number, and tense/mood. Past participles are often formed with the prefix ge-.

    German and MT

    In general, German is significantly more difficult for Machine Translation than French (which we examined in last week’s post) and other Romance languages.

    Word order is a challenge, particularly in cases where the verb is placed at the end of the sentence, as in the example below. Here we see that while the main verb – “like” – remains in the same position, in the subordinate clause the verb “scored” (highlighted in red) is moved to the final position.

    MT often puts these words in the wrong position in the translation. To overcome this, we try to identify the verb in the source sentence before translation and move it to the position we think it should occupy in German. This process, called “pre-ordering”, allows for a more one-to-one translation.

    [Figure: German alignment]
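To make the pre-ordering idea concrete, here is a minimal sketch in Python. It is not our production approach. It assumes the English sentence has already been tokenized and tagged by hand, and the tag names (SUB for a word that opens a subordinate clause, VERB, PUNCT) and the example sentence are illustrative assumptions.

```python
from typing import List, Tuple

# (word, coarse tag); the tags are hand-assigned for this illustration
Token = Tuple[str, str]


def preorder_subordinate(tokens: List[Token]) -> List[Token]:
    """Move the first verb of a subordinate clause to the end of that clause."""
    # Find the word that opens the subordinate clause (e.g. "who", "because").
    start = next((i for i, (_, tag) in enumerate(tokens) if tag == "SUB"), None)
    if start is None:
        return list(tokens)  # no subordinate clause: keep the original order

    # The clause runs up to the next punctuation mark, or to the sentence end.
    end = next(
        (i for i in range(start + 1, len(tokens)) if tokens[i][1] == "PUNCT"),
        len(tokens),
    )

    # Find the first verb inside the clause.
    verb = next((i for i in range(start + 1, end) if tokens[i][1] == "VERB"), None)
    if verb is None:
        return list(tokens)

    reordered = list(tokens)
    moved = reordered.pop(verb)
    # After the pop the clause is one token shorter, so its end is at end - 1.
    reordered.insert(end - 1, moved)
    return reordered


sentence = [
    ("I", "PRON"), ("like", "VERB"), ("the", "DET"), ("player", "NOUN"),
    ("who", "SUB"), ("scored", "VERB"), ("the", "DET"), ("goal", "NOUN"),
    (".", "PUNCT"),
]
print(" ".join(word for word, _ in preorder_subordinate(sentence)))
```

The reordered English ("I like the player who the goal scored .") now lines up more closely with German verb-final order in the subordinate clause, while the main verb "like" stays where it was. A real system would rely on a parser rather than hand-assigned tags.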

    Compound words also present a problem because, from an MT perspective, the compound is a totally new word. Even if we have Welt and Meister in our terminology, it doesn’t necessarily mean we can automatically translate Weltmeister. The solution depends on the translation direction. If we are translating from German into English, such compounds are identified and split into their constituent words prior to translation. However, this task can become tricky as German rules allow many words to be joined to create very long compounds, e.g. Generalstaatsverordnetenversammlungen (“general states representatives meetings”).
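As a rough illustration of compound splitting for the German-to-English direction, the sketch below tries to cover a word with entries from a known vocabulary, optionally skipping a linking "s" between parts. The tiny vocabulary, the single linking element, and the function name are assumptions made for the example. Real splitters typically use corpus frequencies and a fuller set of linking elements.

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional

# Hypothetical toy vocabulary; a real system would use its full terminology.
VOCAB = frozenset({"welt", "meister", "arbeit", "zimmer", "fußball"})
# Linking elements allowed between parts: none, or a linking "s".
LINKERS = ("", "s")


def split_compound(word: str, vocab: FrozenSet[str] = VOCAB) -> Optional[List[str]]:
    """Split a compound into known parts, or return None if no split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start: int):
        if start == len(word):
            return ()  # the whole word has been covered
        # Prefer the longest known part starting at this position.
        for end in range(len(word), start, -1):
            part = word[start:end]
            if part not in vocab:
                continue
            for linker in LINKERS:
                nxt = end + len(linker)
                # A linking element only counts if another part follows it.
                if linker and nxt >= len(word):
                    continue
                if word.startswith(linker, end):
                    rest = solve(nxt)
                    if rest is not None:
                        return (part,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None


print(split_compound("Weltmeister"))
print(split_compound("Arbeitszimmer"))
print(split_compound("Fußballweltmeister"))
```

With this vocabulary, Weltmeister splits into welt and meister, and the linking "s" in Arbeitszimmer is skipped. Words that cannot be fully covered return None, so they can be passed through to translation unchanged.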

    When translating in the other direction, from English into German, we can try to identify and join the words in the German translation.

    Again, as with French, agreement between nouns and articles (and pronouns and determiners) is a challenge due to the high level of inflection in German. In fact, the English word “the” can be translated as der, den, das, die, dem, or des, depending on the combination of number, gender, and case in German, which makes resolving this challenge even more difficult.
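The six forms of "the" can be laid out as a lookup table keyed by case and by gender or number. The sketch below shows only the standard definite-article paradigm. The function name and the string labels are illustrative choices. The hard part for MT, working out the correct case, gender, and number from context, is not addressed here.

```python
# Definite articles by case, then by gender (singular) or "plural".
DEFINITE_ARTICLES = {
    "nominative": {"masculine": "der", "feminine": "die", "neuter": "das", "plural": "die"},
    "accusative": {"masculine": "den", "feminine": "die", "neuter": "das", "plural": "die"},
    "dative":     {"masculine": "dem", "feminine": "der", "neuter": "dem", "plural": "den"},
    "genitive":   {"masculine": "des", "feminine": "der", "neuter": "des", "plural": "der"},
}


def definite_article(case: str, gender: str, plural: bool = False) -> str:
    """Return the German definite article for the given case, gender, and number."""
    # Gender distinctions disappear in the plural, so one column covers it.
    column = "plural" if plural else gender
    return DEFINITE_ARTICLES[case][column]


print(definite_article("nominative", "masculine"))           # der
print(definite_article("dative", "neuter"))                  # dem
print(definite_article("dative", "feminine", plural=True))   # den
```

The same surface form appears in several cells (der, for example, is nominative masculine but also dative and genitive feminine), so the article alone does not reveal the case. This is part of why agreement is hard to resolve automatically.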

    Data Availability

    Like French, German is an official language of European institutions such as the European Parliament and the European Patent Office, and the United Nations also publishes German translations of some of its documents. This contributes to a relatively large amount of available training data for MT. While German is one of the more challenging languages, its importance and the large number of research groups spread across German-speaking Europe mean that many of the challenges above have been well tackled (if not completely solved, yet!).

    Coming next week…

    In next week’s post, we will take our first departure from Europe and visit one of the most difficult languages for Machine Translation… Japanese!

    About Iconic Translation Machines Ltd.

    Iconic Translation Machines provides intelligent domain-adapted machine translation solutions as a cloud-based service for targeted sectors of the translation industry. Our highly-tuned engines produce best-in-class translation quality, allowing Language Service Providers to increase throughput, productivity, and margins. Our flagship product, IPTranslator, provides high-quality machine translation for the patent and intellectual property sector.
