Language Challenge #8 – Russian

Русский (Russkiy)


Quick Facts

  • Russian is the official language of four countries – Russia, Belarus, Kazakhstan, and Kyrgyzstan – but is widely spoken and understood in more than a dozen others.
  • It is a Slavic language and shares roots with many of the other languages in Eastern Europe, including the languages of the Baltic.
  • The Imperial Russian Academy (later merged into what is now the Russian Academy of Sciences) was founded in St. Petersburg in 1783 to govern the language. It was modelled on the Académie française.
  • Studies have shown that understanding and use of Russian are decreasing worldwide as the language is gradually replaced by local languages, particularly in former U.S.S.R. states.
  • Famous quote in Russian: Все строят планы, и никто не знает, проживёт ли он до вечера (“Everyone makes plans, yet no one knows whether he will live until evening.”) – Lev Tolstoy.
Characteristics

    Russian uses the Cyrillic alphabet, which contains 33 letters.

    Russian is a very synthetic language and its grammar has a highly fusional morphology. Verbs are conjugated for person, number, and two simple tenses, while other forms such as aspect and moods can be expressed using free morphemes.

    Russian nouns are inflected for two numbers (singular and plural), three genders (masculine, feminine, and neuter) and six cases (nominative, genitive, dative, accusative, instrumental, and prepositional). Up to ten further cases are also identified, including locative, partitive, and vocative, though they do not necessarily apply to all nouns.
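To make the two-numbers-by-six-cases grid concrete, here is a minimal Python sketch using the standard declension of the masculine noun "стол" ("table"). The noun, the table layout, and the helper names `inflect` and `analyses` are illustrative choices for this sketch, not anything taken from an MT system.

```python
# Illustrative sketch: declension of "стол" ("table"), 2 numbers x 6 cases.
# The helper names below are hypothetical and exist only for this example.
CASES = [
    "nominative", "genitive", "dative",
    "accusative", "instrumental", "prepositional",
]

STOL = {
    "singular": ["стол", "стола", "столу", "стол", "столом", "столе"],
    "plural": ["столы", "столов", "столам", "столы", "столами", "столах"],
}


def inflect(number: str, case: str) -> str:
    """Return the form of "стол" for the given number and case."""
    return STOL[number][CASES.index(case)]


def analyses(form: str) -> list[tuple[str, str]]:
    """Return every (number, case) slot that the given surface form can fill."""
    return [
        (number, case)
        for number, forms in STOL.items()
        for case, candidate in zip(CASES, forms)
        if candidate == form
    ]


if __name__ == "__main__":
    print(inflect("plural", "instrumental"))
    print(analyses("стол"))
```

Note that some slots share a form (for this inanimate noun the nominative and accusative coincide), so a single surface form can correspond to more than one grammatical analysis.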

    Basic Russian word order is subject-verb-object but, because most relations are marked by the extensive inflection, word order is typically quite free. There are also many cases in which the subject is dropped completely, for example:

  • Russian sentence: Пришёл, увидел, победил (“Veni, vidi, vici”)
  • Literal translation: “came, saw, conquered”
  • Idiomatic translation: “I came, I saw, I conquered”
Russian and MT

    As we have seen in our previous articles, languages with rich morphology or significantly different word order increase the challenge for machine translation. Russian has both of these characteristics in spades, which makes it a particularly difficult language for English MT.

    In Russian, word order is relatively free and different positioning of the same words in a sentence can alter the meaning. In the examples below, for instance, there are three translations for “I went to the shop”. Each example uses the same words but in a different order, and each example is used in a different context. The top example (A) is the basic sentence, while the bottom left example (B) is used to stress that “I” went to the shop. The bottom right example (C) is more common at the beginning of a narrative.

    [Figure: three Russian word orders for “I went to the shop”, labelled A, B, and C]
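As a rough illustration of how quickly free word order multiplies the candidate outputs, the Python sketch below enumerates every ordering of three constituents of "я пошёл в магазин" ("I went to the shop"). It is only a sketch under stated assumptions: the Russian sentence is supplied here for illustration, the prepositional phrase "в магазин" is treated as a single unit, and the function name `orderings` is hypothetical. It does not say which orderings suit which context, and not every ordering is equally natural.

```python
from itertools import permutations

# Illustrative constituents; "в магазин" ("to the shop") is kept as one unit
# because a preposition cannot be separated from its noun.
CONSTITUENTS = ["я", "пошёл", "в магазин"]


def orderings(parts: list[str]) -> list[str]:
    """Return every ordering of the constituents as a space-joined string.

    Capitalisation and punctuation are deliberately left out.
    """
    return [" ".join(order) for order in permutations(parts)]


if __name__ == "__main__":
    for candidate in orderings(CONSTITUENTS):
        print(candidate)
```

With three constituents there are already six orderings to choose between, and an MT system has to pick the one that fits the context.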

    In addition to the challenge of learning where to place the words, we also have to determine what the words are. The rich morphology of Russian means that developing MT engines for English into Russian is much more difficult than the other direction. For Russian into English, the many Russian variants of, say, a personal pronoun all translate to a single English word. However, when translating that same word from English into Russian, we need to know which case variant to use. The table below illustrates the magnitude of this challenge, showing 48 Russian variants for just 8 English personal pronouns.

    (Source: Wikipedia)

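The asymmetry between the two translation directions can be sketched in a few lines of Python. The example uses the standard case forms of the informal singular pronoun "ты" ("you"); the function names `ru_to_en` and `en_to_ru` are hypothetical, and a real MT engine would of course have to infer the case from context rather than receive it as an argument.

```python
# Illustrative sketch: case forms of the pronoun "ты" (informal singular "you").
TY_FORMS = {
    "nominative": "ты",
    "genitive": "тебя",
    "dative": "тебе",
    "accusative": "тебя",
    "instrumental": "тобой",
    "prepositional": "тебе",
}


def ru_to_en(form: str) -> str:
    """Russian into English: every case form collapses to the same word."""
    if form not in TY_FORMS.values():
        raise KeyError(form)
    return "you"


def en_to_ru(word: str, case: str) -> str:
    """English into Russian: the case must be known before a form can be chosen."""
    if word != "you":
        raise KeyError(word)
    return TY_FORMS[case]


if __name__ == "__main__":
    print(ru_to_en("тебе"))
    print(en_to_ru("you", "instrumental"))
```

The Russian-to-English lookup needs only the word itself, whereas the English-to-Russian lookup cannot be answered without the extra `case` argument, which is exactly the information the English source does not mark.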

    In order to address these challenges for MT development, deep morphological and syntactic analysis is required. The morphological analysis helps to determine what word forms to use, which, in turn, has an impact on how the words are ordered in the translated sentence. However, tools for carrying out this analysis are not as widespread for Russian as they are for other, more studied languages. For this reason, Russian remains a big challenge for MT.

    Data Availability

    Russian is an official language of the United Nations and, as such, there are large corpora available for MT training. The Russian Academy of Sciences has also undertaken to develop a large corpus of Russian which has significant parallel portions.

    About Iconic Translation Machines Ltd.

    Iconic Translation Machines provides intelligent domain-adapted machine translation solutions as a cloud-based service for targeted sectors of the translation industry. Our highly-tuned engines produce best-in-class translation quality, allowing Language Service Providers to increase throughput, productivity, and margins. Our flagship product, IPTranslator, provides high-quality machine translation for the patent and intellectual property sector.
