Language Challenge #1 – French

Le français


Quick Facts

  • Official language of 29 countries across Europe, Africa, North America, the Caribbean, and Australasia.
  • Kinshasa, capital city of DR Congo, is the second largest French-speaking city in the world, after Paris.
  • It is a Romance language, descended from Vulgar Latin (the language of Ancient Rome).
  • The language is governed by the Académie française (French Academy), which was founded in 1635.
  • Famous quote originating in French: Je pense donc je suis (“I think therefore I am”) – Descartes, philosopher and mathematician.
Characteristics

    French uses the basic Latin alphabet, extended with four diacritics on vowels (for example é, à, ï, and ê), a cedilla (ç), and two ligatures (œ and æ).

    In terms of word order, it is a subject-verb-object language with some exceptions, for instance in questions formed by inversion, e.g. «Parlez-vous français?» (“Do you speak French?”).

    Nouns in French are inflected for number, adjectives are inflected for both number and gender, and case is marked by word order. Verbs are conjugated for tense, aspect, and mood.

    Another feature of the language is two-part negation where the particle ne combines with another negative word that modifies the verb. This can lead to split negatives, e.g. «Je ne sais pas» (“I do not know”), which can have implications for Machine Translation (MT).
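    To make the split negative concrete, here is a minimal sketch of how a pre-processing step might pair ne with its second element so the two can be handled as one unit. The word list, the window size, and the function name find_split_negations are illustrative assumptions, not part of any particular MT system.

```python
# Second elements that commonly pair with "ne".
# Illustrative and deliberately non-exhaustive (an assumption for this sketch).
NEGATIVE_WORDS = {"pas", "plus", "jamais", "rien", "personne"}


def find_split_negations(tokens, window=3):
    """Return (ne_index, negative_index) pairs found within `window` tokens."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.lower() in ("ne", "n'"):
            # Look a few tokens ahead for the second half of the negation.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j].lower() in NEGATIVE_WORDS:
                    pairs.append((i, j))
                    break
    return pairs


print(find_split_negations("Je ne sais pas".split()))  # [(1, 3)]
```

    The verb sits between the two halves, which is exactly what makes the construction awkward for systems that translate contiguous phrases.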

    French and MT

    Considering this post is written (and you are reading it) in English, we will use English as a point of comparison when discussing the challenges of MT for French.

    Firstly, it must be said that English-French is one of the easier language combinations for MT. Aside from the relative compatibility of the languages, it is one of the most researched language pairs in MT academia and there is an abundance of training data available due to the official status of both languages in many jurisdictions (more on this in the next section).

    Typically, the closer two languages are in word order, the easier the translation task is, since it reduces almost entirely to lexical choice. French and English share the same subject-verb-object structure, although there are differences which present a challenge for MT.

    In the example below, we see that the word order of the English and French sentences is relatively similar, except for the noun-adjective pair “red wine”, where the order is swapped across the two languages. One solution for MT is to try to capture all pairs of nouns and adjectives that might occur in the content being translated, and to treat each pair as a single phrase. An alternative, and perhaps more feasible, solution is to identify the adjectives and, having learned from previous examples in our training data, move them one position to the right or left (depending on the target language) during translation. This is known as reordering.

    [Figure: English and French example sentences aligned word by word, with the order of the noun-adjective pair “red wine” swapped]
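    The reordering idea can be sketched as a single rule over part-of-speech tagged tokens. This is a toy illustration only: the example sentence, the tag names, and the function reorder_adjectives are assumptions made for the sketch, and a real system would learn such rules from training data rather than hard-code them.

```python
def reorder_adjectives(tagged_tokens):
    """Swap each ADJ that immediately precedes a NOUN (English order to French order)."""
    out = list(tagged_tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


# Hypothetical input: tags are assumed to come from an upstream tagger.
sentence = [("I", "PRON"), ("drink", "VERB"), ("red", "ADJ"), ("wine", "NOUN")]
print([word for word, _ in reorder_adjectives(sentence)])
# ['I', 'drink', 'wine', 'red']
```

    A blanket rule like this overgenerates, because some French adjectives (petit, grand) normally come before the noun. That is one reason the movement is better learned from examples.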

    Another challenge is faced when it comes to agreement between articles and nouns. Unlike in French, English articles are not inflected for number and gender. Therefore, when translating into French, context must be taken into account to ensure agreement between the articles and nouns.

    e.g. the house/the houses vs. la maison/les maisons
    e.g. the house/the cheese vs. la maison/le fromage

    These points broadly apply to the other Romance languages (e.g. Spanish, Portuguese), but they have their own exceptions and challenges, which we will address when we treat those languages in separate posts. For article-noun agreement, one option for MT is again to try to learn article-noun pairs and translate them together. Alternatively, morphological information can be used to identify the features of the noun and only allow it to be preceded by an article that agrees. While the second option is better from a linguistic perspective, it requires more computing power and can lead to data sparsity issues.
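    The morphological option can be sketched with a toy lexicon that records the gender and number of each noun and picks the definite article that agrees. The lexicon contents and the names NOUN_FEATURES, DEFINITE_ARTICLES, and definite_article are hypothetical, and elision (l’ before a vowel) is ignored for brevity.

```python
# Toy morphological lexicon (hypothetical): noun -> (gender, number).
NOUN_FEATURES = {
    "maison": ("f", "sg"),
    "maisons": ("f", "pl"),
    "fromage": ("m", "sg"),
    "fromages": ("m", "pl"),
}

# Definite articles keyed by (gender, number); plural "les" serves both genders.
DEFINITE_ARTICLES = {
    ("m", "sg"): "le",
    ("f", "sg"): "la",
    ("m", "pl"): "les",
    ("f", "pl"): "les",
}


def definite_article(noun):
    """Return the definite article that agrees with `noun` in gender and number."""
    gender, number = NOUN_FEATURES[noun]  # raises KeyError for unseen nouns
    return DEFINITE_ARTICLES[(gender, number)]


for noun in ("maison", "maisons", "fromage"):
    print(definite_article(noun), noun)
```

    The KeyError on an unseen noun is a small-scale picture of the data sparsity problem: the approach only works for words whose features are known.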

    Data Availability

    As mentioned above, English-French is one of the most widely researched language pairs in MT academia, owing in large part to the abundance of available training data. This is due to the large number of multinational institutions featuring both English and French as official languages, including, but not limited to: the European Parliament, the United Nations, the Canadian Parliament, and the European Patent Office.

    Coming next week…

    We hope you enjoyed this inaugural post in our Language Challenges Series. Make sure to stay tuned for next week, when we step the difficulty up a notch and tackle… German!

    About Iconic Translation Machines Ltd.

    Iconic Translation Machines provides intelligent domain-adapted machine translation solutions as a cloud-based service for targeted sectors of the translation industry. Our highly-tuned engines produce best-in-class translation quality, allowing Language Service Providers to increase throughput, productivity, and margins. Our flagship product, IPTranslator, provides high-quality machine translation for the patent and intellectual property sector.
