8 Steps to MT Success Series

MT Success Series 8: Quality Expectations

Quality Expectation Quick Facts

BLEU, a popular metric for automatically assessing machine translation (MT) quality, was first proposed in 2002 by researchers at IBM's T. J. Watson Research Center. The name is an acronym for “BiLingual Evaluation Understudy”. Quality estimation, the task of judging MT quality on the fly, is extremely challenging, and researchers compare performance annually in a forum organised by the Conference (formerly Workshop) on...

MT Success Series 7: Integration Requirements

Integration Requirements Quick Facts

Amazon Web Services (AWS) is one of the world’s largest cloud-computing providers, with more than 1 million active users. Netflix, which runs on AWS, accounts for ⅓ of all internet traffic at peak times. Customer relationship management (CRM) software is the largest market for software-as-a-service models, with more than $4bn in annual sales.

In layman’s terms, Moore’s Law suggests...

MT Success Series 6: Translation Memory Leverage

Translation Memory Quick Facts

Translation Memory eXchange (TMX) is the standard format for sharing translation memory files and was created in 1997. Trados Studio is by far the most used CAT tool among translators, with memoQ and Wordfast following.

We’ve said it before, but translation memories are a perfect example of training data. This week’s topic is geared more towards the machine translation use case of...

MT Success Series 5: Buyer Experience

Buyer Experience Quick Facts

The concept of using computers to translate languages was first proposed by Warren Weaver in 1947. According to recent CSA Research findings, 77% of enterprise MT users are based in North America. It is estimated that enterprises will translate 59% of all content using MT by 2019.

This week’s topic arguably has the single biggest impact on successful (painless) adoption of machine...

MT Success Series 4: Training Data

Training Data Quick Facts

The Europarl corpus is a free collection of MT training data with more than 600 million words available across 20 language pairs. The amount of data used to train an engine correlates directly with how fast the engine can translate and how much disk space and memory it needs to run. The Rosetta Stone is probably the most famous example of...

MT Success Series 3: Content Type

Content Type Quick Facts

The field of Controlled Language proposes 10 authoring rules to make content more suitable for machine translation. More than 50% of all internet users contribute to the creation of user-generated content, and more than 40 million pieces of content are added to Facebook alone every hour. Patent claims must be written as a single sentence, which frequently leads to artificially long...

MT Success Series 2: Volume

Volume Quick Facts

IBM estimates that 2.5 quintillion bytes of data are created every day (that’s a lot of content to translate!). US Patent 5146591, published in 1992, is the longest patent ever published, containing more than 1 million words. The most popular languages in terms of words in literary publications (aside from English) are Mandarin for books, Spanish for newspapers/magazines, German for...

MT Success Series 1: Language

Language Quick Facts

There are 7,097 living languages in the world. Papua New Guinea hosts over 840, or 11.8%, of these living languages, the highest number of any one country on earth. Just 1.1% of languages are spoken by 80% of the world’s population. There are over a hundred languages spoken by just 1 to 9 people, with 2,444 languages described as “threatened or worse”.

We’re kicking...

8 Steps to MT Success Series: Introduction

Machine translation is undoubtedly a complex technology. It’s rare to have software that works so well in some cases and yet struggles in others. This presents a challenge to potential end users and buyers of MT, who will naturally be asking “how likely is it that MT will work for my particular needs?” It’s a question that we’ve had to answer on countless...
