Blog

MT-Success-Series-8-Quality-Expectations

Quality Expectation Quick Facts: BLEU, a popular metric for automatically assessing machine translation (MT) quality, was first proposed in 2002 by researchers at IBM's T. J. Watson Research Center. The name stands for “BiLingual Evaluation Understudy”. Quality estimation - the task of judging MT quality on the fly - is extremely challenging, and researchers compare performance annually in a forum organised by the Conference (formerly Workshop) on...

Read More
MT-Success-Series-7-Integration-Requirements

Integration Requirements Quick Facts: Amazon Web Services (AWS) is one of the world’s largest cloud-computing providers, with more than 1 million active users. Netflix, which runs on AWS, accounts for ⅓ of all internet traffic at peak times. Customer Relationship Management software is the largest market for software-as-a-service models, with more than $4bn in annual sales. In layman’s terms, Moore’s Law suggests...

Read More
MT-Success-Series-6-Translation-Memory-Leverage

Translation Memory Quick Facts: Translation Memory eXchange (TMX) is the standard format for sharing translation memory files and was created in 1997. Trados Studio is by far the most used CAT tool among translators, followed by memoQ and Wordfast. We’ve said it before, but translation memories are a perfect example of training data. This week’s topic is geared more towards the machine translation use case of...

Read More
MT-Success-Series-5-Buyer-Experience

Buyer Experience Quick Facts: The concept of using computers to translate languages was first proposed by Warren Weaver back in 1947. According to recent CSA research, 77% of enterprise MT users are based in North America. It is estimated that enterprises will translate 59% of all content using MT by 2019. This week’s topic arguably has the single biggest impact on successful (painless) adoption of machine...

Read More
ACL Conference-Berlin-August 7-12

Our own Carmen Heger was in Berlin at the Humboldt University for the Association for Computational Linguistics annual conference. ACL 2016 was the biggest yet, with over 1,600 attendees participating in a variety of workshops, roundtables, panels, poster sessions, and talks. Natural Language Processing (NLP) applications, including Machine Translation, are becoming increasingly prevalent across a wide variety of industries. This jump in interest is due...

Read More
SXSW – Austin – March 10–19

In collaboration with Nathan Hurst of Shutterstock (formerly Google, Amazon, Adobe), our CEO John Tinsley will present a panel on "Transcending Language: Understanding with AI". Nathan and John will be exploring the current landscape of machine learning, how we are teaching AI to not only recognize images but also describe them, how machine translation aids in the process and what the future of AI-assisted search...

Read More
NTIF – Malmö – November 24–25

Iconic are heading to Malmö for the Nordic Translation Industry Forum from November 24 - 25 at the Clarion Hotel & Congress. Nordic countries are home to some of the most challenging languages for Machine Translation so we are traveling to the source to help our colleagues in these countries understand what the particular challenges are for MT, and to learn where MT can fit into...

Read More
MT-Success-Series-4-Training-Data

Training Data Quick Facts: The Europarl corpus is a free collection of MT training data with more than 600 million words available across 20 language pairs. The amount of data used to train an engine directly correlates to how fast the engine can translate words, and how much disk space/memory is required to run it. The Rosetta Stone is probably the most famous example of...

Read More
MT-Success-Series-3-Content-Type

Content Type Quick Facts: The field of Controlled Language proposes 10 authoring rules to make content more suitable for machine translation. More than 50% of all internet users contribute to the creation of user-generated content, and more than 40 million pieces of content are added on Facebook alone, every hour. Patent claims must be written as a single sentence, which frequently leads to artificially long...

Read More
MT-Success-Series-2-Volume

Volume Quick Facts: IBM estimates that 2.5 quintillion bytes of data are created on a daily basis (that’s a lot of content to translate!). US Patent 5146591, published in 1992, is the longest patent ever published, containing more than 1 million words. The most popular languages in terms of words in literary publications (aside from English) are Mandarin for books, Spanish for newspapers/magazines, German for...

Read More