Ensemble Architecture

Linguistic Engineering

To build truly effective customised Machine Translation systems, the training process must focus not only on the languages involved but also on the content being translated and the style in which it is written. This is our primary focus at Iconic, and our Ensemble Architecture™ allows us to deliver best-in-class performance.

Our world-class team has developed an extensive set of processes based on years of research and development to adapt the training process for the domains and content being translated. For each Iconic domain, we have a unique training process across multiple languages that targets the nuances of that content type and ensures that the most important elements are translated correctly.

Beyond data

At Iconic Translation Machines, we take a different approach to building solutions for translation automation. Our focus is not on collecting data to dump into a training process. Instead, our focus is on building Machine Translation systems – using all available techniques – adapted to the language and style of the content being translated: true customisation.

Data Engineering

Most contemporary approaches to Machine Translation use technology based on statistics: so-called Statistical Machine Translation (SMT). These approaches require large amounts of training data – parallel texts in multiple languages – in order to ‘learn’ how to translate new content. SMT relies almost exclusively on data engineering, that is, large-scale processing of this data to build translation systems. Custom-built Machine Translation solutions use this approach, illustrated below, to build systems across multiple domains. This approach can be effective for non-technical translation with a narrow focus, but it is not fit for purpose when dealing with complex content such as legal, pharmaceutical, and medical. That’s where our Ensemble Architecture™ with Linguistic Engineering comes to the fore.


Linguistic Engineering + Data Engineering

Our approach to MT at Iconic combines the strength of data engineering with intelligent Linguistic Engineering. This combination provides unique domain-adaptation capabilities and ensures the delivery of high-quality translations of technical content that are more informative and more usable for the end user.

“Translation machines with subject matter expertise”

When sourcing professional translators for jobs in technical areas such as legal and life sciences, it is often necessary to find someone with subject matter expertise, given the nature of the content. Similarly, when training Machine Translation systems for these areas, the training and translation process needs comparable knowledge of the domain in order to apply the correct processes. For example:

  • In pharmaceutical translation that means understanding the different characteristics of “Summaries of Product Characteristics (SmPCs)” and “Case Report Forms (CRFs)” documents;
  • In patent translation that means knowing what should be translated in a claim of a chemical patent that contains a long amino acid sequence, and understanding the difference in style in “Written Opinions of the Searching Authority (WOSAs)”;
  • In medical device translation that means understanding how important “Instructions For Use (IFU)” are and the terminology used in “Patient-Reported Outcomes (PROs)”.

The Ensemble Architecture™

We have developed unique Linguistic Engineering processes specifically for these types of content, based on deep domain understanding, and we use these to construct the ensemble that forms the backbone of our MT architecture, illustrated below. We do not restrict ourselves to a single approach to MT – be it statistical, rule-based, or hybrid – as doing so places a limit on the performance we can achieve. Instead, we combine processes from all paradigms, including linguistic, syntactic, and semantic information, to determine what gives the best MT quality for a particular input.
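
As a rough illustration of the general idea of choosing the best output per input, the Python sketch below runs several engines on the same source text and keeps the candidate with the highest score. It is not a description of Iconic's actual system: the engine stubs, the `GLOSSARY`, the `glossary_coverage` scorer, and the `translate_with_ensemble` function are all hypothetical names invented for this example, and a real system would use far richer engines and quality signals.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical types for this sketch: an engine maps source text to a
# translation, and a scorer estimates the quality of a (source, translation) pair.
Engine = Callable[[str], str]
Scorer = Callable[[str, str], float]


@dataclass
class Candidate:
    engine: str
    translation: str
    score: float


def translate_with_ensemble(source: str, engines: Dict[str, Engine], scorer: Scorer) -> Candidate:
    """Run every engine on the input and return the highest-scoring candidate."""
    if not engines:
        raise ValueError("at least one engine is required")
    candidates = []
    for name, engine in engines.items():
        translation = engine(source)
        candidates.append(Candidate(name, translation, scorer(source, translation)))
    # On a tie, max() keeps the first candidate in insertion order.
    return max(candidates, key=lambda c: c.score)


# Toy domain glossary (assumed data, for illustration only).
GLOSSARY = {"dosis": "dose", "paciente": "patient"}


def rule_based_stub(source: str) -> str:
    """Stand-in for a rule-based engine: word-by-word glossary lookup."""
    return " ".join(GLOSSARY.get(word, word) for word in source.split())


def passthrough_stub(source: str) -> str:
    """Stand-in for a second engine that leaves the text untouched."""
    return source


def glossary_coverage(source: str, translation: str) -> float:
    """Toy scorer: fraction of expected glossary terms present in the output."""
    expected = [GLOSSARY[word] for word in source.split() if word in GLOSSARY]
    if not expected:
        return 0.0
    output_words = translation.split()
    return sum(term in output_words for term in expected) / len(expected)


ENGINES = {"passthrough": passthrough_stub, "rule_based": rule_based_stub}

if __name__ == "__main__":
    best = translate_with_ensemble("dosis paciente", ENGINES, glossary_coverage)
    print(best.engine, "->", best.translation)
```

In this toy setup the rule-based stub wins for "dosis paciente" because it is the only engine that produces the expected glossary terms. The selection step is deliberately independent of how each engine works, which is what allows statistical, rule-based, or hybrid components to sit side by side.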

