Reflections on a Translation Studies internship working on low-resource languages
Iconic plays a major role in the million-euro, EU-funded PRINCIPLE project, where we are building bespoke neural machine translation engines for low-resource languages (languages with very little parallel data). Méabh Sloane initially joined Iconic as an intern from her Master's programme at DCU, but over time grew into the position of MT Research Assistant on the PRINCIPLE project. Méabh has now left Iconic to start a full-time PhD programme, but has kindly shared her experiences of working with us here. We want to take this opportunity to thank Méabh for her valuable contributions to Iconic and the PRINCIPLE project, and wish her all the best in her future endeavors!
I began working as an intern with Iconic in May while completing a Master's degree in Translation Studies at Dublin City University. A core element of the Master's programme is an introduction to translation technology that centres on the plethora of computer-assisted translation (CAT) tools. Having thoroughly enjoyed this module, I jumped at the chance to work for an established language technology company such as Iconic. Although the pandemic had put the world on pause two months earlier, the company overcame this challenge by migrating to an online workspace. Working remotely for the entirety of my internship did not hinder my integration within the team. My internship came to an end in early June and I was delighted to stay on at Iconic as an MT Research Assistant.
My main responsibilities related to PRINCIPLE, an EU-funded project that seeks to collect high-quality language resources for the low-resource languages of Irish, Croatian, Icelandic and Norwegian. Iconic's role is to use the resources collected from data providers as training data for the development of MT engines. This allows the quality of the language resources to be validated before they are shared with the EU. My tasks consisted primarily of data collection, MT engine evaluation and Irish MT research.
While data collection was underway by PRINCIPLE project partners in each country, Iconic sought to assess the volume of language resources already available in the project languages. As an intern, it was my job to review, collect and record data from the European Language Resource Coordination (ELRC) ELRC-SHARE platform; this data was then used to build generic baseline engines. This gave the project a useful starting volume of data and would later provide a point of comparison for evaluating future PRINCIPLE MT engines. This comparison would in turn highlight both the benefit and the relevance of the work carried out throughout the project: improving MT in low-resource languages with newly collected language resources. This task involved close collaboration with project partners on an ELRC data quality review and required me to upskill technically by making the acquaintance of Thor, one of Iconic's servers.
PRINCIPLE MT engines are bespoke systems built specifically for organisations that contribute data to the project, referred to as Early Adopters. These Early Adopters also evaluate the MT engines to assess engine performance and provide Iconic with feedback, so that identified issues can be tackled and translation quality maximised. While DCU, as project coordinator, is coordinating the overall evaluation of MT engines, Iconic also conducts its own internal evaluation.
Automatic evaluation is a key part of the Iconic pipeline, in which newly built engines are compared with online MT systems. This gives a general indication of engine performance but does not pinpoint areas that require improvement. I explored the multiple metrics of human evaluation, researched the most efficient method of error analysis and shared my findings in a blog post for the Neural MT Weekly series. Aside from reviewing engine output and human evaluation, I also assisted the MT scientists in their examination of the reliability of automatic metrics, namely METEOR and ChrF.
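To give a flavour of what a character-based automatic metric such as ChrF measures, here is a minimal Python sketch of a character n-gram F-score. It is an illustration only, not the Iconic pipeline or a reference implementation: the function names `char_ngrams` and `simple_chrf` are hypothetical, whitespace is simply dropped, and the defaults (n-grams up to length 6, beta of 2) are assumptions chosen to mirror commonly cited ChrF settings.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of length n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified character n-gram F-score between 0.0 and 1.0.

    Precision and recall are averaged over n-gram orders 1..max_n,
    then combined into an F-beta score (beta > 1 favours recall).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Hypothetical MT output compared with a reference translation.
    print(simple_chrf("Dia duit", "Dia dhuit"))
```

A score like this rewards partial matches inside words, which is one reason character-based metrics are often considered for morphologically rich languages, but like any automatic metric it gives only a general indication and does not say where an engine goes wrong.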
Irish MT Research
Part of the reason I was taken on as an intern at Iconic was my fluency in Irish and knowledge of the status of the language. Irish is classified as a minority language but is protected by its status as an official language of the EU and by language legislation in Ireland. While there is a continuous demand for Irish translation services, there is a shortage of qualified translators to meet it. The derogation on Irish translation in the EU is due to be lifted in 2021, and this will see a surge in the volume of Irish translation required within European institutions. In addition, public bodies in Ireland that are subject to the Official Languages Act regularly fail to comply with regulations on bilingual services. With an amendment to this Act currently being presented in the Dáil, it is the ideal time to consider the need for machine translation with regard to the Irish language. Throughout my time at Iconic I was delighted to have been given the opportunity to conduct detailed research into language legislation in Ireland and the EU, the current state of the translation industry for Irish and the history of past MT systems.
My time at Iconic has come to a bittersweet end, as I have been offered a scholarship as a PhD student in Fiontar & Scoil na Gaeilge at DCU. I am grateful for the opportunities afforded to me during the past eight months and for the wealth of knowledge shared. While professional translators may feel threatened by advancements in MT, they can take comfort in the fact that there will always be a collaborative role available. The insight of human translators proves invaluable within the context of translation technology, be that as post-editors, engine evaluators or expert advisors in matters such as style, domain and register. I wish both the Iconic and PRINCIPLE teams the very best. While I may only have met my colleagues virtually through coffee mornings, daily stand-ups and monthly Zoom socials, I can wholeheartedly say that Iconic boasts a supportive, productive and caring workplace where curiosity, research and expertise continuously flourish.