The conundrum of what constitutes translation as opposed to post-editing of machine translation is one that has beset the language services industry for a few years now.
Ever since machine translation started to become the norm – in both academic and commercial contexts – users of machine translation have been asking themselves whether or not they’re doing something different mentally and practically when post-editing. This has also led researchers to ponder whether the language that’s produced from post-editing is actually a new one or simply a different type of translation.
What the research suggests
One such researcher is Antonio Toral, from the University of Groningen in the Netherlands, who has recently published a paper called “Post-editese: an Exacerbated Translationese”. In it, he explains how he compared a number of post-edited texts with human translations of the same texts using simplification, homogenisation and interference as his main assessment criteria.
He found that post-edited texts tend to have lower lexical variety and lower lexical density, with sentence lengths matching the source text much more closely. These tendencies produce texts that are generally less varied and less rich. They also tend to be more homogeneous and to show significant interference from the source language. The sample size, metrics and text types selected for this study have their limits, but the phenomenon is still quite interesting to observe.
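To make the first two of those measures concrete, here is a rough sketch of how lexical variety (as a simple type-token ratio) and lexical density might be computed. The definitions, the function-word list and the toy sentences below are simplified assumptions for illustration only, not the corpus-level measures the paper actually uses.

```python
# Back-of-the-envelope sketch of two of the metrics named above.
# Assumed, simplified definitions for illustration; the study itself
# uses more refined, corpus-level measures.

def lexical_variety(tokens):
    """Type-token ratio: distinct word forms divided by total words."""
    return len(set(tokens)) / len(tokens)

def lexical_density(tokens, function_words):
    """Share of content words, approximated here as tokens that are
    not in a (hypothetical) function-word list."""
    content = [t for t in tokens if t.lower() not in function_words]
    return len(content) / len(tokens)

# A tiny, invented function-word list -- a real one would be far larger.
FUNCTION_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

# Invented toy sentences: the first repeats words the way flat,
# source-mirroring output can; the second varies its vocabulary.
repetitive = "the report is in the file and the data is in the file".split()
varied = "the analyst filed her report after double-checking every figure".split()

print(lexical_variety(repetitive), lexical_variety(varied))
print(lexical_density(repetitive, FUNCTION_WORDS),
      lexical_density(varied, FUNCTION_WORDS))
```

On the repetitive sentence both scores come out lower than on the varied one, which is the general pattern the study reports for post-edited output.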
The effects on the wider language
Language change is a natural process: it happens and it has happened regardless of machine translation, with factors like instant messaging, social media and a constant need for better, faster and more optimised communication being key drivers of this trend. Now we can add post-editing to the list of factors that influence how the language we read evolves and develops.
That said, it’s worth noting that machine translation (and by extension, post-editing) is not so different from human-driven phenomena like instant messaging. Yes, the text is produced by a machine, but the machine itself is trained on data generated by humans, so in a sense it is simply replicating what we all write and say to suit a specific context. So while it is an innovation, it’s also solidly grounded in data collected the old-fashioned way.
One could then argue that machine translation is innovative in the way it recycles the language to reuse it when possible – a very “green” approach of not wasting any training data it has been fed. Data is, after all, the new high-value commodity in our modern world, and language data is incredibly important for any translation provider thinking about using machine translation to its fullest potential.
A different skillset for translators
At the end of the day, it’s easy to think – regardless of whether you’re post-editing or translating – that you’re just turning one language into another, right? While that might be true, the way you approach the task differs on a fundamental level: when you post-edit, there’s already something there, so you’re not starting with a blank canvas.
This may sound obvious, but it leads to a number of interesting habits among post-editors, one of which is the temptation to simply read the MT output and think “yeah, that’ll do, next”, especially if you’re pressed for time with a deadline looming. Mistranslations, unidiomatic constructions and internal inconsistencies are among the most common examples of “under-editing”, so it’s important to always be careful and rely on good old-fashioned attention to detail.
Oddly enough, this is complicated by the fact that the latest developments in machine translation, and particularly in neural machine translation, have led to great improvements in the flow and grammatical accuracy of the output: the language can sound so natural that it can trick post-editors into thinking that there is less to edit than there actually is.
This means that translators working on post-editing jobs should not underestimate the task at hand: yes, they do have the existing skills to be ready for it, but the process might be more mentally complex than they initially expect.
When is MTPE the right solution?
This is all well and good, but what should a buyer of translation services ultimately make of this information? And what should a language services provider take into consideration when offering translation and post-editing of machine translation?
It all boils down to the intended purpose of the text (and, in turn, the needs of your buyer): homogenising the text might sound like a terrifying thought, but if you’re ordering the translation of a safety data sheet for a chemical product or a list of ingredients for a beverage, is the flow of the language really that important? Wouldn’t the opportunity to be faster and more productive when translating these texts with machine translation – which thrives on repetition and recurrent patterns – be far more appealing?
And at the other end of the spectrum, if you’re dealing with a text that’s very creative, for example a client’s website that’s on view to the public, it might be preferable to consider a different approach. In this case, machine translation might not be the best solution and you should consider opting for transcreation for a better end product.
A good example of machine translation used well is an engine trained on, and used for, a particular domain or text type. For instance, an engine built entirely with and for legal texts will generally perform well with the often formulaic and standardised terminology and constructions typical of that domain. Neural machine translation is also likely to be the best fit here, since legal texts tend to have lengthy, verbose sentences that can be quite time-consuming to break down and translate manually without extra aid.
It’s safe to say that the decision to use MT should be made on a domain-by-domain and perhaps even job-by-job basis. If you want to know more about when it’s the right solution, download our free Guide to machine translation.