In the previous two parts of our interview with STP’s machine translation guru, Mattia Ruaro, we’ve discussed the different kinds of machine translation (MT), the way the technology is changing, and how it can and should be used in the translation industry.
In this final part, Mattia shares his thoughts on how translators can use MT as a tool – and how STP is going about it.
You mentioned that editing machine translation output is a skill all of its own for a translator. How does it differ from translation?
I’d say that machine translation post-editing is not really that different from translation these days. Of course it’s quite different from translating a text from scratch in a word processor, but I think sometimes people forget that translators very often work with translation memories (TMs) nowadays. So they don’t necessarily have a blank slate even without MT.
How does working with MT compare to working with translation memories?
It’s somewhat similar; essentially, you are editing matches in both cases. In the case of TM matches, a tool will suggest translations of similar sentences that have been translated before and stored in a translation memory file attached to the project.
The translator might, for example, have a 95 per cent match where only the punctuation is different to that of the sentence they are looking at – or perhaps there is just one word that is different. Translators have become used to editing TM matches; an MT match is often much less accurate, but it’s a starting point.
How does the process of post-editing differ from the process of translation? What does a translator need to know before starting this?
The biggest challenge, particularly for inexperienced editors, is keeping in mind that MT output is the work of a machine, not a human. You can’t trust a machine the same way you can trust a translation memory match from a previous translator.
This seems like a fairly straightforward distinction – the clue is in the name. But many struggle to make it.
Another issue is training: there are very few training materials and resources available. This is why we recorded webinars for our freelancers, and all our in-house translators have received training too. We can’t give people MT output and expect them to just deal with it.
Machine translation post-editing (MTPE) is not as intuitive as people think: training, experience and knowledge are necessary. It’s really helpful to try to understand why the machine produces the output it does – but this is something that requires an understanding of technology.
From my perspective, it’s really helpful to have very specific feedback from translators, as training the engine requires precision.
You can and should be able to influence the engine quality – you can train the engine as well as the translator. If you “put yourself in the machine’s shoes”, things start to fall into place.
STP is certified according to the ISO 18587 standard on MTPE. Why is this?
It shows the amount of effort we’ve put into learning, understanding and using this kind of technology as a company. And this isn’t just the case for the technology team – our production teams have put in a lot of work as well.
Adhering to the standard is something we are doing with everyone’s best interests in mind; we’re trying to contribute to making a positive difference in the industry.
The standard is essentially a set of guidelines – I would describe them as a collection of best practices. They raise the bar for everyone in the industry. Companies that care about these standards can promote them and counter the misuse of MT technology.
Do you think there is a lot of deliberate misuse of MT in the industry?
Some, certainly. There are companies trying to pass off raw MT output as translation and sending it out to vendors as regular revision projects, for example. But these agencies know what they are doing – and the revisers can spot this kind of thing a mile away.
There are some companies that lack information on the MT that they are using – or that they are expecting their vendors to use. They simply don’t know how good the MT output is, since they don’t have in-house people proficient in the relevant languages to check and provide feedback on it. STP only generates output for languages that we can check in-house. That way we know exactly what sort of quality it is.
Would you say that MTPE is faster than translation without MT?
There has been a lot of talk about MT improving productivity, but most of the research on this is done with very few people who are not working with strict deadlines. These circumstances do not really reflect the way in which translators work in the commercial world. The studies often make flawed assumptions too.
At STP, we can test the effectiveness of MT as a tool internally. We have a lot of information on our translators and they already work with deadlines and under pressure, which makes them ideal test subjects.
How do you measure something like this accurately?
We have data based on edit distance – how different the final, edited output is from the raw, unedited MT output. In general, it seems that people are more productive with MT than without, though that doesn’t necessarily mean the quality is good.
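To make the idea of edit distance concrete, here is a minimal sketch in Python. It is not STP's actual tooling: it assumes a word-level Levenshtein distance, whitespace tokenisation and normalisation by the longer segment, and the function names are purely illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # delete item_a
                curr[j - 1] + 1,     # insert item_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(raw_mt, post_edited):
    """Share of words changed between raw MT output and the edited text.

    Assumption: tokens are split on whitespace and the distance is divided
    by the length of the longer segment, giving a value between 0.0 and 1.0.
    """
    raw_tokens = raw_mt.split()
    edited_tokens = post_edited.split()
    longest = max(len(raw_tokens), len(edited_tokens))
    if longest == 0:
        return 0.0  # two empty segments are identical
    return edit_distance(raw_tokens, edited_tokens) / longest
```

A value near 0 means the post-editor changed very little, while a value near 1 means the segment was essentially rewritten. As noted above, a low score says nothing by itself about whether the final quality is good.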
How is STP measuring this?
Basically, we are making an effort to track productivity gains. We are doing this by recording how much time projects where no MT is used take compared to MTPE tasks. It’s not the perfect metric, but we need some hard data on MT and how useful it actually is.
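A time-based comparison of this kind could be sketched roughly as follows. This is only an illustration under stated assumptions: each task is recorded as a hypothetical (word count, hours spent) pair, throughput is measured in words per hour, and the function names are invented for the example.

```python
def words_per_hour(tasks):
    """Aggregate throughput over a list of (word_count, hours_spent) pairs."""
    total_words = sum(words for words, _ in tasks)
    total_hours = sum(hours for _, hours in tasks)
    if total_hours == 0:
        raise ValueError("no recorded time for these tasks")
    return total_words / total_hours


def relative_gain(baseline_tasks, mtpe_tasks):
    """Relative throughput change of MTPE tasks versus tasks done without MT.

    A result of 0.2 would mean MTPE tasks were processed 20 per cent faster.
    """
    baseline = words_per_hour(baseline_tasks)
    if baseline == 0:
        raise ValueError("baseline throughput is zero")
    return (words_per_hour(mtpe_tasks) - baseline) / baseline
```

A single figure like this ignores factors such as special instructions, text type and final quality, which is why it is not a perfect metric.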
Is the difference that MT makes reflected in the rates?
For us, it’s really not as simple as that. In terms of efficiency, we want to be sure we know what we are actually getting.
I see a lot of nonsense numbers being thrown around. For example, MTPE is supposedly 50 per cent more efficient than translation. Even if there are time-saving aspects to this, it’s not realistic to put it in those terms.
The productivity increase needs to be contextualised as well. There are often other aspects that slow the work down, such as special instructions that need to be read and implemented.
At STP, we want to take into account the total effort people put into a project. And, at the end of the day, you still have to do the work – the engine just provides suggestions.
Based on the feedback we’ve had from our translators, so-called “high fuzzies”, meaning TM matches that are ranked as a 75 per cent match or higher by the CAT tool, are almost always more helpful than MT matches. So when our translators use MT, they are only using it for sentences where there are no “high fuzzies” available. So far, this has been a useful approach for us.
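The rule described here, preferring a "high fuzzy" and using MT only when none is available, can be sketched as a simple selection function. The 75 per cent threshold comes from the interview; the `Suggestion` type, the function name and the fallback behaviour when neither a high fuzzy nor MT output exists are assumptions made for the example, not a description of any particular CAT tool.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_FUZZY_THRESHOLD = 75  # per cent, as stated in the interview


@dataclass
class Suggestion:
    origin: str                  # "TM" or "MT"
    text: str
    score: Optional[int] = None  # TM match percentage; None for MT


def pick_suggestion(tm_match: Optional[Suggestion],
                    mt_output: Optional[str]) -> Optional[Suggestion]:
    """Prefer a high fuzzy TM match; otherwise fall back to MT output."""
    if (tm_match is not None
            and tm_match.score is not None
            and tm_match.score >= HIGH_FUZZY_THRESHOLD):
        return tm_match
    if mt_output:
        return Suggestion(origin="MT", text=mt_output)
    # Assumption: with no MT available, show the low fuzzy (or nothing).
    return tm_match
```

For instance, an 80 per cent TM match would be offered as is, while a segment with only a 60 per cent match would be shown with the MT suggestion instead.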
The one thing that is perhaps different at STP is that we have over 70 in-house translators who can help us develop our approach.
How does having a large team of in-house translators help?
They are all professionals who have been trained to post-edit MT output, and they are happy to help us develop the engines further. I can understand how a smaller company might find this harder.
At STP, we have a small number of languages we work with on a daily basis as well, so we have fewer engines to worry about than some other companies.
If people are not happy with something, we can try to improve it – or abandon it if that doesn’t help. We can go back to the drawing board.
How do you work with the in-house teams in practice?
We have one person for each target language who is our go-to person for MT development. So far, we’ve had this for all the Scandinavian languages and English. I work with these MT “power users”, or MT experts, when I need feedback.
It’s easy to do this with translators who are genuinely interested in the process and the technology. The technology would not really be worth much to us without our translator teams – their effort is crucial in all stages of the process.