Human post-editing to evaluate and compare the quality of three machine translation engines for Russian translations of Cochrane Plain Language Summaries

Authors
Yudina E1, Gabdrakhmanov A1, Ried J2, Ziganshina LE1
1Cochrane Russia
2Cochrane Central Executive
Abstract
Background: Translation and multi-language activities are a priority for Cochrane and critical to its Knowledge Translation (KT) activities, enabling uptake of Cochrane evidence globally. High-quality machine translation (MT) can facilitate efficient delivery of translated content in different languages, including Russian. The capabilities of MT and the range of available engines have grown rapidly in recent years, so it is important to understand which MT engine performs best for a specific language.

Objectives: To compare and evaluate the quality of three off-the-shelf MT engines for Russian translations of Cochrane Plain Language Summaries (PLS) using human post-editing.

Methods: We compared three MT engines, DeepL MT, Google Translate MT and Microsoft Translator MT, as part of our standard translation workflow within the Memsource translation management system. We selected 90 PLSs published in the Cochrane Library from May 2018 to April 2019 that had not yet been translated into Russian, and translated 30 PLSs with each of the three MT engines. We invited 10 experienced volunteer translators and editors to post-edit the machine translations and randomly assigned each of them three pre-translated PLSs per MT engine, nine PLSs in total. Two editors performed a second and final review. Memsource Machine Translation Quality Estimation (MTQE) provided an initial artificial-intelligence-powered estimate of how much editing each machine-translated text would require. The Memsource analysis feature allowed precise recording and numerical presentation of the amount of human editing required for each MT engine at both editing steps. We analysed and interpreted these data after machine translation and after each consecutive human post-editing step to assess the quality of the three MT engines.
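The assignment design described above (90 PLSs, 30 per engine, each of 10 editors receiving three PLSs per engine) can be sketched as a balanced random allocation. This is an illustrative reconstruction only, not the tooling used in the study; all identifiers and function names are hypothetical.

```python
import random

# Illustrative sketch of the balanced random assignment described in Methods:
# 90 PLSs, 30 pre-translated by each of three MT engines; each of 10 editors
# receives 3 PLSs per engine (9 in total). All names are hypothetical.
ENGINES = ["DeepL", "Google Translate", "Microsoft Translator"]
N_EDITORS = 10
PLS_PER_ENGINE_PER_EDITOR = 3

def assign_pls(seed=0):
    rng = random.Random(seed)
    # Placeholder PLS identifiers, grouped by the engine that pre-translated them
    pls_by_engine = {
        engine: [f"PLS-{engine[0]}{i:02d}" for i in range(30)]
        for engine in ENGINES
    }
    assignments = {editor: [] for editor in range(N_EDITORS)}
    for engine, pls_ids in pls_by_engine.items():
        rng.shuffle(pls_ids)
        # Deal out shuffled PLSs in chunks of 3 per editor
        for editor in range(N_EDITORS):
            start = editor * PLS_PER_ENGINE_PER_EDITOR
            assignments[editor].extend(
                pls_ids[start:start + PLS_PER_ENGINE_PER_EDITOR]
            )
    return assignments

assignments = assign_pls()
# Every editor ends up with 9 PLSs, 3 from each engine
assert all(len(pls) == 9 for pls in assignments.values())
```

Shuffling within each engine's pool before dealing out fixed-size chunks keeps the design balanced: every engine is post-edited by every editor, so engine effects are not confounded with editor effects.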

Results: Google Translate MT received on average the highest ratings for translation quality: its overall quality estimate after machine translation was the highest, and the amount of human revision required was the lowest at both editing steps. DeepL MT followed closely behind Google Translate MT, with slightly lower quality estimates after machine translation and slightly more editing required overall. Microsoft Translator MT had the lowest quality estimate ratings and required the most revisions at both human editing steps.

Conclusions: Among the three MT engines that we tested, Google Translate MT appeared to perform best for Russian translations of Cochrane PLSs, while DeepL MT also showed good results. At this point in time, we would recommend Google Translate MT, with DeepL MT as the second-best option, for machine translation of Cochrane PLSs into Russian. Future developments in MT research and the MT market may mean that a different MT engine becomes preferable. Although Google Translate MT performed slightly better, we have opted for DeepL MT as the default MT engine in our translation workflow, as DeepL offers preferable intellectual property and copyright terms.

Patient or healthcare consumer involvement: About one fifth of the volunteer editors in our study were consumers.