Richness Lost in Machine Translationese: Lexical Richness in Human Translation versus Neural Machine Translation from Arabic into English

Kotait, Radwa

Abstract


Neural Machine Translation (NMT) may have been pronounced faster and better than human translation. However, NMT inherently overgeneralizes the more frequent patterns in its training data at the expense of the less frequent ones, a phenomenon dubbed “machine translationese”. This machine translationese has been observed to reflect some controversial asymmetries. One usually overlooked facet of this machine bias is the loss of “lexical richness”. Only recently have the generated translations been noticed to be disproportionately deformed and impoverished, negatively affected by NMT’s tendency to overgeneralize. Lexical richness, notwithstanding its worth, has not received the same attention as lexical accuracy and error measurement and, more importantly, has received no attention at all in under-researched language pairs such as Arabic–English. This study aims to shed light on lexical richness in the output of Arabic-into-English NMT as opposed to human translation (HT), answering the question: Does HT exhibit more lexical richness than NMT does? The study adopts the most agreed-upon definition of lexical richness as a superordinate term that includes “lexical diversity”, “lexical density”, and “lexical sophistication”; all three are statistical metrics that gauge the lexical richness of a text. The study analyses the outputs of two NMT systems, Google Translate and Microsoft Translator, in terms of lexical richness, using both quantitative and qualitative methods, and then compares the results to those of the HT output. The corpus of the study comprises a news subcorpus and a literary subcorpus.
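As a rough illustration of how the three components of lexical richness can be quantified, the sketch below computes one simple measure for each. The abstract does not state which formulas the study uses, so everything here is an assumption: diversity is taken as the type-token ratio, density as the share of tokens that are not function words, and sophistication as the share of tokens absent from a list of common words. The function name `lexical_richness` and the two tiny word lists are hypothetical placeholders.

```python
import re

# Hypothetical, hand-picked stand-ins. A real analysis would use a POS tagger
# and a corpus-derived frequency list rather than these small sets.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was", "it", "that"}
COMMON_WORDS = FUNCTION_WORDS | {"man", "said", "big", "house", "went"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_richness(text):
    """Return assumed proxies for lexical diversity, density and sophistication."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    n = len(tokens)
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    rare = [t for t in tokens if t not in COMMON_WORDS]
    return {
        "diversity": len(set(tokens)) / n,   # type-token ratio (types / tokens)
        "density": len(content) / n,         # content words / tokens
        "sophistication": len(rare) / n,     # tokens outside the common-word list / tokens
    }


if __name__ == "__main__":
    # Compare two renderings of the same sentence, e.g. an HT and an NMT output.
    for label, sentence in [
        ("plain", "The man went to the house"),
        ("richer", "The old man wandered to the mansion"),
    ]:
        print(label, lexical_richness(sentence))
```

The type-token ratio is sensitive to text length, so a comparison of HT and NMT outputs along these lines would need texts of comparable size or a length-corrected diversity measure.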


Other data

Title: Richness Lost in Machine Translationese: Lexical Richness in Human Translation versus Neural Machine Translation from Arabic into English
Authors: Kotait, Radwa
Keywords: lexical density; lexical diversity; lexical richness; lexical sophistication; neural machine translation; machine translationese
Issue Date: 2024
Journal: The Egyptian Journal of Language Engineering
Volume: 11
Issue: 1
Start page: 66
End page: 85


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.