Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation

Ebrahim S.; Hegazy, Doaa; Mostafa M.; El-Beltagy S.


Abstract


© 2017 The Author(s). In this paper we introduce a new method for detecting one type of English multiword expression (MWE), phrasal verbs, and integrating it into an English-Arabic phrase-based statistical machine translation (PBSMT) system. The method parses the English side of the parallel corpus, detects various linguistic patterns for phrasal verbs, and finally integrates them into the En-Ar PBSMT system. In addition, the paper explores the effect of cliticizing specific English words that have no Arabic equivalent. The results, reported as BLEU scores, show that some patterns achieved significant improvements over other patterns, although the baseline still achieves the highest score. The paper shows that, by detecting more linguistic patterns and integrating them into the En-Ar SMT system, translation quality could be improved using other integration methods. The results nevertheless show which path is worth following and support the view that linguistic features are not handled properly in statistically learned models.
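The detection and integration steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the English sentence is already tagged with Penn Treebank part-of-speech tags, treats any verb followed by a particle (tag `RP`) within a small window as a phrasal verb, and "integrates" it by joining verb and particle into a single underscore-separated token before training. The function name `merge_phrasal_verbs`, the `max_gap` parameter, and the underscore convention are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag)


def merge_phrasal_verbs(tagged: List[Tagged], max_gap: int = 2) -> List[str]:
    """Join a verb and its particle (tag RP) into one token.

    Covers two illustrative patterns:
      - contiguous: verb particle          ("gave up")
      - split: verb ... particle, with at most max_gap tokens in between
        ("turned the light off")
    """
    out: List[str] = []
    used = set()  # indices of particles already attached to a verb
    for i, (tok, tag) in enumerate(tagged):
        if i in used:
            continue  # this particle was merged into an earlier verb
        if tag.startswith("VB"):
            # Look ahead for the nearest particle inside the window.
            for j in range(i + 1, min(i + 2 + max_gap, len(tagged))):
                next_tok, next_tag = tagged[j]
                if next_tag == "RP":
                    tok = f"{tok}_{next_tok}"
                    used.add(j)
                    break
                if next_tag.startswith("VB"):
                    break  # another verb intervenes; stop searching
        out.append(tok)
    return out


contiguous = [("He", "PRP"), ("gave", "VBD"), ("up", "RP"), ("smoking", "NN")]
split = [("She", "PRP"), ("turned", "VBD"), ("the", "DT"),
         ("light", "NN"), ("off", "RP")]

print(merge_phrasal_verbs(contiguous))
print(merge_phrasal_verbs(split))
```

Merging the pair into one token lets the phrase-based aligner treat the phrasal verb as a single unit. Whether this helps depends on the pattern and the integration method, which is the question the paper evaluates with BLEU.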


Other data

Issue Date 1-Jan-2017
Journal Procedia Computer Science 
URI http://research.asu.edu.eg/123456789/1021
DOI 10.1016/j.procs.2017.10.099
http://api.elsevier.com/content/abstract/scopus_id/85037742623
111
117


