Neural Networks Pipeline for Offline Machine Printed Arabic OCR

Radwan, MA; Khalil, Mahmoud; Abbas, HM

Abstract


In Arabic optical character recognition (OCR), the script poses particular challenges because of its cursive nature. We propose a system for recognizing documents containing Arabic text using a pipeline of three neural networks. The first model predicts the font size of an Arabic word. The word is then normalized to an 18pt font size, which is the size used to train the next two models. The second model segments the word into characters. Word segmentation in Arabic, as in many similar cursive languages, is a challenge for OCR systems, and this paper presents a multichannel neural network for the offline segmentation of machine-printed Arabic documents. The segmented characters are then fed as input to a convolutional neural network for Arabic character recognition. The font size prediction model achieved a test accuracy of 99.1%. The segmentation model achieved 98.9% accuracy using one font and 95.5% accuracy using four fonts. The whole pipeline achieved an accuracy of 94.38% on the Arabic Transparent font at 18pt from the APTI dataset.
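The three-stage flow in the abstract (predict font size, normalize to 18pt, segment, classify) can be sketched as plain function composition. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `recognize_word`, `normalize_to_18pt`, and `rescale_nearest` are hypothetical. The three networks are passed in as stand-in callables. Normalization is assumed to be a uniform rescale by a factor of 18 / predicted size using nearest-neighbour sampling; the abstract does not say how the paper actually resamples word images.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a grayscale word image as rows of pixel values.
Image = List[List[int]]

TARGET_PT = 18  # the abstract states that words are normalized to 18pt


def rescale_nearest(img: Image, factor: float) -> Image:
    """Nearest-neighbour rescale (an assumed stand-in for the paper's resampling)."""
    h, w = len(img), len(img[0])
    new_h = max(1, round(h * factor))
    new_w = max(1, round(w * factor))
    return [
        [img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalize_to_18pt(img: Image, predicted_pt: float) -> Image:
    """Assumed normalization: scale the word so its font size becomes 18pt."""
    return rescale_nearest(img, TARGET_PT / predicted_pt)


def recognize_word(
    img: Image,
    predict_font_size: Callable[[Image], float],  # stand-in for network 1
    segment: Callable[[Image], Sequence[Image]],  # stand-in for network 2
    classify: Callable[[Image], str],  # stand-in for network 3 (the CNN)
) -> str:
    size = predict_font_size(img)  # stage 1: estimate the word's font size
    norm = normalize_to_18pt(img, size)  # bring the word to the 18pt training size
    chars = segment(norm)  # stage 2: split the word into character images
    # Stage 3: classify each character; assumes segment() yields characters
    # in reading (logical) order, so joining them gives the word string.
    return "".join(classify(ch) for ch in chars)


if __name__ == "__main__":
    toy_word = [[0, 255], [255, 0]]  # 2x2 toy "image"
    out = recognize_word(
        toy_word,
        predict_font_size=lambda im: 9.0,  # pretend the word was set at 9pt
        segment=lambda im: [im],  # pretend the whole word is one character
        classify=lambda ch: "ب",
    )
    print(out)
```

Passing the models as callables keeps the sketch independent of any particular framework, and it makes the dependency in the abstract explicit: the segmentation and recognition stages only ever see words at the normalized 18pt size.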


Other data

Keywords: OCR; Arabic word segmentation; Convolutional neural networks; Character recognition; Character segmentation; Recognition; Text
Issue Date: 2018
Publisher: Springer
Journal: Neural Processing Letters
Volume: 48
Issue: 2
Start page: 769
End page: 787
ISSN: 1370-4621
DOI: 10.1007/s11063-017-9727-y
Scopus ID: 2-s2.0-85032508045
Web of Science ID: WOS:000446501500007


Citations: 29 (Scopus)


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.