Developing an Algorithm for Arabic Document Image Analysis

Ibrahim Mohammed Ali Amer

Abstract


Document layout analysis (DLA) is the process of identifying the regions of interest in a document image, which requires separating text regions from non-text ones. DLA is an essential step for optical character recognition (OCR) systems, document management systems, document-archiving systems, and more. The text of a document fed to an OCR system must first be extracted and isolated from any images present. OCR systems recognize images of printed or handwritten text; these images must contain text only, and if the document mixes text with photos, graphs, shapes, or halftones, recognition accuracy suffers. Thus, DLA is a crucial step before OCR. The DLA task is difficult because there is no single fixed layout for all documents; instead, layouts vary with the document type: a newspaper, a magazine, a book, or a manuscript. There are various approaches to DLA for different languages, but document layout analysis for Arabic scripts is


Other data

Title: Developing an Algorithm for Arabic Document Image Analysis
Authors: Ibrahim Mohammed Ali Amer
Issue Date: 2018

Attached Files

File: J5558.pdf
Size: 294.29 kB
Format: Adobe PDF


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.