Developing an Algorithm for Arabic Document Image Analysis
Ibrahim Mohammed Ali Amer
Abstract
Document layout analysis (DLA) is the process of identifying the regions of interest in a document image, which requires separating text regions from non-text ones. DLA is an essential step for Optical Character Recognition (OCR) systems, document management systems, document-archiving systems, and more. The text of a document fed to an OCR system must first be extracted and isolated from any images it contains. OCR systems recognize images of printed or handwritten text, and these images must contain text only; if a document mixes text with photos, graphs, shapes, or halftones, recognition accuracy suffers. Thus, DLA is a crucial step before OCR. The DLA task is difficult because there is no fixed layout for all documents; instead, there are several layouts depending on the document type, such as a newspaper, a magazine, a book, or a manuscript. Various DLA approaches exist for different languages, but document layout analysis for Arabic scripts is
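The text/non-text separation described above can be illustrated with a minimal sketch. It is not the thesis's algorithm, which the abstract does not describe. It assumes a binarized page given as a list of rows of 0/1 pixels, groups foreground pixels into 8-connected components, and applies a hypothetical size rule (`max_height` and `max_width` are illustrative thresholds, not values from the source) that treats character-sized components as text and larger blobs such as photos or halftones as non-text. Real Arabic pages need richer features, since cursive words and diacritic dots vary widely in size.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Component:
    """An 8-connected group of foreground pixels."""
    pixels: list = field(default_factory=list)  # (row, col) coordinates

    @property
    def height(self) -> int:
        rows = [r for r, _ in self.pixels]
        return max(rows) - min(rows) + 1

    @property
    def width(self) -> int:
        cols = [c for _, c in self.pixels]
        return max(cols) - min(cols) + 1


def connected_components(img):
    """Label 8-connected foreground (1) pixels with a breadth-first search."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            comp = Component()
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                comp.pixels.append((y, x))
                # Visit all 8 neighbours that are foreground and not yet labelled.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(comp)
    return components


def is_text(comp, max_height=4, max_width=8):
    # Hypothetical size rule: character-sized components count as text;
    # anything larger (photo, graph, halftone block) counts as non-text.
    return comp.height <= max_height and comp.width <= max_width


def text_only(img, **limits):
    """Return a copy of img with every non-text component erased."""
    out = [row[:] for row in img]
    for comp in connected_components(img):
        if not is_text(comp, **limits):
            for y, x in comp.pixels:
                out[y][x] = 0
    return out


if __name__ == "__main__":
    # Two small "glyphs" in the top-left and a solid 6x5 block standing in for a photo.
    page = [[0] * 10 for _ in range(10)]
    for y, x in [(0, 0), (0, 1), (1, 0), (1, 3), (2, 3)]:
        page[y][x] = 1
    for y in range(4, 10):
        for x in range(5, 10):
            page[y][x] = 1
    print([is_text(c) for c in connected_components(page)])  # [True, True, False]
```

The output of `text_only` is the kind of text-only image the abstract says an OCR engine expects; in practice the thresholds would be learned or tuned per document type rather than fixed.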
Other data
Title: Developing an Algorithm for Arabic Document Image Analysis
Authors: Ibrahim Mohammed Ali Amer
Issue Date: 2018
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.