Intelligent Expert System for Articulate Arabic Text Machine Reader
Mohamed Fathi Hamed ALRahmawy
Abstract
This thesis aims mainly to build a fast and efficient Arabic OCR system, using object-oriented programming, as the basis of an articulating machine reader for printed Arabic writing.
First, the basic difficulties and distinctive characteristics of the Arabic text recognition problem are outlined. The stages of an OCR system are then reviewed, the basic approaches used in each stage are studied, and previous work in Arabic OCR is surveyed.
The basic concepts of neural networks and their most general models are also reviewed. The back-propagation learning algorithm is then summarized as the most widely used learning algorithm, and its virtues and limitations are studied.
Next, the algorithms used in implementing the proposed system are described in detail. Novel algorithms are presented for document preprocessing and for extracting the chain-coded inner and outer contours of the document's subwords and representing them as objects. The locations of these objects are then analyzed in order to segment them into separate lines. A new Arabic text segmentation algorithm is also presented; it segments the chain-coded contours of the Arabic subwords into chain-coded objects of the character primitives (sub-characters) that constitute these subwords. A novel, fast and efficient algorithm then extracts central-moment features from the chain-coded upper and lower outer contours of the segmented primitives, in order to improve the feature extraction rate.
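As a rough illustration of central-moment features computed from a chain-coded contour, the sketch below decodes a Freeman 8-direction chain code into contour points and evaluates the central moment of order (p, q) over those points. It is not the thesis's algorithm, whose details are not given here: the direction convention, the function names `decode_chain` and `central_moment`, and the choice to sum over decoded points (rather than work directly on the codes) are all assumptions made for illustration.

```python
# Freeman 8-direction offsets (dx, dy). Assumed convention: code 0 points
# east and codes increase counter-clockwise, with y growing upward.
FREEMAN_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1),
                   (-1, 0), (-1, -1), (0, -1), (1, -1)]


def decode_chain(start, codes):
    """Turn a start point and a list of Freeman codes into contour points."""
    x, y = start
    points = [(x, y)]
    for code in codes:
        dx, dy = FREEMAN_OFFSETS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    # A closed contour returns to its start; drop the repeated point.
    if len(points) > 1 and points[-1] == points[0]:
        points.pop()
    return points


def central_moment(points, p, q):
    """Central moment of order (p, q) of a set of contour points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) ** p * (y - mean_y) ** q for x, y in points)


# Hypothetical example: a 2x2 square traced counter-clockwise from (0, 0).
square = decode_chain((0, 0), [0, 0, 2, 2, 4, 4, 6, 6])
features = [central_moment(square, p, q) for p, q in [(2, 0), (1, 1), (0, 2)]]
```

Because central moments are taken about the centroid, they do not change when the primitive is shifted on the page, which is one reason such features suit segmented primitives.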
The segmented primitives are classified by a two-stage hybrid recognition system. The first stage uses two neural networks, which are embedded within the system as objects and linked with its other objects, to cluster each primitive to be recognized into one of a set of predefined clusters. The second stage applies a set of classifiers (one per cluster) that use statistical, structural and heuristic rules of Arabic writing to make the final classification of the primitives and build the characters.
A novel method is also presented for recombining the recognized subwords into words using the spaces between them and language rules.
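The spacing half of this idea can be sketched as follows, under stated assumptions: each subword on a line is reduced to the horizontal extent of its bounding box, subwords are visited right to left (the Arabic reading order), and a gap wider than a fixed threshold starts a new word. The function name `group_subwords`, the `(left, right)` representation and the fixed `gap_threshold` are hypothetical, and the language rules used in the thesis are not modelled.

```python
def group_subwords(boxes, gap_threshold):
    """Group subword x-extents (left, right) on one line into words.

    Subwords are visited right to left; a horizontal gap larger than
    gap_threshold (an assumed, fixed value) starts a new word.
    """
    ordered = sorted(boxes, key=lambda box: box[1], reverse=True)
    words = []
    for box in ordered:
        # Gap = left edge of the previous subword minus right edge of this one.
        if words and words[-1][-1][0] - box[1] <= gap_threshold:
            words[-1].append(box)
        else:
            words.append([box])
    return words


# Hypothetical line with four subwords forming two words.
line = [(0, 10), (12, 20), (30, 40), (42, 50)]
words = group_subwords(line, gap_threshold=3)
```

A fixed threshold is the simplest choice; in practice it would likely need to scale with font size or be estimated from the distribution of gaps on the line.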
Finally, an Arabic word-based articulation subsystem is presented that articulates either the recognized text or the contents of a text file.
Other data
| Field | Value |
|---|---|
| Title | Intelligent Expert System for Articulate Arabic Text Machine Reader |
| Other Titles | نظام خبير ذكى للالة الناطقة بالعربية |
| Authors | Mohamed Fathi Hamed ALRahmawy |
| Issue Date | 2001 |
Attached Files
| File | Size | Format |
|---|---|---|
| محمد فتحى حامد.pdf | 410.09 kB | Adobe PDF |
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.