Developing Semantic-based System for Arabic Information Retrieval
Wasim Ahmed Abdul-Aziz Alromima
Abstract
In the era of information overload, information retrieval systems are vital applications. The World Wide Web and social media have become a vast library of unstructured data that is laborious to comprehend and process without intelligent techniques. Many researchers are working to improve the precision and recall of search results by developing new methods, especially semantic ones. The amount of available Arabic content is increasing, but its usefulness is limited by the complexity of Arabic morphology and the lack of resources such as ontologies and machine-readable dictionaries.
The main objective of this thesis is to introduce a new Semantic-based Arabic Information Retrieval System (SAIRS) to improve Arabic text retrieval. To address the complexity and limited resources of the Arabic language, the proposed approach makes three main contributions. First, the query is expanded using n-gram term collocations that are mined automatically from the Arabic corpus, so no external semantic resource is needed. Second, the query is expanded using an Arabic domain ontology, which was designed manually and represented in the Web Ontology Language (OWL). Third, the system index is built directly from the corpus words, which saves the cost and effort of stemming. The Vector Space Model (VSM) is used to represent both documents and user queries. The experimental evaluation was conducted on the Arabic text of the Holy Quran.
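The retrieval idea described above (expand the query, then rank documents in a vector space) can be sketched as follows. This is a minimal illustration and not the SAIRS implementation: the TF-IDF weighting, the cosine ranking, the `expansions` dictionary standing in for mined collocations or ontology terms, and the toy English tokens are all assumptions introduced here.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build TF-IDF vectors over surface words (no stemming), one dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query, expansions):
    """Append related terms (hypothetical collocation or ontology lookups) to the query."""
    expanded = list(query)
    for term in query:
        for extra in expansions.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def rank(query, docs, expansions):
    """Return (score, doc_index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    q_terms = expand_query(query, expansions)
    # Query terms that never occur in the corpus carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(q_terms).items() if t in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Toy corpus: each document is a list of tokens.
docs = [["mercy", "lord"], ["garden", "river"], ["paradise", "river"]]
expansions = {"garden": ["paradise"]}  # assumed expansion table
print(rank(["garden"], docs, expansions))
```

With the expansion table, the third toy document becomes retrievable for the query `garden` even though it does not contain that word, which is the effect query expansion is meant to achieve.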
The thesis has two main sub-objectives. The first is a system that extracts tagged n-gram collocations (from 2-grams to 6-grams) from the Arabic corpus by matching an input structural pattern of the Arabic language against the Part of Speech Tagging (POST) of the corpus. The system is useful for extracting different kinds of word sequences and phrases, and the prototype is beneficial for linguistic research, as shown by the different scenarios in the experiments conducted.
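The pattern-matching step described above can be sketched as a sliding window over a POS-tagged corpus. This is an illustrative sketch only: the function name `extract_collocations`, the tag labels (`N`, `ADJ`, `P`), and the transliterated toy tokens are assumptions made here, not the tag set or data used in the thesis.

```python
from collections import Counter


def extract_collocations(tagged_tokens, pattern):
    """Count word n-grams whose POS tags match `pattern` exactly.

    tagged_tokens: list of (word, tag) pairs in corpus order.
    pattern: sequence of 2 to 6 tags, e.g. ("N", "ADJ").
    """
    n = len(pattern)
    if not 2 <= n <= 6:
        raise ValueError("pattern length must be between 2 and 6")
    counts = Counter()
    # Slide a window of n tokens over the corpus and compare tags position by position.
    for i in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[i:i + n]
        if all(tag == wanted for (_, tag), wanted in zip(window, pattern)):
            counts[tuple(word for word, _ in window)] += 1
    return counts


# Toy tagged corpus with hypothetical tags and transliterated words.
corpus = [
    ("kitab", "N"), ("jadid", "ADJ"), ("fi", "P"),
    ("bayt", "N"), ("kabir", "ADJ"), ("kitab", "N"), ("jadid", "ADJ"),
]
print(extract_collocations(corpus, ("N", "ADJ")))
print(extract_collocations(corpus, ("P", "N", "ADJ")))
```

Counting the matches, as done here, gives a simple frequency signal that could be used to choose which collocations to keep; how the thesis scores or filters candidates is not stated in the abstract.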
Other data
| Field | Value |
|---|---|
| Title | Developing Semantic-based System for Arabic Information Retrieval |
| Other Titles | تطوير نظام دلالي لاسترجاع المعلومات باللغة العربية |
| Authors | Wasim Ahmed Abdul-Aziz Alromima |
| Issue Date | 2016 |
Attached Files
| File | Size | Format | |
|---|---|---|---|
| G14107.pdf | 589.66 kB | Adobe PDF | View/Open |
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.