Enhancing Information Retrieval through Dependency Modeling

Doaa Mabrouk Abd El-Fatah Mabrouk

Abstract


Many fields face open problems, computing among them, and the rapid spread of the internet has increased their number. One of the most important fields today is information retrieval, the search for information that meets a user’s need. With the growth of internet use and of the information available on the web, Information Retrieval “IR” has become a fact of life for users. The internet provides users with vast knowledge and information across different domains. The major research areas include biology, chemistry, commerce, tourism, earth, education, mathematics, physics, economics, agriculture, and information and computer sciences.

This thesis addresses two problems. The first is term dependency: some mathematical models, such as the Vector Space Model “VSM”, assume that terms are independent, while others, such as Markov Random Field “MRF”, Unigram and Bigram models, assume that terms are dependent. The second is term weighting, which lies at the core of mathematical retrieval modeling and is important for document ranking. Weighting methods include Term Frequency Inverse Document Frequency “TF*IDF”, Information Gain Ratio “IGR”, Confidence weight “Conf.Weight” and weighted clustering.
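As an illustration of the TF*IDF weighting named above, the following is a minimal sketch of one common textbook variant (relative term frequency multiplied by the natural log of inverse document frequency). The function name `tf_idf` and the sample documents are hypothetical; the exact formula used in the thesis is not stated in this abstract.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. This is an assumed textbook variant:
    weight = (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy collection of two tokenized documents.
sample_docs = [["term", "dependency", "model"], ["term", "weighting"]]
sample_weights = tf_idf(sample_docs)
```

Each term is weighted separately from every other term in this sketch, which is the independence assumption that dependency models relax.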

The proposed algorithm applies power set theory to discover all combinations of words in documents. The results are judged using accuracy measurements from Subsumptions Rule-Based Classifiers “SRBC” in two ways: Maximum-Number Term Dependency Identification “Max-No-TDI” and Maximum-Feature Count “Max-FC”.
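The idea of enumerating word combinations through a power set can be sketched as follows. This is a generic enumeration, not the thesis algorithm itself, whose details are not given in this abstract; the function name `term_combinations` and the `min_size` parameter are hypothetical.

```python
from itertools import chain, combinations

def term_combinations(terms, min_size=2):
    """List every subset of the distinct terms with at least min_size members.

    With min_size=0 this is the full power set. Subsets are ordered by
    size, then lexicographically, because the terms are sorted first.
    """
    unique = sorted(set(terms))
    return list(chain.from_iterable(
        combinations(unique, size)
        for size in range(min_size, len(unique) + 1)
    ))

# Hypothetical three-word document.
pairs_and_up = term_combinations(["retrieval", "information", "model"])
```

A set of n distinct terms has 2^n subsets, so exhaustive enumeration of this kind is practical only for small vocabularies or for subset sizes that are capped.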

This thesis presents a survey of mathematical information retrieval systems that use dependency modeling and term weighting. Dependency modeling is enhanced in terms of performance, effectiveness and efficiency, with term weighting treated as a further factor that affects the results. The thesis also applies power set theory to discover Term Dependency Identification “TDI” between words in Text Classification “TC” and measures the accuracy of all generated random experiments. The result is that Max-No-TDI performs better than Max-FC, with a 96% accuracy level.


Other data

Title: Enhancing Information Retrieval through Dependency Modeling
Other Titles: تطوير نظم استرجاع المعلومات من خلال نماذج التبعية
Authors: Doaa Mabrouk Abd El-Fatah Mabrouk
Issue Date: 2019

Attached Files

File: cc1031.pdf
Size: 373.07 kB
Format: Adobe PDF


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.