A Sentiment Analysis Approach for Arabic Texts

Radwa Moustafa Kamal Saeed

Abstract


Individuals increasingly express their experiences and opinions through online reviews. These reviews influence online marketing and guide potential customers, giving them first-hand knowledge about products and services when making decisions. Sentiment analysis is the automatic analysis of opinions expressed in textual reviews. Its effectiveness is affected by spam opinions and by the set of representative features extracted from the reviews. Prior spam detection research, and most sentiment classification studies that integrate dimensionality reduction, have focused on English texts, with less attention to other languages, including Arabic. Large amounts of Arabic data are generated by the large population of the Arab world; despite that, these technical gaps still exist for Arabic.
In this thesis, a supervised learning approach for sentiment classification of Arabic reviews is proposed. The approach uses compact features built from a representative feature set coupled with a feature reduction technique, which yields high accuracy together with time and space savings. The feature set includes a three-way combination of N-gram features and counts of positive and negative N-grams obtained after negation handling. Two linear transformation methods are studied: Principal Component Analysis (PCA), an unsupervised method, and Linear Discriminant Analysis (LDA), a supervised method. Spam detection is applied before classification to increase its robustness. Four Arabic spam review detection methods are proposed, with emphasis on constructing and evaluating ensemble approaches that integrate rule-based classification with machine learning techniques and that use content-based features derived from N-gram features and negation handling.
The proposed Arabic sentiment classification approach and Arabic spam review detection methods were assessed in several experiments. The sentiment classification approach was evaluated on five Arabic opinion text datasets from different domains and of varying sizes (1.6K to 94K reviews), on both two-class (positive/negative) and three-class (positive/negative/neutral) problems. Accuracy for the feature reduction-based approach ranged from 95.5% to 99.8% for the 2-class problem and from 92% to 97.3% for the 3-class problem, outperforming existing related work by a margin of 23% in accuracy. LDA feature reduction outperformed PCA by an average of 4.34% in accuracy. The results also showed a 24% increase in accuracy, 93% savings in the feature space, and a 97% decrease in classification execution time. The four spam review detection methods were evaluated on two Arabic opinion text datasets of different sizes (1.6K and 94K reviews). The ensemble method proved effective, achieving accuracies of 95.25% and 99.98% on the two datasets and outperforming existing related work by a margin of 25% in accuracy.
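To make the feature set described in the abstract more concrete, the following is a minimal sketch of N-gram extraction combined with positive/negative counts after negation handling. It is not the thesis implementation: the negation particles, the toy polarity lexicon, the `NOT_` tagging convention, the reading of the N-gram combination as unigrams, bigrams, and trigrams, and all function names are assumptions made for illustration. For brevity, polarity is counted on single tokens only.

```python
from collections import Counter

# All of the following are illustrative assumptions, not taken from the thesis.
NEGATORS = {"لا", "لم", "لن", "ليس"}   # assumed Arabic negation particles
POSITIVE = {"جيد", "ممتاز"}            # toy positive lexicon
NEGATIVE = {"سيء", "رديء"}             # toy negative lexicon
NEG_PREFIX = "NOT_"                    # hypothetical tag for negated tokens


def handle_negation(tokens):
    """Drop each negation particle and tag the token that follows it."""
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        out.append(NEG_PREFIX + tok if negate else tok)
        negate = False
    return out


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def polarity(token):
    """Return +1, -1, or 0; a negation tag flips the lexicon polarity."""
    negated = token.startswith(NEG_PREFIX)
    base = token[len(NEG_PREFIX):] if negated else token
    score = 1 if base in POSITIVE else -1 if base in NEGATIVE else 0
    return -score if negated else score


def extract_features(text):
    """Build a sparse feature dict: 1- to 3-gram counts plus polarity counts."""
    tokens = handle_negation(text.split())
    feats = Counter()
    for n in (1, 2, 3):
        feats.update(ngrams(tokens, n))
    scores = [polarity(t) for t in tokens]
    feats["__pos_count__"] = sum(s > 0 for s in scores)
    feats["__neg_count__"] = sum(s < 0 for s in scores)
    return feats


if __name__ == "__main__":
    # "The hotel is not good": the positive word is negated, so it counts as negative.
    print(extract_features("الفندق ليس جيد"))
```

In a full pipeline, feature dictionaries like these would be vectorized and then passed through a reduction step such as PCA or LDA before classification; that step is omitted here.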


Other data

Title: A Sentiment Analysis Approach for Arabic Texts
Other Titles: نهج تحليل الرأي للنصوص العربية
Authors: Radwa Moustafa Kamal Saeed
Issue Date: 2020

Attached Files

File: BB7260.pdf (1.45 MB, Adobe PDF)

Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.