Opinion Mining Using Machine Learning Techniques
Donia GamalEldin Nazim Sayed
Abstract
Opinion Mining (OM) has recently become one of the fastest-growing research areas in text mining and Natural Language Processing (NLP). OM, also known as Sentiment Analysis (SA), is the process of analyzing textual data and classifying it according to the sentiment it expresses. SA has a wide variety of business applications. The growth of Social Media (SM) applications has generated a large volume of personal reviews and opinions on the Web, in the form of tweets, status updates, and other posts. However, most research on this topic focuses on English text, and only limited tools and resources are available for other languages such as Arabic.
Research on Arabic OM lags behind research on English OM because of the unique nature and complexity of the Arabic language. This thesis proposes an Arabic benchmark dataset for OM and describes the methodology used to collect recent tweets written in different Arabic dialects. The dataset contains more than 151,000 opinions in various Arabic dialects. These opinions are normalized and labeled into two balanced classes: positive and negative. Besides the construction of the dataset, the thesis explores the preprocessing of the collected data in detail.
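The abstract does not say how the two classes were balanced. One common option, shown here only as an assumed example, is random undersampling: every class is downsampled to the size of the smallest one. The helper `balance_by_undersampling` and its `seed` parameter are hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, seed=0):
    """Return a class-balanced subset of (text, label) pairs.

    Hypothetical helper: every class is randomly downsampled to the
    size of the smallest class, then the combined result is shuffled.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    if not by_label:
        return []
    # The smallest class determines how many items each class keeps.
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced
```

For example, given three positive and two negative tweets, the function returns two of each. Undersampling discards data from the larger class; oversampling the smaller class is the usual alternative when data is scarce.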
Data preprocessing begins by removing noisy content from the tweets, such as hashtags, profile pictures, retweets, emoticons, user names, user mentions, and URLs. The second step is tokenization, followed by removing non-Arabic letters, removing diacritics, and normalizing similar-looking Arabic letters (for example, mapping ‘أ’ to ‘ا’) to reduce uncertainty and ambiguity. Then stop words are removed.
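The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the regular expressions, the Unicode range U+0621 to U+0652 used to define "Arabic letters and diacritics", the extra stripping of tatweel (U+0640), the whitespace tokenizer, and the five-word stop list are all assumptions, and the function names `remove_noise`, `normalize_token`, and `preprocess` are hypothetical. Profile pictures are tweet metadata rather than text, so this text-only sketch does not handle them; emoji and ASCII emoticons disappear in the non-Arabic filtering step.

```python
import re

# Assumed noise patterns: URLs, user mentions, whole hashtags, retweet markers.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\S+")
RETWEET_RE = re.compile(r"\bRT\b")

# Anything outside U+0621..U+0652 (basic Arabic letters plus harakat) is dropped.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u0652]")
# Harakat (U+064B..U+0652) plus tatweel (U+0640); stripping tatweel is an assumption.
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
# Alef variants آ أ إ, all mapped to bare alef ا (U+0627).
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")


def remove_noise(tweet: str) -> str:
    """Replace URLs, mentions, hashtags, and RT markers with spaces."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, RETWEET_RE):
        tweet = pattern.sub(" ", tweet)
    return tweet


def normalize_token(token: str) -> str:
    """Drop non-Arabic characters, strip diacritics, normalize alef forms."""
    token = NON_ARABIC_RE.sub("", token)
    token = DIACRITICS_RE.sub("", token)
    return ALEF_VARIANTS_RE.sub("\u0627", token)


# Hypothetical tiny stop list, normalized the same way as the tweet tokens.
STOP_WORDS = {normalize_token(w) for w in ("في", "من", "على", "إلى", "عن")}


def preprocess(tweet: str) -> list[str]:
    """Noise removal, whitespace tokenization, normalization, stop-word removal."""
    tokens = remove_noise(tweet).split()
    tokens = [normalize_token(t) for t in tokens]
    return [t for t in tokens if t and t not in STOP_WORDS]
```

Normalizing the stop list with the same `normalize_token` function keeps the two sides consistent: after normalization, ‘إلى’ becomes ‘الى’, so comparing raw stop words against normalized tokens would silently miss it.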
Other data
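| Field | Value |
|---|---|
| Title | Opinion Mining Using Machine Learning Techniques |
| Other Titles | استخدام أساليب تعلم الآله في تنقيب بيانات أراء المستخدمين (English: Using Machine Learning Techniques in Mining Users' Opinion Data) |
| Authors | Donia GamalEldin Nazim Sayed |
| Issue Date | 2018 |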
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.