Efficient email classification approach based on semantic methods

Bahgat, Eman M.; Rady, Sherine; Gad, Walaa; Moawad, Ibrahim F.

Abstract


Email has become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase in unsolicited emails, also known as spam. Managing and classifying this huge number of emails is an important challenge. Most approaches introduced to solve this problem handle the high dimensionality of emails by using syntactic feature selection. In this paper, an efficient email filtering approach based on semantic methods is proposed. The proposed approach employs the WordNet ontology and applies different semantic-based methods and similarity measures to reduce the huge number of extracted textual features, thereby reducing space and time complexity. Moreover, to obtain a minimal optimal feature set, feature dimensionality reduction has been integrated using techniques such as Principal Component Analysis (PCA) and Correlation Feature Selection (CFS). Experimental results on the standard benchmark Enron dataset showed that the proposed semantic filtering approach, combined with feature selection, achieves high computational performance with high space and time reduction rates. A comparative study of several classification algorithms indicated that Logistic Regression achieves the highest accuracy compared to Naïve Bayes, Support Vector Machine, J48, Random Forest, and radial basis function networks. With the CFS feature selection technique integrated, the average accuracy recorded across all the algorithms used is above 90%, with more than 90% feature reduction. In addition, the experiments showed that the proposed work performs significantly better than other related works, with higher accuracy and less time.
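The abstract credits Correlation Feature Selection (CFS) with much of the feature reduction. The following Python sketch is for illustration only. It implements the commonly cited CFS merit heuristic, which rewards feature subsets that correlate with the class but not with each other, and pairs it with a simple greedy forward search. It is not the authors' implementation. The following are all assumptions made for this example:

- the use of absolute Pearson correlation as the correlation measure,
- the forward-search stopping rule,
- the hypothetical function names (`pearson`, `cfs_merit`, `cfs_forward_select`),
- the toy term-count matrix.

```python
# Illustrative sketch (assumption): CFS merit with |Pearson r| and a greedy
# forward search. Not the paper's code; names and data are hypothetical.
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset: List[int], columns: List[List[float]], labels: Sequence[float]) -> float:
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean |feature-class| correlation, r_ff the mean
    |feature-feature| correlation over all pairs in the subset.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(rows: List[List[float]], labels: Sequence[float]) -> List[int]:
    """Repeatedly add the feature that raises the merit most; stop when none improves it."""
    columns = [list(col) for col in zip(*rows)]  # one list per feature
    selected: List[int] = []
    best = 0.0
    remaining = set(range(len(columns)))
    while remaining:
        candidate, score = max(
            ((f, cfs_merit(selected + [f], columns, labels)) for f in remaining),
            key=lambda pair: pair[1],
        )
        if score <= best:
            break
        selected.append(candidate)
        remaining.discard(candidate)
        best = score
    return selected


# Hypothetical toy data: rows are emails, columns are term counts.
# f0 tracks the label exactly, f1 and f2 are only weakly related, f3 is constant.
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = ham
rows = [
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
]

print(cfs_forward_select(rows, labels))  # [0]
```

In this toy case, f0 alone separates the two classes, so its merit is 1.0. Adding the weakly relevant f1 or f2 would give a merit of about 0.82. Adding the constant f3 would give about 0.71. Because no addition raises the merit, the search stops with a single feature. The denominator is what penalizes redundancy: the more the selected features correlate with one another, the less an extra feature can raise the merit.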


Other data

Title Efficient email classification approach based on semantic methods
Authors Bahgat, Eman M.; Rady, Sherine; Gad, Walaa; Moawad, Ibrahim F.
Keywords Email classification; WordNet ontology; Spam; Semantic similarity; Features reduction
Issue Date 1-Dec-2018
Publisher Elsevier Science BV
Journal Ain Shams Engineering Journal 
Volume 9
Issue 4
Start page 3259
End page 3269
ISSN 2090-4479
DOI 10.1016/j.asej.2018.06.001
Scopus ID 2-s2.0-85056622559
Web of Science ID WOS:000454548400258

Attached Files

ASEJ.2018_Eman.Bahgat-et-al.pdf (2.13 MB, Adobe PDF)

Citations: 39 in Scopus


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.