New semantic similarity based model for text clustering using extended gloss overlaps

Gad, Walaa; Kamel, M.S.

Abstract


Most text clustering techniques are based on word and/or phrase weights in the text. Such a representation is often unsatisfactory because it ignores the relationships between terms and treats them as independent features. In this paper, a new semantic similarity based model (SSBM) is proposed. The model computes semantic similarities using WordNet as an ontology. It captures the semantic similarity between documents that contain semantically similar terms that are not necessarily syntactically identical. SSBM assigns a new weight to each document term, reflecting the semantic relationships between terms that co-occur literally in the document. Used together with the extended gloss overlaps measure and the adapted Lesk algorithm, our model solves the ambiguity and synonymy problems that traditional term-frequency-based text mining techniques do not detect. The proposed model is evaluated on the Reuters-21578 and 20-Newsgroups text collections. Performance is assessed with the F-measure, Purity, and Entropy quality measures. The results show promising performance improvements over the traditional term-based vector space model (VSM) as well as over other existing methods that incorporate semantic similarity measures into text clustering. © 2009 Springer Berlin Heidelberg.
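The abstract names the extended gloss overlaps measure but does not state its formula. The following is a minimal sketch of the usual formulation, not the authors' SSBM code: a shared run of n consecutive words between two glosses contributes n², each word is used in at most one overlap, and overlaps made only of function words are ignored. The "extended" part sums these scores over pairs of related glosses (for example, a synset's own gloss and its hypernym's gloss). Assumptions: the glosses are hard-coded toy strings rather than WordNet lookups, and `STOPWORDS`, `gloss_overlap`, and `extended_gloss_overlap` are hypothetical names introduced for illustration. SSBM's term re-weighting step is not reproduced.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical, minimal function-word list; overlaps made only of these are ignored.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "for", "that", "is"}


def longest_common_run(a: List[Optional[str]], b: List[Optional[str]]) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest shared word run."""
    best = (0, 0, 0)
    # dp[i][j] = length of the common run ending at a[i-1] and b[j-1]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # None marks a word already consumed by an earlier overlap.
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best[0]:
                    best = (dp[i][j], i - dp[i][j], j - dp[i][j])
    return best


def gloss_overlap(gloss1: str, gloss2: str) -> int:
    """Score two glosses: each n-word overlap adds n squared."""
    a: List[Optional[str]] = list(gloss1.lower().split())
    b: List[Optional[str]] = list(gloss2.lower().split())
    score = 0
    while True:
        length, i, j = longest_common_run(a, b)
        if length == 0:
            break
        run = a[i:i + length]
        if not all(w in STOPWORDS for w in run):
            score += length ** 2
        # Mask the matched words so they cannot be reused in another overlap.
        a[i:i + length] = [None] * length
        b[j:j + length] = [None] * length
    return score


def extended_gloss_overlap(glosses1: Dict[str, str], glosses2: Dict[str, str],
                           relation_pairs: Sequence[Tuple[str, str]]) -> int:
    """Sum overlap scores over pairs of related glosses, e.g. ('gloss', 'hypernym')."""
    return sum(gloss_overlap(glosses1[r1], glosses2[r2])
               for r1, r2 in relation_pairs
               if r1 in glosses1 and r2 in glosses2)
```

For example, `gloss_overlap("a financial institution that accepts deposits", "an institution that accepts money deposits and lends money")` finds the three-word run "institution that accepts" (9) and then "deposits" (1), for a score of 10. Scoring n² rather than n rewards longer shared phrases, which are stronger evidence of relatedness than scattered single-word matches.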


Other data

Title New semantic similarity based model for text clustering using extended gloss overlaps
Authors Gad, Walaa ; Kamel, M.S. 
Issue Date 2009
Journal Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 
ISBN 3642030696
Start page 663
URI http://www.scopus.com/inward/record.url?eid=2-s2.0-70350225972&partnerID=MN8TOARS
Volume 5632 LNAI
DOI 10.1007/978-3-642-03070-3_50
Scopus ID 2-s2.0-70350225972



Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.