Enhancing text clustering performance using semantic similarity

Gad, Walaa; Kamel, M.S.

Abstract

Clustering text documents can be challenging because of the complex linguistic properties of text. Most clustering techniques represent documents with the traditional bag-of-words model. Under this representation, ambiguity, synonymy, and semantic similarity may not be captured by traditional text mining techniques that rely on word and/or phrase frequencies. In this paper, we propose a semantic-similarity-based model to capture the semantics of the text. The proposed model, in conjunction with a lexical ontology, addresses the synonym and hypernym problems. It uses WordNet as the ontology and the adapted Lesk algorithm to examine and extract the relationships between terms. The model reflects these relationships through semantic weights that are added to the term frequency weight to represent the semantic similarity between terms. Experiments applying the proposed model to text clustering are conducted. The results show promising performance improvements over the traditional vector space model as well as over other existing methods that include semantic similarity measures in text clustering. © 2009 Springer Berlin Heidelberg.
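
The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea: each term's frequency weight is increased by the frequency of related terms, scaled by their pairwise similarity. The additive form, the `threshold` parameter, the function names, and the `TOY_SIMILARITY` table are all assumptions made for illustration; in the paper's setting the similarity scores would come from the adapted Lesk measure over WordNet, not from a fixed table.

```python
import math
from collections import Counter

# Hypothetical similarity scores in [0, 1]. A stand-in for an adapted Lesk
# relatedness measure over WordNet; the values are made up for illustration.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "vehicle")): 0.6,
    frozenset(("automobile", "vehicle")): 0.6,
}


def similarity(a, b):
    """Return the toy similarity of two terms (1.0 for identical terms)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def tf_vector(tokens, vocabulary):
    """Plain bag-of-words term frequency vector."""
    tf = Counter(tokens)
    return [float(tf[term]) for term in vocabulary]


def semantic_vector(tokens, vocabulary, threshold=0.5):
    """Term frequency plus similarity-weighted frequency of related terms.

    Assumed form: w'(t) = tf(t) + sum over u != t of tf(u) * sim(t, u),
    counting only pairs whose similarity reaches `threshold`.
    """
    tf = Counter(tokens)
    vector = []
    for term in vocabulary:
        boost = 0.0
        for other in vocabulary:
            if other == term:
                continue
            sim = similarity(term, other)
            if sim >= threshold:
                boost += tf[other] * sim
        vector.append(tf[term] + boost)
    return vector


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vocabulary = ["car", "automobile", "vehicle", "banana"]
doc_a = ["car", "car", "vehicle"]
doc_b = ["automobile", "vehicle"]

plain = cosine(tf_vector(doc_a, vocabulary), tf_vector(doc_b, vocabulary))
semantic = cosine(semantic_vector(doc_a, vocabulary),
                  semantic_vector(doc_b, vocabulary))
print(f"plain TF cosine: {plain:.3f}, semantic cosine: {semantic:.3f}")
```

In this toy setup the two documents share only one literal term, so their plain term frequency vectors look fairly dissimilar, while the semantically weighted vectors score them as much closer because "car" and "automobile" are treated as related. A clustering algorithm run on the weighted vectors would therefore be more likely to group such documents together.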

Other data

Title	Enhancing text clustering performance using semantic similarity
Authors	Gad, Walaa ; Kamel, M.S.
Issue Date	2009
Journal	Lecture Notes in Business Information Processing
ISBN	9783642013461
DOI	10.1007/978-3-642-01347-8_28
Scopus URL	http://www.scopus.com/inward/record.url?eid=2-s2.0-65949123813&partnerID=MN8TOARS
Volume	24 (LNBIP)
Start page	325
Scopus ID	2-s2.0-65949123813

Citations	9 in Scopus

Views	27 in Shams Scholar
