A Novel Approach for Parallel Document Clustering Using an Enhanced Parallel WAND Algorithm
Ali, Wael; Khamis, Soheir; Zakaria, Wael
Abstract
Document clustering is crucial for managing the large volume of textual data available on the Internet, yet clustering large, high-dimensional datasets is computationally costly. To tackle these obstacles, the weighted AND (WAND) algorithm, a widely used information retrieval method, is employed as an essential stage of the document clustering process to make it more effective. WAND uses an efficient data structure, the inverted index, to compute document scores and ranks, which allows it to extract the topK documents most similar to a given query. Owing to its effectiveness, several parallel versions of WAND have been proposed to enhance it. However, document clustering poses additional challenges, since it requires retrieving a larger topK and processing longer queries. Therefore, this paper proposes an enhanced parallel version of the WAND algorithm (PWAND). PWAND divides the inverted index into partitions, each of which is assigned a specific percentage of topK according to its relevance to the given query. Furthermore, a novel PWAND-based Parallel PArtitional Clustering (PWPPAC) approach, which combines parallel execution of the clustering stages with PWAND, is proposed. Experimental results across a variety of datasets show that PWAND is promising: its results closely match those obtained by WAND, but with a significant speedup, reaching a maximum of 85.7x on the AG-News dataset. Moreover, the results show that employing PWAND in the clustering process makes clustering more efficient while maintaining its accuracy and quality.
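The two retrieval ideas summarized above, WAND's bound-based skipping over an inverted index to find the topK documents, and PWAND's split of the index into partitions that each receive a share of topK, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The scoring function (sum of query weight times term weight, with non-negative weights), the modulo-based document partitioning in `partition_index`, the proportional quota rule in `allocate_topk`, and all function and class names are hypothetical choices made only for illustration.

```python
import heapq
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Assumed layout: index maps term -> [(doc_id, weight), ...] sorted by doc_id;
# query maps term -> query weight. Weights are assumed non-negative, and a
# document's score is the sum over query terms t of query[t] * weight(t, doc).

class Cursor:
    """Iterator over one term's posting list, with that term's score upper bound."""
    def __init__(self, postings, q_weight):
        self.docs = [d for d, _ in postings]
        self.weights = [w for _, w in postings]
        self.q = q_weight
        self.ub = q_weight * max(self.weights)  # max contribution of this term to any doc
        self.pos = 0

    def doc(self):
        return self.docs[self.pos] if self.pos < len(self.docs) else None

    def score(self):
        return self.q * self.weights[self.pos]

    def seek(self, target):
        # Jump directly to the first posting with doc_id >= target.
        self.pos = bisect_left(self.docs, target, lo=self.pos)

def wand(index, query, k):
    """Exact top-k retrieval that skips documents whose score bound cannot beat the heap."""
    if k <= 0:
        return []
    cursors = [Cursor(index[t], qw) for t, qw in query.items() if index.get(t)]
    heap = []  # min-heap of (score, doc_id) holding the current top-k
    while True:
        cursors = [c for c in cursors if c.doc() is not None]
        if not cursors:
            break
        cursors.sort(key=Cursor.doc)
        theta = heap[0][0] if len(heap) == k else 0.0  # score a new doc must exceed
        acc, pivot = 0.0, None
        for i, c in enumerate(cursors):
            acc += c.ub
            if acc > theta:
                pivot = i
                break
        if pivot is None:
            break  # even the sum of all remaining bounds cannot beat theta
        pivot_doc = cursors[pivot].doc()
        if cursors[0].doc() == pivot_doc:
            # Every cursor up to the pivot sits on pivot_doc, so score it fully.
            score = sum(c.score() for c in cursors if c.doc() == pivot_doc)
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
            for c in cursors:
                if c.doc() == pivot_doc:
                    c.pos += 1
        else:
            # No document before pivot_doc can exceed theta, so skip past them.
            for c in cursors[:pivot]:
                c.seek(pivot_doc)
    return sorted(heap, reverse=True)

def partition_index(index, n_parts):
    """Hypothetical document partitioning: doc_id modulo n_parts."""
    parts = [{} for _ in range(n_parts)]
    for term, postings in index.items():
        for doc_id, w in postings:
            parts[doc_id % n_parts].setdefault(term, []).append((doc_id, w))
    return parts

def allocate_topk(parts, query, k):
    """Hypothetical quota rule: split k in proportion to query-weighted posting mass."""
    rel = [sum(qw * w for t, qw in query.items() for _, w in p.get(t, [])) for p in parts]
    total = sum(rel) or 1.0
    raw = [k * r / total for r in rel]
    quota = [int(x) for x in raw]
    # Hand out the leftover slots to the partitions with the largest fractional parts.
    order = sorted(range(len(parts)), key=lambda i: raw[i] - quota[i], reverse=True)
    for i in order[: k - sum(quota)]:
        quota[i] += 1
    return quota

def pwand(index, query, k, n_parts=4):
    """Run WAND on each partition with its topK quota concurrently, then merge."""
    parts = partition_index(index, n_parts)
    quotas = allocate_topk(parts, query, k)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(wand, parts, [query] * n_parts, quotas))
    merged = [hit for hits in results for hit in hits]
    return sorted(merged, reverse=True)[:k]
```

Because each partition returns only its quota, the merged list can differ from exact WAND whenever one partition holds more strong matches than its share, which fits the abstract's framing that PWAND closely matches, rather than exactly reproduces, WAND's results. Threads keep the sketch short; in CPython the GIL prevents CPU-bound speedup, so a genuinely parallel version would use processes or native code.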
Other data
| Field | Value |
|---|---|
| Title | A Novel Approach for Parallel Document Clustering Using an Enhanced Parallel WAND Algorithm |
| Authors | Ali, Wael; Khamis, Soheir; Zakaria, Wael |
| Keywords | Data Mining |
| Issue Date | 2024 |
| Journal | International Journal of Intelligent Engineering and Systems |
| Volume | 17 |
| Issue | 6 |
| Start page | 1083 |
| End page | 1097 |
| ISSN | 2185-3118 |
| DOI | 10.22266/ijies2024.1231.80 |
Attached Files
| File | Size | Format |
|---|---|---|
| Wael Ali paper.pdf | 1.63 MB | Adobe PDF |
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.