hiCLUMP: A hybrid Implementation of the CLUMP Algorithm for Clustering Microarrays Data
Khalifa, Mohamed Essam; Dina Elsayad; Amal Khalifa; El-Sayed M. El-Horbaty
Abstract
Microarray technology allows the expression levels of hundreds of thousands of genes to be measured simultaneously. Microarray data analysis involves computationally heavy tasks such as clustering. Clustering can be defined as partitioning a dataset into groups such that objects in the same group are similar in some way. CLUMP (clustering through MST in parallel) is a minimum spanning tree (MST)-based clustering technique that employs a parallel approach to reduce the MST construction time. An enhanced version of CLUMP (iCLUMP) was proposed to further improve the MST construction phase using the cover tree data structure. Despite that modification, the MST construction phase remains a bottleneck because it is time-consuming. Both CLUMP and iCLUMP are based on a distributed parallel computing model. Therefore, the objective of this paper is to study a different enhancement approach using a hybrid parallel model. The proposed algorithm, hiCLUMP (hybrid CLUMP), applies multithreading to some of the distributed partitions suggested by the CLUMP algorithm. Experimental results on six different microarray datasets show that the load balancing strategy used in hiCLUMP decreased the MST construction time by between 8% and 17% on 36 processing nodes. Moreover, the results showed that hiCLUMP could not outperform iCLUMP, indicating that using a different data structure is more effective than increasing the processing power of the underlying parallel machine.
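To make the idea of MST-based clustering concrete, the following is a minimal serial sketch in Python. It is not the CLUMP, iCLUMP, or hiCLUMP implementation: it assumes Euclidean distance between expression profiles and the common approach of cutting the heaviest MST edges to form clusters, neither of which is stated in the abstract. The function names `prim_mst` and `mst_clusters` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import math


def prim_mst(points):
    """Return the MST edges (weight, u, v) of the complete Euclidean graph.

    Serial O(n^2) Prim's algorithm; this is the phase that the parallel
    algorithms described above aim to speed up (assumption: Euclidean metric).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax the attachment cost of every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 heaviest MST edges and label the resulting components."""
    n = len(points)
    edges = sorted(prim_mst(points))
    kept = edges[:max(0, len(edges) - (k - 1))]

    root = list(range(n))  # union-find forest over the kept edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(data, k=2))
```

The quadratic cost of the distance loop in `prim_mst` illustrates why MST construction dominates the running time on large datasets and is the natural target for distributed or multithreaded execution.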
Other data
| Field | Value |
|---|---|
| Title | hiCLUMP: A hybrid Implementation of the CLUMP Algorithm for Clustering Microarrays Data |
| Authors | Khalifa, Mohamed Essam; Dina Elsayad; Amal Khalifa; El-Sayed M. El-Horbaty |
| Keywords | Clustering; Microarrays; Minimum spanning tree; Parallel |
| Issue Date | Jul-2013 |
| Publisher | WARSE |
| Journal | International Journal of Bio-Medical Informatics and e-Health |
| Volume | 1 |
Attached Files
| File | Description | Size | Format |
|---|---|---|---|
| A hybrid Implementation of the CLUMP Algorithm for Clustering.pdf | | 962.09 kB | Adobe PDF |
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.