Title:Identification of Robust Clustering Methods in Gene Expression Data Analysis
VOLUME: 12 ISSUE: 6
Author(s):Md. Bipul Hossen* and Md. Siraj-Ud-Doulah
Affiliation:Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur-5400, Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur-5400
Keywords:Agglomerative hierarchical clustering, corrected rand index, microarray gene expressions data, outlier, proximity
measures.
Abstract:Background: Cluster analysis of gene expression microarray data is of increasing
interest in bioinformatics. One reason for this is the need for molecular-based
refinement of broadly defined biological classes, with implications for cancer diagnosis, prognosis and
treatment. Many algorithms have been developed for this problem.
Objective: Microarray data frequently include outliers, however, and the question is how to treat the
effects of these outliers in the subsequent clustering analysis.
Method: In this paper, we present a large-scale analysis of seven different agglomerative hierarchical
clustering methods and five proximity measures applied to 33 cancer gene expression datasets.
As a case study, we used two experimental datasets, Affymetrix and cDNA, to which different percentages
of outliers were artificially added.
Results: We found that Ward's method gives the highest corrected Rand index value with the
Spearman proximity measure, both when the datasets contain outliers and when they do not.
Conclusion: This study shows that Ward's method is more robust than the other clustering methods
considered in gene expression data analysis.
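
The corrected Rand index used above to compare clustering methods can be illustrated with a minimal sketch. It assumes the Hubert-Arabie form of the index, computed from the contingency table between the known sample classes and the clusters produced by a method; the function name and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_true, labels_pred):
    """Corrected (adjusted) Rand index between two partitions.

    Assumes the Hubert-Arabie form: 1.0 means identical partitions,
    values near 0 mean agreement no better than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two samples: the partitions agree trivially.
        return 1.0

    # Contingency counts: samples sharing a (reference class, cluster) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)   # reference class sizes
    cols = Counter(labels_pred)   # cluster sizes

    # Number of sample pairs placed together in each view.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2           # largest attainable agreement
    if maximum == expected:
        # Degenerate case, e.g. both partitions are a single cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical example: four samples, two tumour classes.
reference = ["A", "A", "B", "B"]
clusters = [2, 2, 1, 1]  # same grouping, different cluster names
print(corrected_rand_index(reference, clusters))
```

Because the index depends only on which samples are grouped together, renaming the clusters does not change its value. In a robustness study of the kind described here, the same function would be evaluated on the cluster labels obtained before and after outliers are added.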