Performance Evaluation of Threshold-Based and k-means Clustering Algorithms Using Iris Dataset

Author(s): Mamta Mittal*, Rajendra Kumar Sharma, Varinder Pal Singh.

Journal Name: Recent Patents on Engineering

Volume 13, Issue 2, 2019

Abstract:

Background: Clustering is a data mining tool that partitions raw data into disjoint clusters. Researchers have developed many algorithms to cluster large data sets based on specific parameters.

Objective: This study centers on k-means, the popular partitioning-based technique. k-means requires the number of clusters as an input parameter, does not guarantee a globally optimal solution, and is sensitive to outliers and to the selection of initial seeds.
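
The seed sensitivity noted above can be illustrated with a minimal sketch. The four-point data set, the seed choices, and the function name `kmeans` are assumptions made for illustration; they are not taken from the paper. The code is a plain Lloyd-style k-means that is run twice on the same points with different initial seeds and converges to two different partitions.

```python
from math import dist


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd-style k-means (illustrative). Returns (centroids, labels, sse)."""
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared distances from each point to its assigned centroid.
    sse = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical data: the corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Seeds on opposite short sides recover the left/right split.
_, good_labels, good_sse = kmeans(points, seeds=[(0.0, 0.0), (4.0, 0.0)])

# Seeds on the same short side get stuck in a top/bottom split.
_, bad_labels, bad_sse = kmeans(points, seeds=[(0.0, 0.0), (0.0, 1.0)])

print(good_labels, good_sse)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_sse)    # [0, 1, 0, 1] 16.0
```

Both runs stop at a stable assignment, but the second has a sum of squared errors sixteen times larger, which is what "no global solution" means in practice.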

Methods: In this paper, the authors discuss the threshold-based clustering method, a single-pass method that overcomes the above limitations but requires a threshold value as an input parameter. Related work on k-means published in patent form is also noteworthy and paves the way for further research.
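
A single-pass, threshold-based scheme of the kind described above can be sketched as follows. The abstract does not give the authors' exact procedure, so this is an assumed generic variant: each point joins the nearest existing cluster if its centroid lies within the threshold, and otherwise starts a new cluster. The function name `single_pass_cluster`, the Euclidean distance, the running-mean centroid update, and the sample data are all illustrative assumptions.

```python
from math import dist


def single_pass_cluster(points, threshold):
    """Assumed generic single-pass clustering; not the paper's exact method."""
    centroids = []  # one centroid per cluster
    members = []    # points assigned to each cluster
    for p in points:
        if centroids:
            # Find the closest existing cluster centroid.
            j = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            if dist(p, centroids[j]) <= threshold:
                members[j].append(p)
                # Recompute the centroid as the mean of the cluster's members.
                centroids[j] = tuple(
                    sum(c) / len(members[j]) for c in zip(*members[j])
                )
                continue
        # No cluster is close enough: the point seeds a new cluster.
        centroids.append(tuple(p))
        members.append([p])
    return centroids, members


# Hypothetical data: two tight pairs and one isolated point.
data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9), (9.0, 1.0)]

centroids, members = single_pass_cluster(data, threshold=1.0)
print(len(centroids))             # 3
print([len(m) for m in members])  # [2, 2, 1]
```

The number of clusters is never supplied; it follows from the threshold. With a very large threshold the same data collapse into one cluster, which is why the choice of threshold matters.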

Results: To assess clustering quality, numerous validity measures and indices were evaluated on the Iris dataset for both the k-means and the threshold-based clustering algorithms. The experiments show that the threshold-based method generates more separated and more compact clusters and yields a significant improvement in the validity indices.
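
The abstract does not name the validity indices used. As one example of how such an index rewards separation and compactness, the sketch below computes Dunn's index (smallest between-cluster distance divided by largest within-cluster diameter) for two hand-made partitions. The choice of index, the function name `dunn_index`, and the data are assumptions for illustration and do not reproduce the paper's results.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn's index: higher values mean more separated, more compact clusters.

    Assumes at least two clusters, and at least one cluster containing
    two distinct points (otherwise the diameter is zero).
    """
    # Compactness: the largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    # Separation: the smallest distance between points of different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    return min_separation / max_diameter


# Hypothetical partitions of the corners of a wide rectangle.
left_right = [[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]
top_bottom = [[(0.0, 0.0), (4.0, 0.0)], [(0.0, 1.0), (4.0, 1.0)]]

print(dunn_index(left_right))  # 4.0
print(dunn_index(top_bottom))  # 0.25
```

The left/right partition scores higher because its clusters are both tighter and farther apart.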

Conclusion: Threshold-based clustering generates clusters automatically; the clusters are not sensitive to initial seed selection or to outliers, and the method is more scalable. It will be an efficient approach to partitioning-based clustering whenever the threshold value is selected carefully or new functions are proposed for determining it.

Keywords: Clustering, k-means, threshold-based clustering, validity indices, validity measures, partitioning-based technique.

Article Details

VOLUME: 13
ISSUE: 2
Year: 2019
Page: [131 - 135]
Pages: 5
DOI: 10.2174/1872212112666180510153006