Vol 7, No 4 (2016) > Electrical, Electronics and Computer Engineering

Incorporating Stability and Error-based Constraints for a Novel Partitional Clustering Algorithm

K. Aparna, Mydhili K. Nair

Abstract: Data clustering is one of the major areas of data mining. The bisecting K-Means algorithm is one of the most widely used clustering algorithms for high-dimensional datasets, but its performance degrades as the dimensionality increases.
In addition, selecting which cluster to bisect next is a challenging task. To overcome these drawbacks, we developed a novel partitional clustering algorithm called HB-K-Means (High dimensional Bisecting K-Means). To improve the performance of this algorithm, we incorporate two constraints, a stability-based measure and the Mean Square Error (MSE), resulting in the CHB-K-Means (Constraint-based High dimensional Bisecting K-Means) algorithm.
The CHB-K-Means algorithm first generates two initial partitions and then calculates the stability and MSE of each partition.
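The abstract does not define either measure. For reference, the usual per-partition MSE is given below for a partition C with centroid μ_C. The stability score S(C) is only a hypothetical example, not necessarily the measure used in CHB-K-Means: the bisection that produced C is re-run R times with different random seeds, and n_{r,0} and n_{r,1} count how many points of C fall on each side in the r-th re-run. S(C) = 1 means every re-run keeps all points of C together (and S(C) is taken as 1 when |C| < 2).

```latex
\[
  \mathrm{MSE}(C) = \frac{1}{|C|}\sum_{x \in C}\lVert x - \mu_C \rVert_2^2,
  \qquad
  \mu_C = \frac{1}{|C|}\sum_{x \in C} x
\]
\[
  S(C) = \frac{1}{R}\sum_{r=1}^{R}
  \frac{\binom{n_{r,0}}{2} + \binom{n_{r,1}}{2}}{\binom{|C|}{2}}
\]
```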
Inference techniques are applied to the stability and MSE values of the two partitions to select the next partition for re-clustering, and this process is repeated until K clusters are obtained. The experimental analysis shows that an average clustering accuracy of 75% is achieved. A comparative analysis against other traditional algorithms shows that the proposed approach achieves a higher clustering accuracy rate, along with an increase in computation time.
Keywords: Bisecting K-Means; Constraints; High dimensionality; Mean Square Error (MSE); Partitional clustering; Stability
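The bisect, score, and select loop described in the abstract can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: each bisection is plain Lloyd's 2-means from two random starting points, stability is the hypothetical pairwise-agreement score over re-runs with other seeds, and the hypothetical `split_score` rule (MSE multiplied by 2 minus stability) stands in for the unspecified inference technique. All function names are illustrative.

```python
import random
from math import comb

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def two_means(points: list[Point], seed: int, iters: int = 50) -> list[int]:
    """Lloyd's 2-means from two distinct random starting points; 0/1 label per point.
    Requires at least two distinct points."""
    rng = random.Random(seed)
    centers = rng.sample(list(dict.fromkeys(points)), 2)
    labels = None
    for _ in range(iters):
        new = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1 for p in points]
        if new == labels:  # assignments stopped changing: converged
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # defensive: keep the old center if a side is empty
                centers[k] = centroid(members)
    return labels


def partition_mse(points: list[Point]) -> float:
    """MSE of one partition: mean squared distance of its points to their centroid."""
    mu = centroid(points)
    return sum(sq_dist(p, mu) for p in points) / len(points)


def partition_stability(local: list[int], reruns: list[list[int]]) -> float:
    """HYPOTHETICAL stability score: mean fraction of point pairs in the partition
    that re-runs of the same bisection (other seeds) still keep together.
    Counting pairs makes the score independent of how 0/1 labels are permuted."""
    if len(local) < 2:
        return 1.0
    total = comb(len(local), 2)
    scores = []
    for labels in reruns:
        ones = sum(labels[j] for j in local)
        scores.append((comb(ones, 2) + comb(len(local) - ones, 2)) / total)
    return sum(scores) / len(scores)


def bisect(points: list[Point], idx: list[int], seed: int, n_reruns: int = 5) -> list[dict]:
    """Bisect the points at positions idx; score both halves by MSE and stability."""
    sub = [points[i] for i in idx]
    labels = two_means(sub, seed)
    reruns = [two_means(sub, seed + 1000 * (r + 1)) for r in range(n_reruns)]
    halves = []
    for k in (0, 1):
        local = [j for j, lab in enumerate(labels) if lab == k]
        halves.append({
            "idx": [idx[j] for j in local],
            "mse": partition_mse([sub[j] for j in local]),
            "stability": partition_stability(local, reruns),
        })
    return halves


def split_score(part: dict) -> float:
    """HYPOTHETICAL stand-in for the paper's inference step: prefer loose
    (high-MSE) partitions, boosted by up to 2x when they are unstable."""
    return part["mse"] * (2.0 - part["stability"])


def chb_kmeans_sketch(points: list[Point], k: int, seed: int = 0) -> list[list[int]]:
    """Bisect repeatedly until k clusters exist; returns clusters as index lists."""
    if k < 2 or len(set(points)) < 2:
        return [list(range(len(points)))]
    parts = bisect(points, list(range(len(points))), seed)  # two initial partitions
    step = 1
    while len(parts) < k:
        # only partitions with at least two distinct points can be bisected
        splittable = [p for p in parts if len({points[i] for i in p["idx"]}) >= 2]
        if not splittable:
            break
        target = max(splittable, key=split_score)
        parts = [p for p in parts if p is not target]
        parts.extend(bisect(points, target["idx"], seed + step))
        step += 1
    return [p["idx"] for p in parts]
```

For example, on the one-dimensional points 0, 1, 2, 10, 11, 12, 100, 101, 102 with k = 3, the first bisection separates {100, 101, 102} from the rest, and the looser {0, ..., 12} partition (higher MSE) is chosen for the next split, giving three clusters of three points each. Pair counting is used for stability so that re-runs never need their 0/1 labels matched to the original run.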

