### Incorporating Stability and Error-based Constraints for a Novel Partitional Clustering Algorithm

*K. Aparna, Mydhili K. Nair*

**Abstract**: Data clustering is one of the major areas in data mining. The bisecting clustering algorithm is one of the most widely used for high-dimensional datasets, but its performance degrades as the dimensionality increases. Selecting a cluster for further bisection is also challenging. To overcome these drawbacks, we developed a novel partitional clustering algorithm called HB-K-Means (High dimensional Bisecting K-Means). To improve the performance of this algorithm, we incorporate two constraints, a stability-based measure and the Mean Square Error (MSE), resulting in the CHB-K-Means (Constraint-based High dimensional Bisecting K-Means) algorithm. The CHB-K-Means algorithm generates two initial partitions and then calculates the stability and MSE of each. Inference techniques are applied to the stability and MSE values of the two partitions to select the next partition for re-clustering. This process is repeated until K clusters are obtained. The experimental analysis shows an average clustering accuracy of 75%. Compared with traditional algorithms, the proposed approach achieves a higher clustering accuracy rate, with an increase in computation time.

**Keywords**: Bisecting K-Means; Constraints; High dimensionality; Mean Square Error (MSE); Partitional clustering; Stability
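The bisect-and-select loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' CHB-K-Means: the abstract does not define the stability measure or the inference rule, so the sketch assumes a simpler rule and always re-clusters the partition with the largest MSE. The function names (`two_means`, `cluster_mse`, `bisecting_kmeans`) are hypothetical.

```python
import numpy as np

def two_means(X, rng, n_iter=20):
    """Split X into two groups with a few Lloyd iterations (plain 2-means)."""
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance of every point to both centers, shape (n, 2).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_mse(X):
    """Mean squared distance of the points in X to their centroid."""
    return float(((X - X.mean(axis=0)) ** 2).sum(axis=1).mean())

def bisecting_kmeans(X, k, seed=0):
    """Bisect repeatedly until k clusters exist; returns one label per row."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]  # each entry holds row indices of one cluster
    while len(clusters) < k:
        splittable = [i for i, idx in enumerate(clusters) if len(idx) >= 2]
        if not splittable:
            break
        # ASSUMPTION: pick the cluster with the largest MSE. The paper combines
        # MSE with a stability measure that the abstract does not specify.
        worst = max(splittable, key=lambda i: cluster_mse(X[clusters[i]]))
        idx = clusters.pop(worst)
        labels = two_means(X[idx], rng)
        left, right = idx[labels == 0], idx[labels == 1]
        if left.size == 0 or right.size == 0:
            # Degenerate split (e.g. duplicate points): fall back to halving.
            left, right = idx[: len(idx) // 2], idx[len(idx) // 2 :]
        clusters.extend([left, right])
    out = np.empty(len(X), dtype=int)
    for label, idx in enumerate(clusters):
        out[idx] = label
    return out

# Usage: three synthetic blobs in 10 dimensions, split into k = 3 clusters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 10)) for c in (0.0, 5.0, 10.0)])
labels = bisecting_kmeans(X, k=3)
```

Selecting by MSE alone favours splitting loose clusters first. A stability term, as the abstract proposes, would add a second criterion to that choice.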

