K-Means Performance Optimization Using Rank Order Centroid (ROC) And Braycurtis Distance
DOI: 10.33395/sinkron.v7i2.11371
Keywords: Clustering, K-Means, Rank Order Centroid, Braycurtis Distance, Silhouette Coefficient
Abstract
K-Means is a clustering algorithm that groups data points by their similarity. One problem with the algorithm is that the initial cluster centers are chosen randomly, which affects the final clustering result. To avoid poor accuracy, the initial centroids need to be determined by a dedicated procedure during initialization. A second problem is the use of Euclidean distance to measure the distance between data points, since this measure gives every attribute the same influence. To address these problems, this study proposes the Rank Order Centroid (ROC) method for initializing the cluster centers and the Braycurtis distance for measuring the distance between data points. In experiments with K=2 to K=10, the proposed method reduced the number of iterations by 6.6% on the Student Performance Exams dataset and by 19.3% on the Body Fat Prediction dataset, but increased it by 24.2% on the Heart Failure dataset. When the cluster results were evaluated with the Silhouette Coefficient, the proposed method improved the evaluation value by 5.9% on the Student Performance Exams dataset, while the value decreased by 8.3% on the Body Fat Prediction dataset and by 3.3% on the Heart Failure dataset.
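The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions: ROC weights w_i = (1/n) * sum_{k=i..n} 1/k for n ranked attributes, and the Bray-Curtis distance sum|x_j - y_j| / sum|x_j + y_j|. The abstract does not describe how the ROC weights are turned into initial centroids, so that step is not shown. The function names `roc_weights`, `braycurtis`, and `nearest_centroid` are hypothetical and are not taken from the paper.

```python
def roc_weights(n):
    """Rank Order Centroid weights for n ranked attributes (rank 1 = most important)."""
    # w_i = (1/n) * sum_{k=i}^{n} 1/k; the weights are decreasing and sum to 1.
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]


def braycurtis(x, y):
    """Bray-Curtis distance between two equal-length numeric vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(abs(a + b) for a, b in zip(x, y))
    if den == 0:
        # Assumption: define the distance as 0.0 when the denominator
        # vanishes (for example, two all-zero vectors).
        return 0.0
    return num / den


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` under Bray-Curtis distance."""
    return min(range(len(centroids)), key=lambda c: braycurtis(point, centroids[c]))


if __name__ == "__main__":
    print(roc_weights(3))
    print(braycurtis([1, 2, 3], [2, 4, 6]))
    print(nearest_centroid([1, 1], [[10, 10], [1, 2]]))
```

In a K-Means loop, `nearest_centroid` would stand in for the usual Euclidean assignment step. The centroid update and the stopping rule are unchanged by the choice of distance.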
License
Copyright (c) 2022 Hafiz Irwandi, Opim Salim Sitompul, Sutarman Sutarman
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.