Determining The Optimal Number of K-Means Clusters Using The Calinski Harabasz Index and Krzanowski and Lai Index Methods for Grouping Flood Prone Areas In North Sumatra

Authors

  • Ziana Syahputri, North Sumatra State Islamic University
  • Sutarman, University of North Sumatra
  • Machrani Adi Putri Siregar, State Islamic University of North Sumatra

DOI:

10.33395/sinkron.v9i1.13246

Keywords:

Cluster, K-Means, CH Index, KL Index, Cluster Tightness Measure (CTM), Flood

Abstract

The K-Means algorithm is a partitional clustering method. It is easy to implement, converges quickly, and produces compact clusters. Its main drawback is that the number of clusters k must be fixed in advance, and the optimal value is difficult to determine. In this research, K-Means is applied to group flood-prone areas in North Sumatra. The study aims to find the optimal number of clusters with the Calinski Harabasz (CH) Index and the Krzanowski and Lai (KL) Index, and compares the two using the Cluster Tightness Measure (CTM). Eleven variables are used. The CTM for the CH result (0.376) is smaller than the CTM for the KL result (0.7843). Since a smaller CTM indicates tighter clusters, determining the optimal number of clusters with CH (k = 6) performs better than with KL (k = 2).
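The abstract names the two indices without giving their formulas. As a rough illustration only, the Python sketch below uses the commonly cited definitions: CH(k) = [B_k / (k - 1)] / [W_k / (n - k)], where B_k and W_k are the between-cluster and within-cluster sums of squares, and KL(k) = |DIFF(k) / DIFF(k + 1)| with DIFF(k) = (k - 1)^(2/p) W_(k-1) - k^(2/p) W_k for p variables. For both indices the k with the largest value is taken as optimal. The sketch assumes plain Lloyd's K-Means with a deterministic farthest-point initialization and a few restarts (the paper's initialization is not stated here). It uses hypothetical helper names (kmeans, calinski_harabasz, krzanowski_lai, choose_k) and a small toy dataset rather than the study's eleven flood variables, and it does not compute CTM, whose formula the abstract does not give.

```python
import math

# Hypothetical helpers for illustration; not the paper's code.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def maximin_init(data, k, start):
    """Farthest-point initialization (an assumption): begin at data[start], then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(data[start])]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(data, k, n_init=10, max_iter=100):
    """Lloyd's K-Means; keeps the restart with the smallest within-cluster SS W_k.
    Returns (labels, centroids, W_k)."""
    best = None
    for start in range(min(n_init, len(data))):
        centroids = maximin_init(data, k, start)
        labels = None
        for _ in range(max_iter):
            # Assignment step: nearest centroid for every point.
            new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                          for p in data]
            if new_labels == labels:
                break  # converged: assignments no longer change
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(data, labels) if lab == j]
                if members:  # an empty cluster keeps its old centroid
                    centroids[j] = mean(members)
        w = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
        if best is None or w < best[2]:
            best = (labels, centroids, w)
    return best

def calinski_harabasz(data, labels, centroids):
    """CH(k) = [B / (k - 1)] / [W / (n - k)]; requires 2 <= k < n and W > 0."""
    n, k = len(data), len(centroids)
    overall = mean(data)
    w = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    b = sum(labels.count(j) * sq_dist(centroids[j], overall) for j in range(k))
    return (b / (k - 1)) / (w / (n - k))

def krzanowski_lai(w, k, p):
    """KL(k) = |DIFF(k) / DIFF(k + 1)| with
    DIFF(q) = (q - 1)^(2/p) * W_(q-1) - q^(2/p) * W_q; w maps q -> W_q."""
    def diff(q):
        return (q - 1) ** (2 / p) * w[q - 1] - q ** (2 / p) * w[q]
    d_next = diff(k + 1)
    return math.inf if d_next == 0 else abs(diff(k) / d_next)

def choose_k(data, k_max):
    """Return (best k by CH, best k by KL, CH values, KL values) for k = 2..k_max."""
    p = len(data[0])
    # KL at k needs W at k - 1 and k + 1, so fit k = 1..k_max + 1.
    fits = {k: kmeans(data, k) for k in range(1, k_max + 2)}
    w = {k: fit[2] for k, fit in fits.items()}
    ch = {k: calinski_harabasz(data, fits[k][0], fits[k][1])
          for k in range(2, k_max + 1)}
    kl = {k: krzanowski_lai(w, k, p) for k in range(2, k_max + 1)}
    return max(ch, key=ch.get), max(kl, key=kl.get), ch, kl

# Toy data (not the study's): three tight, well-separated groups of 2-D points.
blobs = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (20, 0), (20, 1), (21, 0), (21, 1)]
best_ch, best_kl, ch, kl = choose_k(blobs, k_max=5)
print("CH picks k =", best_ch, "| KL picks k =", best_kl)
```

With three tight, well-separated groups in the toy data, CH peaks at k = 3. KL divides two successive differences of W, so it can become unstable when W_k barely changes from one k to the next; the sketch returns infinity only when the denominator is exactly zero.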





How to Cite

Syahputri, Z., Sutarman, & Siregar, M. A. P. (2024). Determining The Optimal Number of K-Means Clusters Using The Calinski Harabasz Index and Krzanowski and Lai Index Methods for Grouping Flood Prone Areas In North Sumatra. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(1), 571-580. https://doi.org/10.33395/sinkron.v9i1.13246