Integration of PCA and K-Means Clustering for Staple Food Segmentation in Support of National Food Policy

Authors

  • Sardo Sipayung, Program Studi Sainsdata, Fakultas Ilmu Komputer, Universitas Katolik Santo Thomas
  • Paska Marto Hasugian, Program Studi Sainsdata, Fakultas Ilmu Komputer, Universitas Katolik Santo Thomas

DOI:

10.33395/sinkron.v9i4.15343

Keywords:

PCA, K-Means, regional segmentation, staple foods, Indonesia

Abstract

This study develops a cross-provincial segmentation of staple-food consumption by integrating Principal Component Analysis (PCA) and K-Means to support policy formation. The dataset comprises 2023 staple-food consumption for 34 Indonesian provinces across six indicators from BPS/SUSENAS. All indicators were standardized using z-scores and reduced via PCA, and the resulting component scores were used as inputs to K-Means. Three components (PC1–PC3) explained 73.86% of the variance and captured shifts between sweet/animal-based vs. plant foods, fatty or animal-based grains, and the energy contribution of fat. The optimal number of clusters was k = 3, yielding Silhouette = 0.466 and Davies–Bouldin Index (DBI) = 0.733, indicating sufficiently compact and well-separated groups. The results reveal three segments: the first consists of 11 provinces that are predominantly plant-based with low sugar and low animal-based consumption; the second includes 13 provinces characterized by high animal-based and high-fat consumption; and the third comprises 10 provinces with low-fat diets and fresh plant-based consumption. Stability checks on initialization and a leave-one-feature-out procedure confirmed consistent assignments. This fills an empirical gap: to our knowledge, no prior research integrates PCA with K-Means for cross-provincial staple-food segmentation in Indonesia while also reporting internal validation. Practically, the study provides an operational segmentation to support food-security interventions, moving beyond composite indices toward actionable targeting for production support, supply/price stabilization, and improved nutritional access, thereby reframing IKP/FSVA from index ranking to evidence-based segmentation.
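The pipeline summarized in the abstract (z-score standardization, PCA, K-Means on the component scores, internal validation, and a leave-one-feature-out check) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic random numbers standing in for the 34 x 6 province-by-indicator table, the helper name `segment` is hypothetical, scikit-learn is an assumed tool choice, and the adjusted Rand index is an assumed way to compare assignments, since the abstract does not say how consistency was measured. The scores it prints will not match the reported Silhouette or DBI values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the 34 provinces x 6 indicators table.
# This is NOT the BPS/SUSENAS data used in the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(34, 6))


def segment(X, n_components=3, k=3, seed=0):
    """Hypothetical helper: z-score, project onto PCs, cluster the PC scores."""
    Z = StandardScaler().fit_transform(X)          # z-score each indicator
    pca = PCA(n_components=n_components).fit(Z)    # keep the first components
    scores = pca.transform(Z)                      # component scores per province
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scores)
    return pca, scores, labels


pca, scores, labels = segment(X)

# Internal validation on the PC scores that were actually clustered.
explained = pca.explained_variance_ratio_.sum()
sil = silhouette_score(scores, labels)      # higher is better, range [-1, 1]
dbi = davies_bouldin_score(scores, labels)  # lower is better, >= 0

# Leave-one-feature-out: drop each indicator in turn, re-run the pipeline,
# and compare with the full-feature assignment (ARI of 1.0 means identical).
lofo_ari = []
for j in range(X.shape[1]):
    _, _, alt_labels = segment(np.delete(X, j, axis=1))
    lofo_ari.append(adjusted_rand_score(labels, alt_labels))

print(f"variance explained by 3 PCs: {explained:.3f}")
print(f"silhouette: {sil:.3f}, DBI: {dbi:.3f}")
print("leave-one-feature-out ARI:", np.round(lofo_ari, 3))
```

To probe sensitivity to initialization in the same spirit, one could call `segment` with several `seed` values and compare the resulting labels with the same index.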

How to Cite

Sipayung, S., & Hasugian, P. M. (2025). Integration of PCA and K-Means Clustering for Staple Food Segmentation in Support of National Food Policy. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(4), 3175–3189. https://doi.org/10.33395/sinkron.v9i4.15343