Addressing Class Imbalance in Stunting Classification Using SMOTE Enhanced Random Forest

Authors

  • Ronald Belferik Universitas Pelita Harapan
  • Frans Mikael Sinaga Universitas Pelita Harapan, Medan
  • Ferawaty Universitas Pelita Harapan, Medan
  • Mangasa A.S. Manullang Universitas Pelita Harapan, Medan
  • Tetti Sinaga Universitas Pelita Harapan, Medan

DOI:

10.33395/sinkron.v9i4.15349

Keywords:

Stunting, Nutritional Status, Random Forest, Imbalance Data, SMOTE

Abstract

Stunting is a chronic nutritional problem with serious long-term effects on children’s health, including impaired physical growth, delayed cognitive development, and reduced productivity in adulthood. Early and accurate detection of stunting is therefore essential to support effective public health interventions and targeted policy implementation. A central challenge in developing machine learning models for this purpose is class imbalance in health-related datasets, which frequently produces biased classifiers that perform well on majority classes but fail to identify minority categories, reducing the overall reliability of the system. To address this issue, the present study applied the Synthetic Minority Oversampling Technique (SMOTE) to balance the class distribution of a dataset containing 110,000 records. A Random Forest algorithm was then used as the base classifier, with hyperparameters optimized using the Optuna framework to improve robustness and generalizability. The experimental results show that combining SMOTE with Optuna significantly improved classification performance, yielding the highest macro-averaged Area Under the Curve (AUC) of 0.9972. This score indicates that the model distinguishes nutritional status categories well across both majority and minority classes. The study concludes that addressing data imbalance through oversampling is a fundamental methodological step in constructing fair and effective machine learning systems for stunting detection, ultimately contributing to improved health outcomes and evidence-based policy design.
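
The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic record is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation, which the abstract does not describe. The function name `smote_oversample`, the parameter defaults, and the toy values are assumptions made purely for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples
    and one of their k nearest minority neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy records: (age in months, height in cm). Not taken from the study's data.
majority = [(12.0, 75.0), (24.0, 87.0), (36.0, 96.0),
            (48.0, 103.0), (18.0, 82.0), (30.0, 92.0)]
minority = [(12.0, 68.0), (24.0, 78.0), (36.0, 86.0)]

# Add just enough synthetic minority points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 6 6
```

In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` would normally be used instead of hand-written code, and oversampling is usually applied to the training split only so that evaluation data contain no synthetic records.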

How to Cite

Belferik, R., Sinaga, F. M., Ferawaty, F., Manullang, M. A., & Sinaga, T. (2025). Addressing Class Imbalance in Stunting Classification Using SMOTE Enhanced Random Forest. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(4), 2108-2116. https://doi.org/10.33395/sinkron.v9i4.15349