Addressing Class Imbalance in Stunting Classification Using SMOTE Enhanced Random Forest

Ronald Belferik; Frans Mikael Sinaga; Ferawaty Ferawaty; Mangasa A.S. Manullang; Tetti Sinaga

doi:10.33395/sinkron.v9i4.15349

Authors

Ronald Belferik, Universitas Pelita Harapan
Frans Mikael Sinaga, Universitas Pelita Harapan, Medan
Ferawaty, Universitas Pelita Harapan, Medan
Mangasa A.S. Manullang, Universitas Pelita Harapan, Medan
Tetti Sinaga, Universitas Pelita Harapan, Medan

Keywords:

Stunting, Nutritional Status, Random Forest, Imbalanced Data, SMOTE

Abstract

Stunting is a chronic nutritional problem with serious long-term effects on children's health, including impaired physical growth, delayed cognitive development, and reduced productivity in adulthood. Early and accurate detection of stunting is therefore essential to support effective public health interventions and targeted policy implementation. A central challenge in developing machine learning models for this purpose, however, is class imbalance in health-related datasets. Such imbalance frequently yields biased classifiers that perform well on majority classes but fail to identify minority categories, which reduces the overall reliability of the system. To address this issue, this study applied the Synthetic Minority Oversampling Technique (SMOTE) to balance the class distribution of a dataset containing 110,000 records. A Random Forest was used as the base classifier, and its hyperparameters were optimized with the Optuna framework to improve robustness and generalizability. The experimental results show that combining SMOTE with Optuna tuning markedly improved classification performance, yielding the highest macro-averaged Area Under the ROC Curve (macro AUC) among the tested configurations, at 0.9972. This score indicates that the model separates the nutritional status categories well for both majority and minority classes. The study concludes that addressing class imbalance through oversampling is a fundamental methodological step in building fair and effective machine learning systems for stunting detection, ultimately contributing to improved health outcomes and evidence-based policy design.
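The abstract does not show how the oversampling or the macro AUC were computed. As a rough illustration only, the sketch below implements SMOTE's core step from scratch in standard-library Python: each synthetic sample is placed at a random point on the segment between a real sample and one of its k nearest neighbours of the same class, and every class is topped up to the size of the largest one. It also computes a macro AUC as the unweighted mean of one-vs-rest AUCs; one-vs-rest averaging is an assumption, since the page does not say how the authors averaged. The function names (`smote`, `balance_with_smote`, `binary_auc`, `macro_auc_ovr`) and the toy data are hypothetical, and this is not the authors' code.

```python
# Minimal, hypothetical sketch of SMOTE and one-vs-rest macro AUC.
# Not the authors' implementation; names and toy data are illustrative.
import math
import random
from collections import Counter, defaultdict


def _k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(samples, n_new, k=5, rng=None):
    """Generate n_new synthetic feature vectors from one class's samples."""
    rng = rng or random.Random(0)
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    k = min(k, len(samples) - 1)
    neighbours = [_k_nearest(samples, i, k) for i in range(len(samples))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))  # random real sample
        j = rng.choice(neighbours[i])  # one of its k same-class neighbours
        gap = rng.random()  # uniform in [0, 1)
        x, nn = samples[i], samples[j]
        # Interpolate: x + gap * (nn - x), i.e. a point on the segment x -> nn.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(list(features))
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [list(r) for r in X], list(y)  # originals first, unchanged
    for label, rows in by_class.items():
        missing = target - len(rows)
        if missing > 0:
            X_out.extend(smote(rows, missing, k=k, rng=rng))
            y_out.extend([label] * missing)
    return X_out, y_out


def binary_auc(scores, labels):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc_ovr(proba, y, classes):
    """Unweighted mean of one-vs-rest AUCs; proba[i][c] scores classes[c]."""
    aucs = []
    for c, cls in enumerate(classes):
        scores = [row[c] for row in proba]
        aucs.append(binary_auc(scores, [label == cls for label in y]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy data (hypothetical): 6 "normal" records vs 3 "stunted" records.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
         [5, 5], [5, 6], [6, 5]]
    y = ["normal"] * 6 + ["stunted"] * 3
    X_bal, y_bal = balance_with_smote(X, y, k=2, seed=1)
    print(sorted(Counter(y_bal).items()))  # [('normal', 6), ('stunted', 6)]

    proba = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
    labels = ["normal", "normal", "stunted", "stunted"]
    print(macro_auc_ovr(proba, labels, ["normal", "stunted"]))  # 1.0
```

In practice a library implementation would normally be used, for example `SMOTE(k_neighbors=5, random_state=42).fit_resample(X_train, y_train)` from imbalanced-learn and `roc_auc_score(y_true, y_proba, multi_class="ovr", average="macro")` from scikit-learn. Whichever implementation is used, oversampling is normally applied only to the training data (inside each cross-validation fold), so that synthetic points derived from evaluation records do not inflate the reported AUC.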
