Comparison of XGBoost and Naive Bayes Models in Type 2 Diabetes Prediction with RFE Feature Selection
DOI: 10.33395/sinkron.v10i1.15509
Abstract
Type 2 diabetes mellitus is a chronic disease with rising prevalence that can cause serious complications if it is not detected early. Machine learning algorithms can aid prediction, but the accuracy of the results depends heavily on the choice of model and features. This study compares the performance of the Extreme Gradient Boosting (XGBoost) and Naive Bayes algorithms in predicting type 2 diabetes with and without Recursive Feature Elimination (RFE) feature selection. The data come from the UCI Machine Learning Repository and comprise 768 samples and eight clinical features. The research process included data preprocessing, splitting the data into 614 training samples and 154 test samples, applying RFE to select the most influential features, model training, and evaluation using accuracy, precision, recall, F1-score, and AUC. Naive Bayes without RFE achieved an accuracy of 70.77%, precision of 0.57377, recall of 0.648148, F1-score of 0.608696, and AUC of 0.772778, while Naive Bayes with RFE raised the accuracy to 74.02% and the AUC to 0.793333. XGBoost with RFE gave the best results, with an accuracy of 74.67%, precision of 0.653061, recall of 0.592593, F1-score of 0.621359, and the highest AUC of 0.804259. Applying RFE also improved computational efficiency. These findings indicate that applying RFE significantly improves both classification performance and computation time. The practical implication is that this model could aid early detection of diabetes in clinical settings. Further research could optimize the model parameters and use more diverse datasets.
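The F1-scores quoted in the abstract can be checked against the quoted precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that check, not the authors' code: the function names, the dictionary layout, and the tolerance are assumptions made for illustration, and only the numeric values are taken from the abstract.

```python
"""Consistency check for the metrics reported in the abstract (illustrative sketch)."""


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the abstract; the structure holding them is hypothetical.
REPORTED = {
    "Naive Bayes without RFE": {
        "precision": 0.57377, "recall": 0.648148, "f1": 0.608696,
    },
    "XGBoost with RFE": {
        "precision": 0.653061, "recall": 0.592593, "f1": 0.621359,
    },
}


def check_reported(tolerance: float = 1e-4) -> dict:
    """Recompute F1 per model and flag whether it agrees with the reported value."""
    results = {}
    for name, metrics in REPORTED.items():
        f1 = f1_from_precision_recall(metrics["precision"], metrics["recall"])
        results[name] = (f1, abs(f1 - metrics["f1"]) <= tolerance)
    return results


if __name__ == "__main__":
    for name, (f1, matches) in check_reported().items():
        print(f"{name}: recomputed F1 = {f1:.6f}, matches reported: {matches}")
```

Under the assumed tolerance of 1e-4, both recomputed F1-scores agree with the reported ones, so the three figures quoted for each model are internally consistent.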
License
Copyright (c) 2026 Hanisa putri Barus, Robet, Feriani Astuti Tarigan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

