Comparative Analysis of Homogeneous and Heterogeneous Ensembles for Diabetes Classification Optimization

Authors

  • Muhammad Naufal Maulana, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Muljono, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Eka Putra Agus Meindiawan, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia

DOI:

10.33395/sinkron.v9i1.14439

Keywords:

Diabetes, Boosting, Bagging, Stacking, Blending

Abstract

Diabetes mellitus is a chronic disease whose prevalence is increasing worldwide, including in Indonesia, reaching 11.7% by 2023. Early prediction of the disease is essential for more effective management. This study develops a diabetes mellitus prediction model using ensemble learning, comparing homogeneous techniques (boosting and bagging) with heterogeneous techniques (stacking and blending). Boosting, implemented as AdaBoost with Random Forest as the base estimator, achieved the highest accuracy, 98%, with balanced precision and recall. Bagging, which also used Random Forest as the base estimator, achieved 97% accuracy, slightly lower than boosting. Stacking, which combined XGBoost, Gradient Boosting, and Random Forest as base learners with Random Forest as the meta-model, reached a similar accuracy of 98% but with lower prediction error, suggesting that it copes better with more complex data. Blending, which follows a similar approach but trains on the entire dataset, also reached 98% accuracy, with shorter processing time and more efficient memory usage than stacking.
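
The stacking idea summarized in the abstract can be illustrated with a minimal sketch, using only the Python standard library. Everything here is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the two base learners (a threshold stump and a nearest-centroid rule) are small stand-ins for XGBoost, Gradient Boosting, and Random Forest, the meta-model is a lookup table rather than a Random Forest, and all function names are hypothetical. Training the meta-model on out-of-fold predictions is the common formulation of stacking; the abstract does not state which variant the study used.

```python
import random


def make_data(n, seed):
    """Hypothetical synthetic stand-in for a diabetes dataset:
    two noisy features whose means shift with the binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append(([rng.gauss(2.0 * y, 1.0), rng.gauss(-1.5 * y, 1.0)], y))
    return rows


def fit_stump(rows, feature=0):
    """Base learner A: best single-threshold rule on one feature."""
    best = None
    for x, _ in rows:
        t = x[feature]
        for sign in (1, -1):
            hits = sum((sign * (xi[feature] - t) > 0) == bool(yi) for xi, yi in rows)
            if best is None or hits > best[0]:
                best = (hits, t, sign)
    _, t, sign = best
    return lambda x: int(sign * (x[feature] - t) > 0)


def fit_centroid(rows):
    """Base learner B: predict the class whose feature mean is nearest."""
    means = {}
    for label in (0, 1):
        pts = [x for x, y in rows if y == label]
        means[label] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(x):
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in means.items()}
        return min(dist, key=dist.get)

    return predict


def fit_meta(meta_rows):
    """Meta-model: majority label for each pattern of base predictions."""
    counts = {}
    for z, y in meta_rows:
        counts.setdefault(z, [0, 0])[y] += 1
    table = {z: int(c[1] >= c[0]) for z, c in counts.items()}
    return lambda z: table.get(z, 0)


BASE_FITTERS = [fit_stump, fit_centroid]  # two different learner types


def fit_stacking(rows, k=5):
    """Train the meta-model on out-of-fold base predictions, then refit
    the base learners on all rows for use at prediction time."""
    folds = [rows[i::k] for i in range(k)]
    meta_rows = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        bases = [fit(train) for fit in BASE_FITTERS]
        meta_rows += [(tuple(b(x) for b in bases), y) for x, y in held_out]
    meta = fit_meta(meta_rows)
    bases = [fit(rows) for fit in BASE_FITTERS]
    return lambda x: meta(tuple(b(x) for b in bases))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


train_rows = make_data(200, seed=0)
test_rows = make_data(100, seed=1)
stacked = fit_stacking(train_rows)
print(f"held-out accuracy: {accuracy(stacked, test_rows):.2f}")
```

The point of the sketch is the data flow: the meta-model never sees predictions that a base learner made on rows it was trained on. In scikit-learn, `sklearn.ensemble.StackingClassifier` covers the same pattern through its `estimators`, `final_estimator`, and `cv` parameters.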

How to Cite

Maulana, M. N., Muljono, M., & Meindiawan, E. P. A. (2025). Comparative Analysis of Homogeneous and Heterogeneous Ensembles for Diabetes Classification Optimization. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 9(1), 512–521. https://doi.org/10.33395/sinkron.v9i1.14439