A Statistical Benchmarking of Imbalance-Aware Ensemble Models for Cervical Cancer Prediction
DOI:
10.33395/sinkron.v10i2.15995
Keywords:
Cervical Cancer Prediction, Imbalanced Data Classification, Ensemble Learning, Balanced Random Forest, RUSBoost
Abstract
Cervical cancer remains one of the leading causes of cancer-related mortality among women worldwide, particularly in developing countries. Early prediction through machine learning has the potential to support clinical decision-making; however, cervical cancer datasets often suffer from severe class imbalance, which reduces the ability of conventional models to detect minority cases correctly. This study aims to improve minority class detection in cervical cancer prediction by evaluating several imbalance-aware ensemble learning approaches. It compares five models: Random Forest (RF), SMOTE combined with Random Forest (SMOTE+RF), Balanced Random Forest (BRF), EasyEnsemble, and RUSBoost. The models were evaluated using 5-fold cross-validation with accuracy, recall, F1-score, and Area Under the Curve (AUC) as performance metrics. Statistical validation was conducted using the Friedman test, followed by the Wilcoxon signed-rank test and Kendall’s W effect size analysis, to assess the significance and magnitude of performance differences. Unlike prior studies that focus primarily on performance improvement, this study introduces a statistically rigorous comparative evaluation that assesses both the statistical significance and the practical effect of imbalance-aware ensemble methods. Experimental results show that imbalance-aware ensemble methods improve minority detection compared to the baseline RF model. In particular, BRF achieved the highest AUC of 0.9469 with improved recall stability, while RUSBoost produced the highest F1-score of 0.7451. Although the Friedman test indicated no statistically significant difference among the models (p = 0.2037), Kendall’s W of 0.297 suggests a small-to-moderate practical effect. These findings indicate that imbalance-aware ensemble learning can enhance the robustness of cervical cancer prediction models, particularly for minority class detection. The results highlight the importance of incorporating imbalance-handling strategies in medical prediction systems and suggest potential directions for future research on diagnostic decision-support models.
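The two reported statistics can be related with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes the five cross-validation folds serve as the blocks of the Friedman test (n = 5) and the five models as the treatments (k = 5), which the abstract does not state explicitly. Under that assumption, Kendall's W is the Friedman statistic divided by n(k - 1), so W = 0.297 implies a chi-square value of about 5.94 on 4 degrees of freedom, whose upper-tail probability is about 0.2037, consistent with the reported p-value. The function names are hypothetical, and no tie correction is applied.

```python
import math


def average_ranks(row):
    """Rank one block's scores from 1 (lowest) to k, averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the one at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(scores):
    """Friedman statistic (no tie correction) for n blocks x k treatments."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


def kendalls_w(chi2, n, k):
    """Kendall's W derived from the Friedman statistic."""
    return chi2 / (n * (k - 1))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Assumption (not stated in the abstract): folds are blocks, models are treatments.
n_folds, n_models = 5, 5
w_reported = 0.297

chi2_implied = w_reported * n_folds * (n_models - 1)
p_implied = chi2_sf_even_df(chi2_implied, n_models - 1)
print(round(chi2_implied, 2), round(p_implied, 4))  # 5.94 0.2037
```

Given a 5 x 5 table of per-fold scores (one row per fold, one column per model), `friedman_chi2` followed by `kendalls_w` would produce both statistics from the raw results; the per-fold scores themselves are not given in the abstract, so none are shown here.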
License
Copyright (c) 2026 Sumarna, Astrilyana, Sugiono, Ganda Wijaya, Yessica Fara Desvia

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
