A Statistical Benchmarking of Imbalance-Aware Ensemble Models for Cervical Cancer Prediction

Authors

  • Sumarna Universitas Nusa Mandiri, Indonesia
  • Astrilyana Universitas Bina Sarana Informatika, Indonesia
  • Sugiono Universitas Bina Sarana Informatika, Indonesia
  • Ganda Wijaya Universitas Nusa Mandiri, Indonesia
  • Yessica Fara Desvia Politeknik Jatiluhur, Indonesia

DOI:

10.33395/sinkron.v10i2.15995

Keywords:

Cervical Cancer Prediction, Imbalanced Data Classification, Ensemble Learning, Balanced Random Forest, RUSBoost

Abstract

Cervical cancer remains one of the leading causes of cancer-related mortality among women worldwide, particularly in developing countries. Early prediction through machine learning has the potential to support clinical decision-making; however, cervical cancer datasets often suffer from severe class imbalance, which reduces the ability of conventional models to correctly detect minority cases. This study aims to improve minority class detection in cervical cancer prediction by evaluating several imbalance-aware ensemble learning approaches. The study compares five models: Random Forest (RF), SMOTE combined with Random Forest (SMOTE+RF), Balanced Random Forest (BRF), EasyEnsemble, and RUSBoost. The models were evaluated using 5-fold cross-validation with accuracy, recall, F1-score, and Area Under the Curve (AUC) as performance metrics. Statistical validation was conducted using the Friedman test, followed by the Wilcoxon signed-rank test and Kendall’s W effect size analysis, to assess the significance and magnitude of performance differences. Unlike prior studies that primarily focus on performance improvement, this study introduces a statistically rigorous comparative evaluation that assesses both the significance and the practical effect of imbalance-aware ensemble methods. Experimental results show that imbalance-aware ensemble methods improve minority class detection relative to the baseline RF model. In particular, BRF achieved the highest AUC of 0.9469 with improved recall stability, while RUSBoost produced the highest F1-score of 0.7451. Although the Friedman test indicated no statistically significant difference among models (p = 0.2037), the Kendall’s W value of 0.297 suggests a small-to-moderate practical effect. These findings indicate that imbalance-aware ensemble learning can enhance the robustness of cervical cancer prediction models, particularly for minority class detection. The results highlight the importance of incorporating imbalance-handling strategies in medical prediction systems and suggest potential directions for future research in improving diagnostic decision-support models.
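
The relationship between the reported Friedman p-value and Kendall’s W can be illustrated with a minimal sketch. It assumes, as the abstract does not state explicitly, that the five cross-validation folds serve as the blocks (n = 5) and the five models as the treatments (k = 5), and it uses the standard identity W = χ² / (n(k − 1)). The function names below are hypothetical, the Friedman statistic is computed without a tie correction, and the closed-form p-value holds only for even degrees of freedom. This is not the authors' code.

```python
import math


def average_ranks(row):
    """Rank values ascending (1 = smallest), averaging ranks of tied values."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_chi2_and_w(scores):
    """scores[f][m] = metric of model m on fold f.

    Returns (Friedman chi-square without tie correction, Kendall's W).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, chi2 / (n * (k - 1))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def chi2_from_p(p, df):
    """Invert chi2_sf_even_df by bisection (the tail probability decreases in x)."""
    lo, hi = 0.0, 200.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Consistency check against the abstract, assuming n = 5 folds, k = 5 models.
    n_folds, n_models = 5, 5
    chi2 = chi2_from_p(0.2037, df=n_models - 1)
    print(f"chi2 = {chi2:.2f}, W = {chi2 / (n_folds * (n_models - 1)):.3f}")
```

Under these assumptions, p = 0.2037 with 4 degrees of freedom corresponds to χ² of roughly 5.94, and 5.94 / (5 × 4) is roughly 0.297, which agrees with the reported Kendall’s W. The `friedman_chi2_and_w` helper takes a fold-by-model score table; any scores passed to it here would be illustrative inputs, not the study's data.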

How to Cite

Sumarna, S., Astrilyana, A., Sugiono, S., Wijaya, G., & Desvia, Y. F. (2026). A Statistical Benchmarking of Imbalance-Aware Ensemble Models for Cervical Cancer Prediction. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(2), 1070–1080. https://doi.org/10.33395/sinkron.v10i2.15995