Heart Disease Classification Using Optimised XGBoost and Random Forest with SHAP Explanations
DOI: 10.33395/sinkron.v10i1.15544
Keywords: Heart Disease Prediction, Machine Learning, Ensemble Learning, XGBoost, Explainable Artificial Intelligence (XAI)
Abstract
Heart disease remains one of the leading causes of morbidity worldwide, creating a need for accurate and interpretable computational tools to support early diagnosis. However, many existing studies on the Cleveland Heart Disease dataset rely on limited validation protocols, apply only a single hyperparameter optimisation strategy, or provide narrow explainability analyses, which can lead to optimistic performance estimates and inconsistent clinical insight. This study addresses these gaps by proposing a prediction framework that evaluates Random Forest and XGBoost for binary heart-disease classification under three hyperparameter optimisation strategies: random search, Bayesian optimisation, and particle swarm optimisation (PSO). All configurations are evaluated within a nested, anti-leakage cross-validation design, and SHAP is used to analyse the interpretability of the best-performing configurations. The experimental results show that the ensemble classifiers achieve strong and consistent performance, with ROC–AUC values ranging from 0.8908 to 0.9089 across all scenarios. Random Forest optimised with PSO obtained the highest ROC–AUC (0.9089 ± 0.0146) and F1-score (0.8188 ± 0.0206), whereas XGBoost with Bayesian optimisation reached comparable performance, with no statistically significant difference. SHAP analyses identified oldpeak, ca, thal, cp, thalach, and exang as the most influential features, in line with established clinical indicators of myocardial ischemia and perfusion abnormalities. These findings indicate that combining tree-based ensemble classifiers with systematic hyperparameter optimisation and SHAP-based interpretability can enhance the reliability and transparency of heart-disease classification on the Cleveland dataset, while highlighting the need for further validation on contemporary, multi-centre clinical data.
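The nested, anti-leakage cross-validation design named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the Cleveland dataset, covers only the Random Forest with random search case, and the parameter grid, fold counts, and seeds are hypothetical choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    cross_val_score,
)
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): the shape loosely resembles a small
# tabular clinical dataset, but the values carry no clinical meaning.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)

# Preprocessing lives inside the pipeline, so it is re-fitted on the training
# part of every fold and never sees held-out rows (the anti-leakage step).
pipe = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("model", RandomForestClassifier(random_state=0)),
    ]
)

# Hypothetical search space, chosen only to keep the example fast.
param_distributions = {
    "model__n_estimators": [25, 50, 100],
    "model__max_depth": [3, 5, None],
    "model__min_samples_leaf": [1, 2, 4],
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: random search selects hyperparameters using training data only.
search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=4,
    scoring="roc_auc",
    cv=inner_cv,
    random_state=0,
)

# Outer loop: each outer test fold scores a model whose hyperparameters were
# tuned without access to that fold.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(f"ROC-AUC: {outer_scores.mean():.4f} +/- {outer_scores.std():.4f}")
```

Because the search object itself is passed to the outer loop, tuning is repeated from scratch inside every outer training split, so the reported mean and standard deviation are not inflated by hyperparameters chosen on the test folds. Bayesian optimisation or PSO would replace only the inner search step in this structure.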
License
Copyright (c) 2025 Pancar Hizkia Hutagalung, Andrianingsih

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

