Heart Disease Classification Using Optimised XGBoost and Random Forest with SHAP Explanations

Authors

  • Pancar Hizkia Hutagalung, Information Systems, Faculty of Communication Technology and Informatics, Universitas Nasional
  • Andrianingsih, Information Systems, Faculty of Communication Technology and Informatics, Universitas Nasional

DOI:

10.33395/sinkron.v10i1.15544

Keywords:

Heart Disease Prediction, Machine Learning, Ensemble Learning, XGBoost, Explainable Artificial Intelligence (XAI)

Abstract

Heart disease remains one of the leading causes of global morbidity, creating a need for accurate and interpretable computational tools to support early diagnosis. However, many existing studies on the Cleveland Heart Disease dataset rely on limited validation protocols, apply only a single hyperparameter optimisation strategy, or provide narrow explainability analyses, which can lead to optimistic performance estimates and inconsistent clinical insight. This study addresses these gaps by proposing a classification framework that evaluates Random Forest and XGBoost for binary heart-disease classification under three hyperparameter optimisation strategies (random search, Bayesian optimisation, and particle swarm optimisation, PSO) within a nested, anti-leakage cross-validation design, and uses SHAP to analyse the interpretability of the best-performing configurations. The ensemble classifiers achieved strong and consistent performance, with ROC–AUC values ranging from 0.8908 to 0.9089 across all scenarios. Random Forest optimised with PSO obtained the highest ROC–AUC (0.9089 ± 0.0146) and F1-score (0.8188 ± 0.0206), while XGBoost with Bayesian optimisation performed comparably, with differences that were not statistically significant. SHAP analyses identified oldpeak, ca, thal, cp, thalach, and exang as the most influential features, consistent with established clinical indicators of myocardial ischemia and perfusion abnormalities. These findings indicate that combining tree-based ensemble classifiers with systematic hyperparameter optimisation and SHAP-based interpretability can improve the reliability and transparency of heart-disease classification on the Cleveland dataset, while highlighting the need for further validation on contemporary, multi-centre clinical data.
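
The nested, anti-leakage protocol described above can be illustrated with scikit-learn. This is a minimal sketch, not the authors' code: it shows only the random-search variant for Random Forest, uses a synthetic stand-in for the Cleveland data, and the helper name `nested_cv_auc`, the search space, and the fold counts are illustrative assumptions. In the study's other scenarios, Bayesian optimisation or PSO would take the place of `RandomizedSearchCV` in the inner loop, and XGBoost would replace the Random Forest step.

```python
"""Nested, leakage-safe cross-validation sketch (assumes scikit-learn and SciPy are installed)."""
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def nested_cv_auc(X, y, n_iter=10, seed=0):
    """Return per-fold outer ROC-AUC scores for a random-search-tuned Random Forest.

    Hypothetical helper for illustration only.
    """
    # Preprocessing lives inside the pipeline, so the imputer is refit on each
    # training split and never sees the held-out fold (no leakage).
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("rf", RandomForestClassifier(random_state=seed)),
    ])
    # Assumed search space; the paper's actual ranges are not stated here.
    space = {
        "rf__n_estimators": randint(50, 150),
        "rf__max_depth": [3, 5, 8, None],
        "rf__min_samples_leaf": randint(1, 6),
    }
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Inner loop: hyperparameter search on the outer-training data only.
    search = RandomizedSearchCV(pipe, space, n_iter=n_iter, scoring="roc_auc",
                                cv=inner, random_state=seed)
    # Outer loop: each fold refits the whole search, then scores the tuned model
    # on a fold that played no part in tuning.
    return cross_val_score(search, X, y, cv=outer, scoring="roc_auc")


if __name__ == "__main__":
    # Synthetic stand-in roughly the shape of the processed Cleveland data
    # (303 rows, 13 predictors); not the real dataset.
    X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                               random_state=0)
    X[::25, 0] = np.nan  # inject a few missing values to exercise the imputer
    scores = nested_cv_auc(X, y)
    print(f"ROC-AUC: {scores.mean():.4f} ± {scores.std():.4f}")
```

Because preprocessing sits inside the `Pipeline` and tuning happens only on inner folds, each outer-fold score comes from data the tuned model never saw, which is what keeps a reported mean ± standard deviation from being optimistically biased.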

How to Cite

Hutagalung, P. H., & Andrianingsih, A. (2026). Heart Disease Classification Using Optimised XGBoost and Random Forest with SHAP Explanations. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 330-342. https://doi.org/10.33395/sinkron.v10i1.15544