Comparative Analysis of XGBoost, KNN, and SVM Algorithms for Heart Disease Prediction Using SMOTE-Tomek Balancing

Authors

  • Yuliana, Department of Informatics, STMIK Time, Medan, Indonesia
  • Robet, Department of Informatics, STMIK Time, Medan, Indonesia
  • Leony Hoki, Department of Informatics, STMIK Time, Medan, Indonesia

DOI:

10.33395/sinkron.v10i1.15469

Keywords:

Heart Disease; Machine Learning; K-Nearest Neighbors; Support Vector Machine; XGBoost

Abstract

Heart disease remains one of the leading causes of death worldwide, making early detection crucial for improving patient outcomes. This study aims to evaluate and compare the performance of several machine learning algorithms in detecting heart disease using the 2015 BRFSS dataset, which includes responses from 253,680 individuals. The three algorithms examined are Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The data preprocessing steps involved feature encoding, class imbalance handling using the Synthetic Minority Over-sampling Technique combined with Tomek Links (SMOTE-Tomek), and hyperparameter tuning through RandomizedSearchCV. The models were assessed on a hold-out validation set using several metrics, including accuracy, Receiver Operating Characteristic-Area Under the Curve (ROC-AUC), F1-score, precision, and recall. The results demonstrated that XGBoost achieved the highest performance, with an accuracy of 94%, a ROC-AUC score of 0.98, and an F1-score of 0.94. In comparison, KNN achieved an accuracy of 87% (ROC-AUC 0.95), while SVM attained an accuracy of 79% (ROC-AUC 0.86). These findings suggest that XGBoost is a robust model for large-scale heart disease classification and holds potential for implementation in clinical decision support systems.
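
The evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. The labels, scores, and the 0.5 decision threshold below are made-up toy values chosen for illustration, not data or settings from the study, and the helper names (`classification_metrics`, `roc_auc`) are hypothetical. ROC-AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only, not from the BRFSS dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC depends only on how the scores rank the cases. That difference is one reason a study may report both kinds of metric side by side.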

How to Cite

Yuliana, Y., Robet, R., & Hoki, L. (2026). Comparative Analysis of XGBoost, KNN, and SVM Algorithms for Heart Disease Prediction Using SMOTE-Tomek Balancing. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 305–314. https://doi.org/10.33395/sinkron.v10i1.15469