Comparison of Xgboost, Random Forest and Logistic Regression Algorithms in Stroke Disease Classification

Authors

  • Lia Relita Sitompul, Universitas Prima Indonesia
  • Adli Abdillah Nababan, Universitas Prima Indonesia
  • Mey Lestari Manihuruk, Universitas Prima Indonesia
  • Wildan Andika Ponsen, Universitas Prima Indonesia
  • Supriyandi, Universitas Prima Indonesia

DOI:

10.33395/sinkron.v9i2.14794

Keywords:

Stroke, machine learning, xgboost, random forest, logistic regression

Abstract

Stroke remains a critical global health concern, ranking as the second leading cause of death and the third leading cause of disability worldwide. Early detection and accurate classification of stroke risk could significantly improve patient outcomes through timely intervention. This research evaluates and compares three machine learning algorithms (XGBoost, Random Forest, and Logistic Regression) for stroke disease classification, using a dataset of 5,110 patient records with 12 attributes covering demographic, lifestyle, and health factors. Because the dataset is heavily imbalanced between stroke and non-stroke cases, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to improve model performance. Each algorithm was assessed using accuracy, precision, recall, and F1-score. XGBoost achieved the highest accuracy at 95%, followed by Random Forest at 94% and Logistic Regression at 82%. Feature importance analysis identified age, average blood glucose level, and history of heart disease as the most significant predictors of stroke. The study contributes to the development of clinical decision support systems by highlighting the effectiveness of ensemble learning approaches for stroke prediction, potentially enabling earlier interventions and improved patient management. The findings suggest that integrating machine learning tools into clinical settings could enhance stroke risk assessment, though further validation on diverse patient populations is recommended before broader implementation.
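As an illustration only, the two technical ingredients named in the abstract, SMOTE oversampling and the accuracy, precision, recall, and F1-score metrics, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names `smote` and `classification_metrics`, the parameters `k` and `seed`, and the toy rows are all hypothetical, and the sketch handles numeric features only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows, each interpolated between a random
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy minority-class rows (two made-up numeric features, not real patient data).
minority_rows = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]]
extra_rows = smote(minority_rows, n_new=6, k=2, seed=1)

# A model that always predicts "no stroke" on a 1-in-10 imbalanced sample:
# accuracy looks high while recall for the stroke class is zero.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
always_negative = classification_metrics(y_true, y_pred)
print(always_negative)
```

The second half of the sketch shows why accuracy alone can mislead on imbalanced data, which is the usual motivation for reporting precision, recall, and F1-score alongside it. In practice, oversampling is commonly applied to the training split only, so that synthetic rows do not leak into the evaluation data; the abstract does not state how the split was handled.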

Author Biographies

Lia Relita Sitompul, Universitas Prima Indonesia

Informatics Engineering (Teknik Informatika)

Adli Abdillah Nababan, Universitas Prima Indonesia

Information Systems (Sistem Informasi)

Mey Lestari Manihuruk, Universitas Prima Indonesia

Informatics Engineering (Teknik Informatika)

Wildan Andika Ponsen, Universitas Prima Indonesia

Informatics Engineering (Teknik Informatika)

Supriyandi, Universitas Prima Indonesia

Informatics Engineering (Teknik Informatika)

How to Cite

Sitompul, L. R., Nababan, A. A., Manihuruk, M. L., Ponsen, W. A., & Supriyandi. (2025). Comparison of Xgboost, Random Forest and Logistic Regression Algorithms in Stroke Disease Classification. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(2), 957-968. https://doi.org/10.33395/sinkron.v9i2.14794