A Comparative Study of Ensemble Learning and Neural Networks for the Heart Disease Prediction
DOI: 10.33395/sinkron.v9i1.14347
Keywords: Heart Disease Prediction, Ensemble Learning, Neural Networks, Machine Learning Models, Data Preprocessing
Abstract
Heart disease remains a leading cause of death worldwide, which makes predictive models for early diagnosis a critical need. This study compares the performance of machine learning and deep learning models for heart disease prediction on a structured dataset of 918 observations and 11 features. The analysis includes ensemble methods such as Random Forest, Gradient Boosting, and XGBoost, as well as neural networks such as Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). Traditional classifiers, including Support Vector Machines (SVM) and Logistic Regression, serve as benchmarks. The dataset was preprocessed with label encoding, standardization, and the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance and ensure data consistency. Models were evaluated on precision, recall, F1-score, and ROC-AUC. Ensemble methods, particularly Random Forest (ROC-AUC: 0.9313) and Gradient Boosting (ROC-AUC: 0.9279), consistently delivered the best performance. Among neural networks, MLPs showed promising results (ROC-AUC: 0.9232) and outperformed CNNs, which were less effective on tabular data. TabNet, an attention-based deep learning model for tabular data that was also evaluated, proved unsuitable for this dataset and underperformed significantly across all metrics. This research highlights the effectiveness of ensemble methods and MLPs in heart disease prediction and the importance of proper preprocessing. Future work could integrate hybrid models or advanced optimization techniques to further improve predictive accuracy in clinical settings.
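The preprocessing steps and the headline metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library rather than scikit-learn or imbalanced-learn, the function names (`label_encode`, `standardize`, `smote`, `roc_auc`) and the neighbour count `k=3` are assumptions chosen for illustration, and the toy records are invented rather than taken from the study's dataset.

```python
import math
import random


def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def standardize(rows):
    """Rescale each column to zero mean and unit variance (population std)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy records (invented for illustration; not rows from the study's dataset).
    sex_codes, _ = label_encode(["M", "F", "M", "M", "F", "M", "M", "F"])
    ages = [40, 49, 37, 48, 54, 39, 45, 58]
    labels = [0, 0, 0, 0, 0, 1, 1, 1]
    X = standardize([[float(a), float(s)] for a, s in zip(ages, sex_codes)])
    minority = [x for x, y in zip(X, labels) if y == 1]
    extra = smote(minority, n_new=2)  # 5 negatives vs 3 positives: add 2
    X_bal, y_bal = X + extra, labels + [1] * len(extra)
    print(len(X_bal), sum(y_bal))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice, oversampling of this kind is normally applied only to the training split so that synthetic points do not leak into the evaluation data; the abstract does not say how the study handled this.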
License
Copyright (c) 2025 Gregorius Airlangga, Oskar Ika Adi Nugroho, Bobi Hartanto Pramudita Lim

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.