Comparison Of The C.45 And Naive Bayes Algorithms To Predict Diabetes


  • Alam, School of Business and Information Technology, STMIK LIKMI Bandung – Indonesia
  • Divi Adiffia Freza Alana, School of Business and Information Technology, STMIK LIKMI Bandung – Indonesia
  • Christina Juliane, School of Business and Information Technology, STMIK LIKMI Bandung – Indonesia




Decision Tree Algorithm C4.5, Data Mining, Diabetes, Naïve Bayes


Diabetes mellitus is an urgent global health problem that affects people around the world. The disease is characterized by high levels of sugar (glucose) in the blood, caused by disturbances in the body's production or use of the hormone insulin. This study aims to detect diabetes early and accurately, so that patients can be treated as soon as possible and the risk of death reduced, and to compare two algorithms to determine which achieves the higher accuracy. The algorithms used are the C4.5 Decision Tree and Naïve Bayes. The experimental results show that both algorithms can be used to model the early detection of diabetes. The C4.5 Decision Tree achieved the highest average accuracy, 90.835%, while Naïve Bayes achieved an average accuracy of 90.745%. When pruning was applied to the C4.5 Decision Tree, its accuracy increased to 91.30%. The resulting model yielded 18 patterns, or rules, for the early detection of diabetes. The choice of attributes, the number of attribute dimensions, and the number of samples greatly affect the performance of the model.
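
As a rough illustration of the two techniques named in the abstract, the sketch below computes the gain ratio that C4.5 uses to choose a split attribute and classifies a record with a categorical Naïve Bayes model using Laplace smoothing. It is a minimal sketch under stated assumptions, not the study's implementation: the toy records, the attribute values, and the function names (`entropy`, `gain_ratio`, `naive_bayes_predict`) are all hypothetical, and it assumes every attribute is categorical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records (NOT data from the study).
# Each row is (glucose level, BMI category); LABELS holds the diabetes class.
ROWS = [
    ("high", "obese"), ("high", "normal"), ("high", "obese"),
    ("normal", "normal"), ("normal", "obese"), ("normal", "normal"),
]
LABELS = ["yes", "yes", "yes", "no", "no", "no"]


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """C4.5 split criterion: information gain divided by split information."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    total = len(labels)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def naive_bayes_predict(rows, labels, query, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing; returns the likeliest class."""
    class_counts = Counter(labels)
    scores = {}
    for cls, n_cls in class_counts.items():
        log_p = math.log(n_cls / len(labels))  # log prior of the class
        for attr, value in enumerate(query):
            n_values = len({row[attr] for row in rows})
            matches = sum(1 for row, lab in zip(rows, labels)
                          if lab == cls and row[attr] == value)
            # Smoothed conditional probability P(value | class).
            log_p += math.log((matches + alpha) / (n_cls + alpha * n_values))
        scores[cls] = log_p
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for index, name in enumerate(["glucose", "bmi"]):
        print(name, round(gain_ratio(ROWS, LABELS, index), 4))
    print(naive_bayes_predict(ROWS, LABELS, ("high", "obese")))
```

On this toy data the glucose attribute separates the classes perfectly, so a C4.5-style tree would split on it first. A real comparison of the kind the abstract reports would also need handling of continuous attributes, pruning, and a repeated train/test evaluation to obtain average accuracy, none of which is shown here.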






How to Cite

Alam, A., Alana, D. A. F., & Juliane, C. (2023). Comparison Of The C.45 And Naive Bayes Algorithms To Predict Diabetes. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 7(4), 2641-2650.