Comparison Of The C4.5 And Naive Bayes Algorithms To Predict Diabetes
DOI: 10.33395/sinkron.v8i4.12998
Keywords: Decision Tree Algorithm C4.5, Data Mining, Diabetes, Naïve Bayes
Abstract
Diabetes mellitus is an urgent global health problem with a major impact on people around the world. The disease is characterized by high levels of sugar (glucose) in the blood, caused by disturbances in the body's production or use of the hormone insulin. This study aims to support accurate early detection of diabetes so that patients can be treated as soon as possible, reducing the risk of death, and to compare two algorithms to determine which achieves the better accuracy. The algorithms used are the C4.5 Decision Tree algorithm and Naïve Bayes. The experimental results show that both C4.5 and Naïve Bayes can be used to model the early detection of diabetes. The highest average accuracy, 90.835%, was obtained with the C4.5 Decision Tree algorithm, while Naïve Bayes achieved an average accuracy of 90.745%. After pruning was applied to the C4.5 decision tree, its accuracy increased to 91.30%. The resulting model yields 18 patterns (rules) for the early detection of diabetes. The choice of attributes, the number of attribute dimensions, and the number of samples strongly affect the performance of the model.
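C4.5 chooses the attribute to split on by its gain ratio: the information gain of the split divided by the split information, which penalizes attributes with many distinct values. The minimal Python sketch below, assuming categorical attributes, illustrates only that split criterion. It is not the study's implementation and omits C4.5's handling of continuous attributes, missing values, and the pruning step described in the abstract; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio (C4.5 split criterion) of splitting `labels` on a categorical attribute."""
    n = len(labels)
    # Group class labels by attribute value: one branch per distinct value.
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    # Information gain = parent entropy - weighted entropy of the branches.
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    info_gain = entropy(labels) - remainder
    # Split information grows with the number of branches and penalizes them.
    split_info = -sum(len(b) / n * log2(len(b) / n) for b in branches.values())
    return info_gain / split_info if split_info > 0 else 0.0
```

For example, with hypothetical toy data where `values = ["yes", "yes", "yes", "no", "no", "no"]` perfectly separates `labels = ["pos", "pos", "pos", "neg", "neg", "neg"]`, `gain_ratio(values, labels)` returns 1.0, while an attribute whose values carry no information about the label, such as `["a", "b", "a", "b"]` against `["pos", "pos", "neg", "neg"]`, returns 0.0. At each node, C4.5 splits on the attribute with the highest gain ratio.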
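The abstract does not specify which Naïve Bayes variant was used. As a hedged illustration, the sketch below assumes categorical attributes (for example, yes/no symptoms) and implements a categorical Naïve Bayes classifier with Laplace (add-alpha) smoothing. It predicts the class that maximizes the log prior plus the sum of per-attribute log likelihoods, treating attributes as conditionally independent given the class. The `accuracy` helper computes accuracy as the fraction of correct predictions; the abstract reports averages of this measure but does not describe how the evaluation runs were set up. All names here are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    # counts[c][j][v] = number of training rows of class c whose attribute j equals v
    counts = defaultdict(lambda: defaultdict(Counter))
    domains = [set() for _ in range(len(rows[0]))]  # distinct values seen per attribute
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            domains[j].add(v)
    return class_counts, counts, domains, alpha


def predict_nb(model, row):
    """Return the class with the highest log posterior for a single row."""
    class_counts, counts, domains, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / total)  # log prior P(c)
        for j, v in enumerate(row):
            # Smoothed likelihood P(x_j = v | c); attributes assumed independent given c.
            score += log((counts[c][j][v] + alpha) / (n_c + alpha * len(domains[j])))
        if score > best_score:
            best, best_score = c, score
    return best


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, after `model = train_nb([("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "no")], ["pos", "pos", "neg", "neg"])` on this hypothetical toy data, `predict_nb(model, ("yes", "yes"))` returns `"pos"`. Working in log space avoids numerical underflow when many small probabilities are multiplied.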
License
Copyright (c) 2023 Alam, Divi Adiffia Freza Alana, Christina Juliane
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.