Improved Accuracy In Data Mining Decision Tree Classification Using Adaptive Boosting (Adaboost)
DOI: 10.33395/sinkron.v8i2.12055
Keywords: Confusion Matrix, Adaptive Boosting, C5.0 algorithm, Decision Tree, Data Mining
Abstract
The Decision Tree is a data mining algorithm often applied to classification problems. The C5.0 Decision Tree algorithm has several weaknesses: like other decision tree methods, it is often biased toward features with many levels; the resulting model can over-fit or under-fit; small changes in the training data can produce large changes in the decision logic; modeling with C5.0 can be cumbersome; and class imbalance in the data lowers the accuracy of the C5.0 algorithm. Boosting is an iterative algorithm that assigns different weights to the training data distribution at each iteration. Each boosting iteration increases the weight of misclassified examples and decreases the weight of correctly classified ones, effectively changing the distribution of the training data. One example of a boosting algorithm is AdaBoost. The purpose of this research is to improve the performance of the C5.0 Decision Tree classification method using adaptive boosting (AdaBoost) to predict hepatitis, evaluated with a confusion matrix. Tests carried out with the confusion matrix on the Hepatitis dataset give the C5.0 Decision Tree classifier an accuracy of 80.58% with a classification error of 19.15%, whereas C5.0 with AdaBoost achieves a higher accuracy of 82.98% with a classification error of 17.02%. The difference is due to the AdaBoost algorithm: AdaBoost turns a weak classifier into a strong one by increasing the weight of misclassified observations, and in doing so reduces the classifier's error rate.
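The reweighting scheme the abstract describes can be sketched with a minimal, from-scratch AdaBoost. This is an illustrative sketch, not the paper's implementation: it uses one-level decision stumps as the weak learner (a stand-in for C5.0 trees, which have no standard Python implementation), and all function names and the toy dataset are hypothetical.

```python
import math

def stump_predict(threshold, polarity, x):
    # One-level decision stump: predict `polarity` when x >= threshold.
    return polarity if x >= threshold else -polarity

def train_stump(X, y, w):
    # Return the (weighted error, threshold, polarity) of the best stump.
    best = None
    for threshold in sorted(set(X)):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(threshold, polarity, xi) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(X, y, w)
        err = max(err, 1e-10)              # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight, correct ones lose weight --
        # the change to the training distribution described in the abstract.
        w = [wi * math.exp(-alpha * yi * stump_predict(threshold, polarity, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]       # renormalise to a distribution
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all weak stumps forms the "strong" classifier.
    score = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

def confusion_matrix(y_true, y_pred):
    # Counts (TP, FN, FP, TN) for labels +1 / -1.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == -1 and p == -1 for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn

# Toy one-feature data that no single stump can separate:
X = [1, 2, 3, 4, 5, 6]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
preds = [predict(model, x) for x in X]
print(confusion_matrix(y, preds))   # (4, 0, 0, 2): all six examples correct
```

The best single stump here misclassifies two of the six points, yet after three boosting rounds the weighted vote of three stumps classifies the training set perfectly, which is the weak-to-strong effect the abstract attributes to AdaBoost.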
License
Copyright (c) 2023 Muhammad Riansyah, Saib Suwilo, Muhammad Zarliz

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

