Improved Accuracy In Data Mining Decision Tree Classification Using Adaptive Boosting (Adaboost)

Authors

  • Muhammad Riansyah Master of Informatics Program, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara
  • Saib Suwilo Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sumatera Utara
  • Muhammad Zarlis Information Systems Management Department, BINUS Graduate Program – Master of Information Systems Management, Bina Nusantara University, Jakarta, 11480, Indonesia

DOI:

10.33395/sinkron.v8i2.12055

Keywords:

Confusion Matrix, Adaptive Boosting, C5.0 algorithm, Decision Tree, Data Mining

Abstract

The Decision Tree algorithm is a data mining method frequently applied to classification problems. The C5.0 Decision Tree algorithm has several weaknesses: like other decision tree methods, it is often biased toward features with many levels; its models are prone to over-fitting or under-fitting; small changes to the training data can produce large changes in the decision logic; modeling can be cumbersome; and class imbalance in the data lowers its accuracy. Boosting is an iterative algorithm that assigns different weights to the training data distribution at each iteration. Each boosting iteration increases the weight of misclassified examples and decreases the weight of correctly classified examples, effectively changing the distribution of the training data. AdaBoost is one example of a boosting algorithm. The purpose of this research is to improve the performance of the C5.0 Decision Tree classification method using Adaptive Boosting (AdaBoost) to predict hepatitis disease, evaluated with a confusion matrix. Tests carried out with the confusion matrix on the hepatitis dataset show that the C5.0 Decision Tree classifier achieves an accuracy of 80.85% with a classification error rate of 19.15%, whereas C5.0 with AdaBoost achieves a higher accuracy of 82.98% with a classification error rate of 17.02%. The difference arises because AdaBoost is able to turn a weak classifier into a strong one by increasing the weight of misclassified observations, thereby reducing the classifier's error rate.
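The weighting mechanism described in the abstract can be sketched in miniature. The example below is an illustrative assumption, not the paper's implementation: it uses one-level decision stumps as the weak learner (a stand-in for the C5.0 trees used in the study) on a toy one-dimensional dataset, and evaluates the result with a confusion matrix as the paper does.

```python
import math

def stump_predict(threshold, polarity, x):
    """One-level decision tree: predict `polarity` when x >= threshold."""
    return polarity if x >= threshold else -polarity

def train_adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                       # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted classification error.
        best = None
        for t in sorted(set(X)):
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(t, pol, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))  # vote weight
        ensemble.append((alpha, t, pol))
        # Re-weight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, pol, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of all weak learners."""
    score = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

def confusion_matrix(y_true, y_pred):
    """Return (TP, FN, FP, TN) for labels in {+1, -1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    return tp, fn, fp, tn

# Toy data: no single stump separates this +/-/+ pattern, but the
# boosted ensemble of stumps fits it exactly.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
ensemble = train_adaboost(X, y, rounds=5)
preds = [predict(ensemble, x) for x in X]
tp, fn, fp, tn = confusion_matrix(y, preds)
accuracy = (tp + tn) / len(y)
```

On this toy set the best single stump misclassifies three of ten points (70% training accuracy), while five boosting rounds fit all ten; the same re-weighting mechanism underlies the accuracy gain over plain C5.0 reported above.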


References

Bahramian, S. and Nikravanshalmani, A. 2016. Hybrid Algorithm Based on K-Nearest Neighbor Algorithm and AdaBoost with Selection of Features by Genetic Algorithms for the Diagnosis of Diabetes. International Journal of Mechatronics, Electrical and Computer Technology (IJMEC), 6(21): 2977-2986.

Gorunescu, F. 2011. Data Mining: Concepts, Models and Techniques. Springer.

Han, J. and Kamber, M. 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann, USA.

Hendra, Azis, M.A. and Suhardjono. 2020. Analysis of Student Graduation Predictions Using a Decision Tree Based on Particle Swarm Optimization. Journal of SISFOKOM (Computer and Information Systems), pp. 102-107.

Ihsan. 2018. Attribute Reduction in the K-Nearest Neighbor (Knn) Algorithm Using a Genetic Algorithm. [Thesis]. Medan: University of North Sumatra, Postgraduate.

Napa, K.K. and Dhamodaran, V. 2019. Hepatitis - Infectious Disease Prediction Using Classification Algorithms. Research Journal of Pharmacy and Technology, pp. 3720-3725.

Pant, H. and Srivastava, R. 2015. A Survey on Feature Selection Methods for Imbalanced Datasets. International Journal of Computer Engineering & Application, Vol. IX, Issue II, pp. 197-204.

Patil, N., Lathi, R. and Chitre, V. 2012. Comparison of C5.0 & CART Classification Algorithms Using Pruning Technique. International Journal of Engineering Research & Technology (IJERT), 1(4): 1-5.

Perveen, S., Shahbaz, M., Guergachi, A. and Keshavjee, K. 2016. Performance Analysis of Data Mining Classification Techniques to Predict Diabetes. Procedia Computer Science 82 (2016), pp. 115-121.

Saifudin, A. and Wahono, R.S. 2015. Application of Ensemble Techniques to Handle Class Imbalances in Software Flaw Prediction. Journal of Software Engineering, 1(1): 28-37. https://doi.org/10.1016/S1896-1126(14)00030-3

Schapire, R.E. 2013. Explaining Adaboost. Dept of Computer Science. Princeton University: USA.

Shakeel, P.M., Tolba, A., Al-Makhadmeh, Z. and Jaber, M.M. 2019. Automatic Detection of Lung Cancer from Biomedical Data Sets Using Discrete AdaBoost Optimized Ensemble Learning Generalized Neural Networks. Neural Computing and Applications. Springer.

Sudarto. 2016. Analysis of Handling Class Imbalance Using Density Based Feature Selection (DBFS) and Adaptive Boosting (Adaboost). [Thesis]. Medan: University of North Sumatra, Postgraduate Program

Taufiqurrahman, A., Putrada, A.G. and Dawani, F. 2020. Decision Tree Regression with AdaBoost Ensemble Learning for Water Temperature Forecasting in Aquaponic Ecosystem. 2020 6th International Conference on Interactive Digital Media (ICIDM), Vol. 12, No. 1, February 2020, pp. 1-10.

Kaggle. https://www.kaggle.com.


How to Cite

Riansyah, M., Suwilo, S., & Zarlis, M. (2023). Improved Accuracy In Data Mining Decision Tree Classification Using Adaptive Boosting (Adaboost). Sinkron : Jurnal Dan Penelitian Teknik Informatika, 7(2), 617-622. https://doi.org/10.33395/sinkron.v8i2.12055
