The Performance of the Equal-Width and Equal-Frequency Discretization Methods on Data Features in Classification Process

Authors

  • Pramaishella Ardiani Regita Putri, School of Computing, Telkom University, Bandung, Indonesia
  • Sri Suryani Prasetiyowati, School of Computing, Telkom University, Bandung, Indonesia
  • Yuliant Sibaroni, School of Computing, Telkom University, Bandung, Indonesia

DOI:

10.33395/sinkron.v8i4.12730

Keywords:

Discretization, Equal-Width, Equal-Frequency, Classification, Accuracy

Abstract

Classification often suffers from suboptimal accuracy, which can be attributed to various factors, including a wide range of attribute values in the dataset. Discretization methods offer one way to address this problem. This study compares the effectiveness of the Equal-Width and Equal-Frequency discretization methods in improving classification accuracy on datasets of varying sizes. Naïve Bayes, Decision Tree, and Support Vector Machine are used as classification models on three datasets: Bandung City Traffic data (3804 records), Bandung City COVID-19 cases data (2718 records), and Bandung City Dengue Fever Disease Index data (150 records). Three experimental scenarios assess the impact of the two discretization methods on accuracy: the first applies no discretization, the second applies Equal-Width discretization, and the third applies Equal-Frequency discretization. The experimental results indicate significant accuracy improvements after discretization. The Naïve Bayes model achieved 94% accuracy on the Traffic dataset, while the Decision Tree achieved 71% accuracy on the COVID-19 dataset and 98% on the Dengue Fever Disease dataset. These outcomes demonstrate that applying Equal-Width and Equal-Frequency discretization addresses the challenge of wide attribute value ranges in the classification process.
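To illustrate the two methods compared in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the sample values, and the choice of three bins are assumptions made for illustration only. Equal-Width splits the value range into intervals of identical width, while Equal-Frequency places roughly the same number of records in each interval.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of identical width.

    Hypothetical helper for illustration; not taken from the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [0 for _ in values]
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Rank-based sketch: tied values may be split across neighbouring bins.
    """
    n = len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


# Assumed toy attribute with one extreme value (a wide range).
sample = [1, 2, 3, 4, 5, 100]
ew = equal_width_bins(sample, 3)
ef = equal_frequency_bins(sample, 3)
print(ew)  # [0, 0, 0, 0, 0, 2]
print(ef)  # [0, 0, 1, 1, 2, 2]
```

In this toy example, the extreme value pushes almost every record into the first Equal-Width interval, whereas Equal-Frequency spreads the records evenly. Which behaviour helps a given classifier depends on the data.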

How to Cite

Putri, P. A. R., Prasetiyowati, S. S., & Sibaroni, Y. (2023). The Performance of the Equal-Width and Equal-Frequency Discretization Methods on Data Features in Classification Process. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 7(4), 2082-2098. https://doi.org/10.33395/sinkron.v8i4.12730