Class Balancing Methods Comparison for Software Requirements Classification on Support Vector Machines
Keywords: Class balancing, classification, random over sampling, software requirements, support vector machine
Errors in analyzing functional and non-functional software requirements can increase cost, time, and development effort. To reduce these errors, previous research has classified software requirements, especially non-functional requirements, on the PROMISE dataset using Bag of Words (BoW) feature extraction and the Support Vector Machine (SVM) classification algorithm. However, an unbalanced distribution of class labels tends to lower evaluation results. Because most software requirements are functional requirements, classifier models tend to label test data as functional. Previous research has applied class balancing to handle unbalanced data and achieved better classification results. Building on that work, this study combines class balancing methods with the SVM algorithm. K-fold cross-validation is used so that the SVM model is trained and tested on more consistent data splits, with K set to 5, 10, and 15. Results are measured by accuracy, F1-score, precision, and recall on the Public Requirements (PURE) dataset. The results show that SVM with class balancing classifies software requirements more accurately than SVM without class balancing, and that Random Over Sampling is the class balancing method with the highest evaluation scores. Average accuracy, F1-score, precision, and recall of the SVM improved by 22.07%, 19.67%, 17.73%, and 19.67%, respectively.
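As an illustration of the Random Over Sampling idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It uses only the Python standard library, and the function name `random_over_sample`, the labels "F" and "NF", and the toy requirement sentences are assumptions made up for this example (they are not taken from PURE). The sketch duplicates randomly chosen samples from smaller classes until every class matches the size of the largest one.

```python
import random
from collections import Counter


def random_over_sample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Made-up toy data: "F" = functional, "NF" = non-functional.
texts = [
    "The system shall allow users to log in.",
    "The system shall export reports as PDF.",
    "The system shall send a confirmation email.",
    "The system shall let admins delete accounts.",
    "The system shall respond within two seconds.",
    "The system shall encrypt stored passwords.",
]
labels = ["F", "F", "F", "F", "NF", "NF"]

bal_texts, bal_labels = random_over_sample(texts, labels)
counts = Counter(bal_labels)
print(counts)
```

When a step like this is combined with k-fold cross-validation, a common precaution is to oversample only the training folds, so that duplicated samples do not also appear in the test fold. Whether the study followed that procedure is not stated in the abstract.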
Copyright (c) 2023 Fachrul Pralienka Bani Muhamad, Esti Mulyani, Munengsih Sari Bunga, Achmad Farhan Mushafa
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.