Integration of Feature Selection with Data Level Approach for Software Defect Prediction

Authors

  • Ade Suryadi, Universitas Bina Sarana Informatika

DOI:

10.33395/sinkron.v4i1.10137

Keywords:

Class Imbalance, Data Level Approach, Feature Selection, Software Defect Prediction

Abstract

Software metrics datasets are generally imbalanced. An imbalanced class distribution and irrelevant attributes can degrade the performance of a software defect prediction model, because predictions tend to favor the majority class over the minority class. This research uses a public dataset from the NASA (National Aeronautics and Space Administration) MDP (Metrics Data Program) repository. It aims to reduce the influence of class imbalance in the dataset so that classification performance in software defect prediction improves. The proposed model applies feature selection with particle swarm optimization (PSO), a data level approach using Random Under Sampling (RUS) and SMOTE (Synthetic Minority Over-sampling Technique), and Bagging (an ensemble method) with a Naive Bayes classifier. The results show that the proposed model improves on the performance of Naive Bayes, with overall AUC values reaching > 0.8. A statistical test indicates a significant difference between the Naive Bayes model and the proposed model: the p value (0.043) is smaller than the alpha value (0.05).
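The two data level techniques named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the toy two-metric data, the neighbour count `k`, and the choice to balance the classes fully are all assumptions made for illustration, and the feature selection and Bagging steps are omitted.

```python
import math
import random


def random_under_sample(majority, minority, rng):
    """Randomly keep only as many majority rows as there are minority rows."""
    kept = rng.sample(majority, len(minority))
    return kept, list(minority)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen minority row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two metrics per module (not taken from NASA MDP).
rng = random.Random(0)
non_defective = [[rng.uniform(0, 50), rng.uniform(1, 5)] for _ in range(20)]
defective = [[rng.uniform(40, 100), rng.uniform(4, 10)] for _ in range(4)]

# RUS shrinks the majority class; SMOTE grows the minority class.
rus_majority, rus_minority = random_under_sample(non_defective, defective, rng)
n_missing = len(non_defective) - len(defective)
smote_minority = defective + smote(defective, n_missing, k=3, rng=rng)
```

RUS discards majority-class modules, so it loses information but keeps only real observations. SMOTE keeps every module and adds synthetic defective ones that lie on line segments between existing defective modules.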

How to Cite

Suryadi, A. (2019). Integration of Feature Selection with Data Level Approach for Software Defect Prediction. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 4(1), 51-57. https://doi.org/10.33395/sinkron.v4i1.10137