Performance Analysis of Random Forest Algorithm for Network Anomaly Detection using Feature Selection
DOI:
10.33395/sinkron.v8i2.13625
Keywords:
Anomaly Detection, Feature Selection, Machine Learning, Random Forest, UNSW-NB15
Abstract
As the volume and complexity of computer network traffic continue to increase, network administrators face a growing challenge in monitoring the network and discovering unusual activity. Detecting anomalies is essential to keeping the network safe and functioning. Machine learning-based anomaly detection techniques have become increasingly popular in recent years because conventional anomaly detection methods struggle to detect unknown and complex attacks. This research analyzes the performance of two feature selection methods paired with the Random Forest algorithm on the UNSW-NB15 dataset to determine which model is most effective in detecting network traffic anomalies. The models evaluated were Random Forest with the filter method and Random Forest with the wrapper method. Model performance was assessed using accuracy, F1-score, the receiver operating characteristic (ROC) curve, and precision-recall. The research methodology consists of dataset collection, data pre-processing, feature selection, model construction, and evaluation. The results show that Random Forest with the filter method achieved an accuracy of 0.8950, an F1-score of 0.8333, an ROC score of 0.8928, and a precision-recall value of 0.8347. Random Forest with the wrapper method achieved an accuracy of 0.9151, an F1-score of 0.8510, an ROC score of 0.9136, and a precision-recall value of 0.8637. Random Forest with the wrapper method therefore outperforms the filter method on all four assessment metrics. Random Forest with the wrapper method is the right choice of model for detecting network traffic anomalies because of its stable performance and its ability to handle complex patterns.
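The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not say which filter or wrapper technique was used, so the sketch assumes an ANOVA F-test (`SelectKBest` with `f_classif`) as the filter and recursive feature elimination (`RFE`) as the wrapper, both from scikit-learn. It also assumes that "ROC score" means ROC AUC and that "precision-recall value" means average precision. A synthetic dataset stands in for UNSW-NB15, and the feature count `K` and the forest settings are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# ASSUMPTION: synthetic binary data stands in for UNSW-NB15 (1 = anomaly).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           n_redundant=4, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

K = 8  # ASSUMPTION: number of features to keep; not taken from the paper.


def make_forest():
    """Return a fresh Random Forest with fixed, illustrative settings."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


# ASSUMPTION: the specific filter and wrapper techniques are illustrative.
selectors = {
    # Filter: rank features by a univariate statistic, independent of the model.
    "filter": SelectKBest(score_func=f_classif, k=K),
    # Wrapper: repeatedly fit the model and drop the least important features.
    "wrapper": RFE(estimator=make_forest(), n_features_to_select=K, step=2),
}


def evaluate(selector):
    """Fit selector + Random Forest on the training split, score on the test split."""
    model = make_pipeline(selector, make_forest())
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the anomaly class
    return {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
        "average_precision": average_precision_score(y_test, proba),
    }


results = {name: evaluate(selector) for name, selector in selectors.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Placing the selector inside the pipeline means feature selection is fitted on the training split only, which keeps test data from influencing which features are kept. Scores on the synthetic data are not comparable to the values reported in the abstract.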
License
Copyright (c) 2024 Triya Agustina, Masrizal, Irmayanti
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.