Performance Analysis of Random Forest Algorithm for Network Anomaly Detection using Feature Selection

Authors

  • Triya Agustina, Universitas Labuhanbatu
  • Masrizal, Universitas Labuhanbatu
  • Irmayanti, Universitas Labuhanbatu

DOI:

10.33395/sinkron.v8i2.13625

Keywords:

Anomaly Detection, Feature Selection, Machine Learning, Random Forest, UNSW-NB15

Abstract

As the volume and complexity of computer network traffic continue to grow, network administrators find it increasingly difficult to monitor traffic and spot unusual activity. Detecting anomalies is essential to keeping a network safe and functioning. Machine learning-based anomaly detection has become increasingly popular in recent years because conventional detection methods struggle with unknown and complex attacks. This research compares two feature selection methods, filter and wrapper, each paired with the random forest algorithm on the UNSW-NB15 dataset, to determine which model detects network traffic anomalies more effectively. Model performance was assessed with accuracy, F1-score, the receiver operating characteristic (ROC) curve, and precision-recall. The research methodology consists of dataset collection, data pre-processing, feature selection, model construction, and evaluation. Random forest with the filter method achieved an accuracy of 0.8950, an F1-score of 0.8333, a ROC score of 0.8928, and a precision-recall value of 0.8347. Random forest with the wrapper method achieved an accuracy of 0.9151, an F1-score of 0.8510, a ROC score of 0.9136, and a precision-recall value of 0.8637. The wrapper method therefore outperforms the filter method on all four metrics. Random forest with the wrapper method is the better of the two models for detecting network traffic anomalies, owing to its stable performance and its ability to handle complex patterns.
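
The comparison described in the abstract can be illustrated with a minimal scikit-learn sketch. Everything specific here is an assumption rather than the authors' setup: the data are synthetic (not UNSW-NB15), the filter is a univariate ANOVA F-test via `SelectKBest`, the wrapper is recursive feature elimination (`RFE`) around a random forest, the number of retained features `K` is arbitrary, and the "precision-recall value" is taken to mean average precision. The abstract does not state which filter, wrapper, or hyperparameters were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

K = 8  # features to keep; an arbitrary value chosen for illustration

# Synthetic, imbalanced stand-in for labelled traffic records (NOT UNSW-NB15).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def new_forest():
    """Return a small random forest with a fixed seed (hypothetical settings)."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


def evaluate(selector):
    """Fit the selector on training data only, then train and score a forest."""
    selector.fit(X_train, y_train)
    model = new_forest().fit(selector.transform(X_train), y_train)
    X_sel = selector.transform(X_test)
    pred = model.predict(X_sel)
    prob = model.predict_proba(X_sel)[:, 1]  # probability of the positive class
    return {
        "n_features": int(X_sel.shape[1]),
        "accuracy": float(accuracy_score(y_test, pred)),
        "f1": float(f1_score(y_test, pred)),
        "roc_auc": float(roc_auc_score(y_test, prob)),
        "avg_precision": float(average_precision_score(y_test, prob)),
    }


results = {
    # Filter: rank features by a univariate statistic, independent of the model.
    "filter": evaluate(SelectKBest(score_func=f_classif, k=K)),
    # Wrapper: repeatedly refit the forest and drop the weakest feature.
    "wrapper": evaluate(RFE(new_forest(), n_features_to_select=K, step=1)),
}

for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Both selectors are fitted on the training split only, so the test scores are not influenced by feature selection on held-out data. The scores printed for this synthetic data say nothing about the figures reported above; the sketch only shows how the two pipelines differ in structure.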

How to Cite

Agustina, T., Masrizal, M., & Irmayanti, I. (2024). Performance Analysis of Random Forest Algorithm for Network Anomaly Detection using Feature Selection. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 8(2). https://doi.org/10.33395/sinkron.v8i2.13625