Enhancing Feature-Efficient Network Intrusion Detection Using Gradient Boosting and Chi-Square Selection on NSL-KDD
DOI: 10.33395/sinkron.v10i1.15650
Keywords: Chi-Square, Feature Selection, Gradient Boosting, Network Intrusion Detection System, NSL-KDD
Abstract
Cyber threats are growing in complexity and increasingly challenge the effectiveness of traditional Network Intrusion Detection Systems (NIDS). Modern attacks, particularly zero-day intrusions, require detection approaches capable of handling high-dimensional network traffic data. However, few existing studies examine the trade-off between feature efficiency and generalization performance in boosting-based NIDS under controlled feature-reduction strategies, and the role of statistical feature selection in mitigating overfitting in classical boosting models remains underexplored. This study evaluates NIDS performance by combining three boosting ensemble algorithms (AdaBoost, Gradient Boosting, and XGBoost) with three filter-based feature selection methods (Information Gain, Chi-Square, and ReliefF). The NSL-KDD dataset serves as the primary benchmark, and Min–Max normalization is applied during preprocessing so that all numerical features share a common scale. Models are built in Orange Data Mining and evaluated with 10-fold cross-validation. Among the evaluated models, Gradient Boosting achieves the highest baseline accuracy. Feature selection yields further improvements: the Chi-Square method gives the best result, 81.2% accuracy with 19 selected features, and Information Gain reaches 80.8% accuracy with 13 features, while ReliefF provides comparatively smaller gains. These findings indicate that effective feature reduction can improve generalization performance, reduce computational complexity, and mitigate overfitting. Overall, combining Gradient Boosting with statistical feature selection offers a feature-efficient, generalizable intrusion detection strategy for modern network environments.
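To make the preprocessing and filter-based selection steps concrete, the following is a minimal pure-Python sketch of Min–Max normalization followed by Chi-Square and Information Gain feature scoring and top-k selection. It is an illustration under stated assumptions, not the authors' Orange Data Mining workflow: the helper names (`min_max_normalize`, `chi2_scores`, `information_gain`, `top_k`) are hypothetical, the Chi-Square formulation is the one scikit-learn's `chi2` uses for non-negative features (class-wise sums of feature values compared with their expected values), Orange's own scorers may differ in detail (for example by discretizing features first), Information Gain here assumes the feature column is already discretized, and ReliefF and the boosting models themselves are omitted.

```python
import math
from collections import Counter, defaultdict


def min_max_normalize(rows):
    """Rescale every column of a numeric matrix to [0, 1].

    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    """
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ]
        for r in rows
    ]


def chi2_scores(rows, labels):
    """Chi-square score per feature (assumes non-negative feature values).

    Observed = sum of the feature's values within each class;
    expected = (fraction of samples in the class) * (total sum of the feature).
    """
    n = len(rows)
    class_counts = Counter(labels)
    scores = []
    for j in range(len(rows[0])):
        total = sum(r[j] for r in rows)
        observed = defaultdict(float)
        for r, y in zip(rows, labels):
            observed[y] += r[j]
        score = 0.0
        for c, count in class_counts.items():
            expected = count / n * total
            if expected > 0:  # an all-zero feature contributes nothing
                score += (observed[c] - expected) ** 2 / expected
        scores.append(score)
    return scores


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete (already binned) feature column."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(column, labels):
        groups[v].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is constant,
    # feature 2 is uninformative.
    X = [[0, 5, 1], [1, 5, 0], [8, 5, 1], [9, 5, 0]]
    y = ["normal", "normal", "attack", "attack"]
    Xn = min_max_normalize(X)
    scores = chi2_scores(Xn, y)
    print(scores, top_k(scores, 1))
```

On the full NSL-KDD feature matrix, a call such as `top_k(chi2_scores(Xn, y), 19)` would correspond to the 19-feature Chi-Square configuration reported above, although the exact features selected depend on how the scorer handles discretization and ties. Normalizing before scoring matters for the Chi-Square formulation used here, because it requires non-negative inputs and is sensitive to each feature's scale.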
License
Copyright (c) 2026 Gilardinho Javiere Oscoraldo Pedrosa Soares, Fauzi Adi Rafrastara, Ramadhan Rakhmat Sani

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.