Enhancing Feature-Efficient Network Intrusion Detection Using Gradient Boosting and Chi-Square Selection on NSL-KDD

Authors

  • Gilardinho Javiere Oscoraldo Pedrosa Soares, Universitas Dian Nuswantoro
  • Fauzi Adi Rafrastara, Universitas Dian Nuswantoro
  • Ramadhan Rakhmat Sani, Universitas Dian Nuswantoro

DOI:

10.33395/sinkron.v10i1.15650

Keywords:

Chi-Square, Feature Selection, Gradient Boosting, Network Intrusion Detection System, NSL-KDD

Abstract

This study examines the growing complexity of cyber threats that increasingly challenge the effectiveness of traditional Network Intrusion Detection Systems (NIDS). Modern attacks, particularly zero-day intrusions, require detection approaches capable of handling high-dimensional network traffic data. However, existing studies rarely examine the trade-off between feature efficiency and generalization performance in boosting-based NIDS under controlled feature-reduction strategies. Moreover, the role of statistical feature selection in mitigating overfitting in classical boosting models remains underexplored. This study evaluates the performance of NIDS by combining boosting ensemble algorithms, namely AdaBoost, Gradient Boosting, and XGBoost, with filter-based feature selection methods, including Information Gain, Chi-Square, and ReliefF. The NSL-KDD dataset is used as the primary benchmark, with Min–Max normalization applied during preprocessing to ensure numerical feature consistency. Model development is conducted using Orange Data Mining, and performance is assessed through 10-fold cross-validation. Experimental results show that Gradient Boosting achieves the highest baseline accuracy among the evaluated models. Further performance improvements are obtained through feature selection, with the Chi-Square method yielding the best result at 81.2% accuracy using 19 selected features. Information Gain also enhances performance, achieving 80.8% accuracy with 13 features, while ReliefF provides comparatively lower gains. These findings demonstrate that effective feature reduction improves generalization performance, reduces computational complexity, and mitigates overfitting. Overall, the proposed combination of Gradient Boosting and statistical feature selection provides a feature-efficient, generalizable intrusion detection strategy for modern network environments.
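The workflow described in the abstract (Min–Max normalization, Chi-Square feature ranking, Gradient Boosting, 10-fold cross-validation) can be sketched as follows. This is a minimal illustration only: the study itself was carried out in Orange Data Mining, so scikit-learn is a substitute here, the data are a synthetic stand-in for NSL-KDD (41 numeric features, binary labels), and the helper name `build_pipeline` and all hyperparameters are assumptions, not the authors' settings. The accuracy it prints therefore has no bearing on the reported 81.2%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic data standing in for NSL-KDD (41 features, binary label).
X, y = make_classification(
    n_samples=600, n_features=41, n_informative=10, n_redundant=5, random_state=0
)


def build_pipeline(k):
    """Hypothetical helper: scale, keep the k best Chi-Square features, classify."""
    return Pipeline([
        # Min-Max scaling maps each training feature to [0, 1]; chi2 needs
        # non-negative inputs, so scaling must come before selection.
        ("scale", MinMaxScaler()),
        ("select", SelectKBest(score_func=chi2, k=k)),
        # Assumed hyperparameters, chosen only to keep the sketch fast.
        ("clf", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ])


# 10-fold stratified cross-validation; scaling and selection are refit
# inside each fold, so no information leaks from the held-out fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(build_pipeline(19), X, y, cv=cv, scoring="accuracy")

print(f"Mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Placing the scaler and selector inside the pipeline is a design choice of this sketch: it keeps feature ranking confined to the training portion of each fold. Changing the argument of `build_pipeline` (for example to 13) gives a comparable run with a different feature budget.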


How to Cite

Soares, G. J. O. P., Rafrastara, F. A., & Sani, R. R. (2026). Enhancing Feature-Efficient Network Intrusion Detection Using Gradient Boosting and Chi-Square Selection on NSL-KDD. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 525–535. https://doi.org/10.33395/sinkron.v10i1.15650