Optimizing URL-Based Phishing Detection Using XGBoost and Relief Feature Selection

Authors

  • Wahyu Suryaning Tyas, Universitas Dian Nuswantoro
  • Fauzi Adi Rafrastara, Universitas Dian Nuswantoro
  • Wildanil Ghozi, Universitas Dian Nuswantoro

DOI:

10.33395/sinkron.v10i1.15651

Keywords:

AdaBoost, Chi-Square, Information Gain, Phishing, ReliefF

Abstract

Phishing is a significant cybersecurity threat in which attackers use manipulated URLs to deceive users and obtain confidential information. As phishing attacks grow more complex, automated machine-learning-based detection has become essential to digital security. This study proposes a URL-based phishing detection model built on boosting algorithms and analyzes how feature selection affects classification performance and computational efficiency. The experiments used a dataset of 10,000 instances with 50 features and balanced class labels. After data preparation, 48 features were retained as input variables, and min-max normalization was applied so that all features share a uniform scale. Three boosting algorithms (Gradient Boosting, XGBoost, and AdaBoost) were evaluated using accuracy, precision, recall, and F1-score. XGBoost achieved the highest accuracy, 98.8%, demonstrating its effectiveness in learning complex URL patterns. Three feature selection techniques (Information Gain, Chi-Square, and ReliefF) were then applied and evaluated using 10-fold cross-validation. ReliefF provided the most effective feature reduction, selecting 37 of the 48 features while maintaining the same classification accuracy. Unlike previous studies that focus mainly on classifier comparison, this study shows that combining XGBoost with ReliefF substantially reduces feature dimensionality without compromising predictive accuracy. This finding highlights an efficient trade-off between detection performance and computational complexity. Overall, the proposed framework offers a robust, efficient, and scalable solution for fast and adaptive phishing detection in modern cybersecurity environments.
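The two generic building blocks named in the abstract, min-max normalization and the four evaluation metrics, can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names, the feature names (`url_length`, `num_dots`), and the sample values are all hypothetical, and treating label 1 as the phishing class is an assumption.

```python
from typing import Dict, List, Sequence


def min_max_scale(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each feature column to [0, 1] via (x - min) / (max - min).

    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    """
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return scaled


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1, with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical feature columns (not taken from the paper's dataset).
url_length = [20.0, 50.0, 80.0]
num_dots = [1.0, 3.0, 2.0]
scaled = min_max_scale([url_length, num_dots])

# Hypothetical labels and predictions: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

In a cross-validated setup such as the 10-fold scheme described above, the minimum and maximum would normally be computed on the training fold only and then reused to scale the held-out fold, so that no information leaks from the test data.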

How to Cite

Tyas, W. S., Rafrastara, F. A., & Ghozi, W. (2026). Optimizing URL-Based Phishing Detection Using XGBoost and Relief Feature Selection. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 430–438. https://doi.org/10.33395/sinkron.v10i1.15651