Optimizing URL-Based Phishing Detection Using XGBoost and Relief Feature Selection
DOI: 10.33395/sinkron.v10i1.15651
Keywords: AdaBoost, Chi-Square, Information Gain, Phishing, ReliefF
Abstract
Phishing is a significant cybersecurity threat in which attackers use manipulated URLs to deceive users and obtain confidential information. As phishing attacks grow in complexity, automated machine-learning-based detection has become essential to strengthening digital security. This study proposes a URL-based phishing detection model using boosting algorithms and analyzes the role of feature selection in improving classification performance and computational efficiency. The experiments were conducted on a dataset of 10,000 instances with 50 features and balanced class labels. After data preparation, 48 features were retained as input variables, and min-max normalization was applied to ensure uniform feature scaling. Three boosting algorithms, namely Gradient Boosting, XGBoost, and AdaBoost, were evaluated using accuracy, precision, recall, and F1 score. Among these methods, XGBoost achieved the highest accuracy, 98.8 percent, demonstrating its effectiveness in learning complex URL patterns. Subsequently, three feature selection techniques, namely Information Gain, Chi-Square, and ReliefF, were applied and evaluated using 10-fold cross-validation. The results indicate that ReliefF provides the most effective feature reduction, selecting 37 of the 48 features while maintaining the same classification accuracy. Unlike previous studies that mainly focus on classifier comparison, this study demonstrates that integrating XGBoost with ReliefF reduces feature dimensionality without compromising predictive accuracy. This finding highlights an efficient trade-off between detection performance and computational complexity. Overall, the proposed framework offers a robust, efficient, and scalable solution for fast and adaptive phishing detection in modern cybersecurity environments.
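The abstract names three building blocks without showing them: min-max normalization, Relief-style feature scoring, and the accuracy, precision, recall, and F1 metrics. The following is a minimal pure-Python sketch of each, not the authors' implementation. The function names and the toy data are hypothetical, and `relief_weights` is the basic Relief variant (one nearest hit and one nearest miss per instance), a simplification of the ReliefF method the study reports using.

```python
from typing import List, Sequence, Tuple


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def relief_weights(rows: Sequence[Sequence[float]],
                   labels: Sequence[int]) -> List[float]:
    """Basic Relief (assumed simplification of ReliefF).

    A feature gains weight when it differs on the nearest instance of the
    other class (miss) and loses weight when it differs on the nearest
    instance of the same class (hit). Distances are Manhattan distances.
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (row, label) in enumerate(zip(rows, labels)):
        hits = [r for j, (r, l) in enumerate(zip(rows, labels))
                if j != i and l == label]
        misses = [r for r, l in zip(rows, labels) if l != label]
        if not hits or not misses:
            continue  # cannot score an instance without both neighbours
        hit = min(hits, key=lambda r: dist(row, r))
        miss = min(misses, key=lambda r: dist(row, r))
        for f in range(len(weights)):
            weights[f] += (abs(row[f] - miss[f]) - abs(row[f] - hit[f])) / n
    return weights


def binary_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, F1) with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
toy_rows = [[10, 5], [12, 1], [11, 9], [30, 4], [32, 8], [31, 2]]
toy_labels = [0, 0, 0, 1, 1, 1]

scaled = min_max_scale(toy_rows)
weights = relief_weights(scaled, toy_labels)
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

On this toy data the informative feature receives a positive weight and the noise feature a negative one, which is the ranking signal a Relief-family selector uses to decide which features to keep. In practice the scores would be computed on the scaled training folds only, so that the cross-validation estimate is not influenced by the held-out data.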
License
Copyright (c) 2026 Wahyu Suryaning Tyas, Fauzi Adi Rafrastara, Wildanil Ghozi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
