Application of the XGBoost Model with Hyperparameter Tuning for Industry Classification for Job Applicants

Authors

  • Akhmal Angga Syahputra, Informatics, Faculty of Computer Science, University of Amikom Purwokerto
  • Rujianto Eko Saputro, Information Technology, Faculty of Computer Science, University of Amikom Purwokerto

DOI:

10.33395/sinkron.v8i3.13840

Keywords:

XGBoost; Hyperparameter tuning; GridSearchCV; Cross-validation; Industry classification

Abstract

Technological development and changing job market dynamics have created new challenges in aligning education with industry needs. In this research, an XGBoost model with hyperparameter tuning was applied to industry classification on job applicant data taken from the 2023 LinkedIn Job Postings dataset on Kaggle. The dataset consists of 23 attributes and 33,085 job vacancy records. The experimental results show that the model reaches the same classification accuracy with and without GridSearchCV tuning, namely 0.89 (89%), with stable precision, recall, and F1-score values. The best parameters found in this study are colsample_bytree = 1.0, learning_rate = 0.3, max_depth = 6, min_child_weight = 1, n_estimators = 100, and subsample = 1.0. Evaluation with k-fold cross-validation, however, raises the accuracy to 0.90 (90%). This finding indicates that cross-validation gives a more accurate and robust estimate of model performance, because every data point is used for both training and testing across the folds. Future research can explore alternative hyperparameter tuning methods and apply the model to larger datasets to further validate the generalizability and reliability of XGBoost in other application contexts. Overall, the study underscores the importance of rigorous model evaluation techniques in achieving high-performing machine learning models.
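The tuning and evaluation procedure described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. In practice the same steps would normally be done with a library such as scikit-learn's GridSearchCV wrapped around an XGBoost classifier. Every function name below is hypothetical, the alternative grid values are assumptions (the abstract reports only the best values), and a majority-class predictor stands in for the real model so that the example runs without third-party packages.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def expand_grid(grid):
    """Return every parameter combination in the grid as a dict."""
    names = list(grid)
    return [dict(zip(names, vals)) for vals in product(*(grid[n] for n in names))]


def cross_val_accuracy(fit, predict, X, y, params, k=5):
    """Mean accuracy over k folds for one parameter setting."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], params)
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def grid_search(fit, predict, X, y, grid, k=5):
    """Exhaustive search over the grid; returns (best_params, best_score)."""
    best_params, best_score = None, -1.0
    for params in expand_grid(grid):
        score = cross_val_accuracy(fit, predict, X, y, params, k)
        if score > best_score:  # ties keep the first combination seen
            best_params, best_score = params, score
    return best_params, best_score


# Assumed grid: 6, 0.3 and 100 are the best values reported in the abstract;
# the other candidates are illustrative guesses, not taken from the paper.
PARAM_GRID = {
    "max_depth": [3, 6],
    "learning_rate": [0.1, 0.3],
    "n_estimators": [100, 200],
}


def fit_majority(X_train, y_train, params):
    """Stand-in model: remembers the most frequent training label."""
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    """Stand-in prediction: always returns the remembered label."""
    return model


if __name__ == "__main__":
    X = list(range(10))          # toy features
    y = [0] * 7 + [1] * 3        # toy labels
    print(grid_search(fit_majority, predict_majority, X, y, PARAM_GRID, k=5))
```

Two caveats apply to this sketch. The folds are contiguous and unshuffled, whereas library implementations typically shuffle or stratify by class. The stand-in model also ignores its parameters, so all combinations tie here. As a side note, the best parameters reported in the abstract appear to match XGBoost's documented default values, which would be consistent with the tuned and untuned models reaching the same accuracy.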

How to Cite

Syahputra, A. A., & Saputro, R. E. (2024). Application of the XGBoost Model with Hyperparameter Tuning for Industry Classification for Job Applicants. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 8(3), 1920–1931. https://doi.org/10.33395/sinkron.v8i3.13840