Optimization of Machine Learning Models in Student Graduation Prediction Systems Using Ensemble Learning with PSO and SMOTE

Authors

  • Hamdani, Universitas Sains dan Teknologi Indonesia, Indonesia
  • Susanti, Universitas Sains dan Teknologi Indonesia, Indonesia
  • Lathifah, Universitas Teknokrat Indonesia, Indonesia
  • M. Khairul Anam, Universitas Samudra, Indonesia
  • Rahman Pradipta, Universitas Samudra, Indonesia

DOI:

10.33395/sinkron.v9i4.15335

Keywords:

Graduation Prediction, Ensemble Learning, SMOTE, Particle Swarm Optimization, Voting Classifier

Abstract

The timely graduation of students is a key metric in evaluating the academic effectiveness of higher education institutions. However, accurately identifying students at risk of delayed graduation remains challenging due to imbalanced data distributions and the instability of single-model prediction approaches. This study proposes an optimized ensemble-based machine learning system for predicting on-time graduation among university students. The model integrates C4.5, K-Nearest Neighbor (KNN), and Random Forest algorithms through a hard voting classifier, which is further optimized using Particle Swarm Optimization (PSO) to determine the most effective weighting configuration. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is implemented, ensuring balanced representation between timely and delayed graduates. A dataset of 809 student academic records from Universitas Sains dan Teknologi Indonesia (USTI) was used, and performance was evaluated using 5-fold cross-validation. The proposed ensemble model achieved an average accuracy of 93.70%, a precision of 0.94, a recall of 0.93, and an F1-score of 0.94, outperforming each individual classifier. These results confirm that the combination of ensemble learning, PSO-based optimization, and data balancing effectively improves both accuracy and model stability. The findings highlight the system’s potential as a reliable decision-support tool for educational institutions to anticipate delayed graduations and improve academic supervision strategies.
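The abstract combines four moving parts: SMOTE oversampling, hard voting over C4.5, KNN and Random Forest, PSO-tuned voting weights, and 5-fold cross-validation. The sketch below illustrates the first three in plain Python. It is not the authors' code, and it rests on several assumptions. The base classifiers are replaced by hypothetical hard-coded prediction vectors. The toy data is invented. Weights are constrained to [0, 1], and PSO fitness is plain accuracy, because the paper's exact PSO settings and fitness function are not stated here. A production pipeline would more likely use scikit-learn's `VotingClassifier(voting="hard", weights=...)` and imbalanced-learn's `SMOTE`. In that setting, a common precaution is to fit both the oversampling and the weight search on the training folds only, so that each held-out fold stays untouched.

```python
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """Basic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = rng if rng is not None else random.Random(0)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def weighted_hard_vote(predictions, weights):
    """For each sample, return the label whose supporting models carry the
    largest total weight (ties go to the label encountered first)."""
    voted = []
    for labels in zip(*predictions):  # one tuple of model outputs per sample
        score = Counter()
        for label, w in zip(labels, weights):
            score[label] += w
        voted.append(max(score, key=score.get))
    return voted


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def pso_weights(predictions, truth, n_particles=20, iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over voting weights in [0, 1]^m; fitness = accuracy."""
    rng = random.Random(seed)
    dim = len(predictions)

    def fitness(ws):
        return accuracy(weighted_hard_vote(predictions, ws), truth)

    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every weight stays in [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:  # personal best improved
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:  # swarm-wide best improved
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy data, invented for illustration (not the USTI records).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=6, k=3)

y_true = [1, 0, 1, 0, 1, 0, 1, 0]
preds = [
    [1, 0, 1, 0, 1, 0, 0, 1],  # stand-in for Random Forest
    [0, 1, 1, 0, 1, 0, 1, 0],  # stand-in for C4.5
    [0, 1, 1, 0, 1, 0, 1, 1],  # stand-in for KNN
]
equal_acc = accuracy(weighted_hard_vote(preds, [1, 1, 1]), y_true)
best_w, best_acc = pso_weights(preds, y_true)
print(equal_acc)  # 0.625
print(best_w, best_acc)
```

On these made-up predictions, equal weights reach 0.625 accuracy, and no weighting can exceed 0.75. Samples 0 and 1 require the Random Forest stand-in to outweigh the other two models combined, while sample 6 requires the opposite. The sketch tunes and scores the weights on the same predictions, so the reported figure is optimistic. Running the search inside each training fold avoids that bias.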

How to Cite

Hamdani, H., Susanti, S., Lathifah, L., Anam, M. K., & Pradipta, R. (2025). Optimization of Machine Learning Models in Student Graduation Prediction Systems Using Ensemble Learning with PSO and SMOTE. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(4), 3149-3158. https://doi.org/10.33395/sinkron.v9i4.15335