Optimization of Machine Learning Models in Student Graduation Prediction Systems Using Ensemble Learning with PSO and SMOTE
DOI: 10.33395/sinkron.v9i4.15335
Keywords: Graduation Prediction, Ensemble Learning, SMOTE, Particle Swarm Optimization, Voting Classifier
Abstract
The timely graduation of students is a key metric in evaluating the academic effectiveness of higher education institutions. However, accurately identifying students at risk of delayed graduation remains challenging due to imbalanced data distributions and the instability of single-model prediction approaches. This study proposes an optimized ensemble-based machine learning system for predicting on-time graduation among university students. The model integrates C4.5, K-Nearest Neighbor (KNN), and Random Forest algorithms through a hard voting classifier, which is further optimized using Particle Swarm Optimization (PSO) to determine the most effective weighting configuration. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is implemented, ensuring balanced representation between timely and delayed graduates. A dataset of 809 student academic records from Universitas Sains dan Teknologi Indonesia (USTI) was used, and performance was evaluated using 5-fold cross-validation. The proposed ensemble model achieved an average accuracy of 93.70%, a precision of 0.94, a recall of 0.93, and an F1-score of 0.94, outperforming each individual classifier. These results confirm that the combination of ensemble learning, PSO-based optimization, and data balancing effectively improves both accuracy and model stability. The findings highlight the system’s potential as a reliable decision-support tool for educational institutions to anticipate delayed graduations and improve academic supervision strategies.
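The weighted hard-voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy predictions, the function names, the [0, 1] weight bounds, and the PSO settings (inertia 0.7, acceleration coefficients 1.5, 10 particles, 30 iterations) are all assumptions chosen for illustration, since the abstract does not report them. Each row of `pred_matrix` holds hypothetical labels from C4.5, KNN, and Random Forest for one student, and a small particle swarm searches for the vote weights that maximize accuracy.

```python
import random

def weighted_hard_vote(predictions, weights):
    """Return the label whose supporting base models carry the largest total weight."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    # Iterating labels in sorted order makes ties resolve to the smallest label.
    return max(sorted(totals), key=lambda lab: totals[lab])

def ensemble_accuracy(pred_matrix, y_true, weights):
    """Fraction of samples where the weighted vote matches the true label."""
    hits = sum(weighted_hard_vote(row, weights) == y
               for row, y in zip(pred_matrix, y_true))
    return hits / len(y_true)

def pso_weights(pred_matrix, y_true, n_particles=10, n_iters=30, seed=0):
    """Search vote weights in [0, 1] with a basic particle swarm (assumed settings)."""
    rng = random.Random(seed)
    dim = len(pred_matrix[0])
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pos[0] = [1.0] * dim  # one particle starts at equal weights (plain majority vote)
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [ensemble_accuracy(pred_matrix, y_true, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                # Clip each weight to the assumed search bounds.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            score = ensemble_accuracy(pred_matrix, y_true, pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score

# Hypothetical toy data: 1 = on-time graduation, 0 = delayed graduation.
# Columns are assumed predictions from C4.5, KNN, and Random Forest.
y_true = [1, 0, 1, 1, 0, 1]
pred_matrix = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
]

baseline = ensemble_accuracy(pred_matrix, y_true, [1.0, 1.0, 1.0])
best_weights, best_score = pso_weights(pred_matrix, y_true)
print("equal-weight accuracy:", baseline)
print("PSO weights:", best_weights, "accuracy:", best_score)
```

In this sketch the weights are fitted and scored on the same toy labels, which gives an optimistic estimate. In a setup like the 5-fold cross-validation mentioned in the abstract, the weights would presumably be tuned on the training folds only, and any oversampling such as SMOTE would likewise be applied only to the training portion of each fold.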
License
Copyright (c) 2025 Hamdani, Susanti, Lathifah, M. Khairul Anam, Rahman Pradipta

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

