Comparative Academic Performance Prediction in Primary Schools Using Linear Regression and Random Forest

Authors

  • Agustinus Sembiring, Universitas Pradita
  • Handri Santoso, Universitas Pradita

DOI:

10.33395/sinkron.v10i2.15953

Keywords:

Academic Performance Prediction, Educational Data Mining (EDM), Linear Regression, Machine Learning, Random Forest Regression

Abstract

Predicting academic performance is an important aspect of data-driven decision making in education, particularly in primary schools, where early identification of learning difficulties is crucial. This study compares Linear Regression and Random Forest Regression models for predicting students’ academic performance using an Educational Data Mining approach. The experiment uses the Students Performance Dataset from Kaggle, consisting of 1000 student records with eight predictor variables, including demographic and learning-related attributes. The target variable is the average of the mathematics, reading, and writing scores. Models are developed and evaluated in Python on Google Colaboratory. Performance is assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²), and the Random Forest is further optimized using GridSearchCV with 5-fold cross-validation. Linear Regression achieves the best performance (R² = 0.162, RMSE = 13.40, MAE = 10.49), outperforming both the default Random Forest (R² ≈ 0.000) and the tuned Random Forest (R² ≈ 0.112). The low explained variance of all three models indicates that simple demographic features provide limited predictive power for academic performance. A case study on a local dataset of 132 sixth-grade students from a private primary school likewise finds that Linear Regression performs better than Random Forest on small, simple educational datasets. These results highlight the importance of aligning model selection with dataset characteristics in educational data mining.
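The comparison described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code. The column names are assumed from the public Kaggle "Students Performance" CSV, and only its five categorical columns are used as predictors; the abstract reports eight predictor variables, so the authors' feature set may differ. The 80/20 split, the one-hot encoding and the Random Forest hyperparameter grid are hypothetical choices, because the abstract does not report them. The `synthetic_demo` helper generates random placeholder records only so that the script runs on its own; its metrics say nothing about the study's results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# ASSUMPTION: column names follow the public Kaggle "Students Performance" CSV.
CATEGORICAL = ["gender", "race/ethnicity", "parental level of education",
               "lunch", "test preparation course"]
SCORES = ["math score", "reading score", "writing score"]


def make_target(df):
    """Target = mean of the mathematics, reading and writing scores."""
    return df[SCORES].mean(axis=1)


def build_pipeline(model):
    """One-hot encode the categorical predictors, then fit `model`."""
    encoder = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)])
    return Pipeline([("encode", encoder), ("model", model)])


def evaluate(model, X_test, y_test):
    """Return MAE, RMSE and R² on the hold-out set."""
    pred = model.predict(X_test)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": r2_score(y_test, pred),
    }


def compare_models(df, seed=42):
    X, y = df[CATEGORICAL], make_target(df)
    # ASSUMPTION: an 80/20 hold-out split; the abstract does not state the split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)

    linear = build_pipeline(LinearRegression()).fit(X_train, y_train)
    forest = build_pipeline(RandomForestRegressor(random_state=seed)).fit(X_train, y_train)

    # ASSUMPTION: a small, hypothetical grid; the study's actual grid is not given.
    grid = {
        "model__n_estimators": [100, 200],
        "model__max_depth": [None, 5],
        "model__min_samples_leaf": [1, 5],
    }
    search = GridSearchCV(
        build_pipeline(RandomForestRegressor(random_state=seed)),
        grid, cv=5, scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)

    return {
        "Linear Regression": evaluate(linear, X_test, y_test),
        "Random Forest (default)": evaluate(forest, X_test, y_test),
        "Random Forest (tuned)": evaluate(search.best_estimator_, X_test, y_test),
    }


def synthetic_demo(n=1000, seed=0):
    """Random placeholder data with the assumed columns; NOT the study's data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "gender": rng.choice(["female", "male"], n),
        "race/ethnicity": rng.choice(["group A", "group B", "group C"], n),
        "parental level of education": rng.choice(
            ["high school", "some college", "bachelor's degree"], n),
        "lunch": rng.choice(["standard", "free/reduced"], n),
        "test preparation course": rng.choice(["none", "completed"], n),
    })
    # Weak, made-up dependence on two columns plus noise, clipped to 0..100.
    base = 60 + 8 * (df["lunch"] == "standard") + 5 * (df["test preparation course"] == "completed")
    for col in SCORES:
        df[col] = np.clip(base + rng.normal(0, 13, n), 0, 100)
    return df


if __name__ == "__main__":
    for name, metrics in compare_models(synthetic_demo()).items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

To run it on real data, replace `synthetic_demo()` with a `pd.read_csv(...)` call on a local copy of the dataset. Keeping the encoder inside the `Pipeline` means `GridSearchCV` refits it within each of the five folds, so the tuned model is selected without information from the validation folds.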

How to Cite

Sembiring, A., & Santoso, H. (2026). Comparative Academic Performance Prediction in Primary Schools Using Linear Regression and Random Forest. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(2), 1104-1113. https://doi.org/10.33395/sinkron.v10i2.15953
