Comparative Academic Performance Prediction in Primary Schools Using Linear Regression and Random Forest
DOI: 10.33395/sinkron.v10i2.15953
Keywords: Academic Performance Prediction, Educational Data Mining (EDM), Linear Regression, Machine Learning, Random Forest Regression
Abstract
Predicting academic performance is an important part of data-driven decision making in education, particularly in primary schools, where early identification of learning difficulties is crucial. This study compares Linear Regression and Random Forest Regression models for predicting students’ academic performance using an Educational Data Mining (EDM) approach. The experiment uses the Students Performance Dataset from Kaggle, which contains 1,000 student records with eight predictor variables, including demographic and learning-related attributes. The target variable is the average of the mathematics, reading, and writing scores. Models are developed and evaluated in Python on Google Colaboratory. Performance is assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²), and the Random Forest is further tuned using GridSearchCV with 5-fold cross-validation. Linear Regression achieves the best performance (R² = 0.162, RMSE = 13.40, MAE = 10.49), outperforming both the default Random Forest (R² ≈ 0.000) and the tuned Random Forest (R² ≈ 0.112). The relatively low explained variance indicates that simple demographic features have limited predictive power for academic performance. A case study on a local dataset of 132 sixth-grade students from a private primary school further supports the finding that Linear Regression is more effective than Random Forest for small, simple educational datasets. These results highlight the importance of matching model selection to dataset characteristics in educational data mining.
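The comparison described in the abstract can be sketched with scikit-learn, whose `GridSearchCV` class the abstract names. This is not the authors' code. The data are synthetic placeholders: the categorical columns below are hypothetical stand-ins loosely modeled on the public Kaggle file (the abstract does not list its eight predictors), and the effect sizes, the 80/20 hold-out split, and the hyperparameter grid are all assumptions. What follows the abstract is the averaged three-score target, the three metrics, and 5-fold grid search for the Random Forest. The three score columns are used only to build the target, never as inputs, because the target is a deterministic function of them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical predictors loosely modeled on the Kaggle file (assumption).
CATEGORIES = {
    "gender": ["female", "male"],
    "race/ethnicity": ["group A", "group B", "group C", "group D", "group E"],
    "parental level of education": ["some high school", "high school", "some college",
                                    "associate's degree", "bachelor's degree", "master's degree"],
    "lunch": ["standard", "free/reduced"],
    "test preparation course": ["none", "completed"],
}


def make_synthetic_data(n, rng):
    """Return categorical inputs X and the averaged-score target y (synthetic placeholder data)."""
    X = np.column_stack([rng.choice(levels, size=n) for levels in CATEGORIES.values()]).astype(object)
    # Assumed effect sizes: only lunch (column 3) and test preparation (column 4) shift the mean.
    base = (60.0 + np.where(X[:, 3] == "standard", 8.0, 0.0)
            + np.where(X[:, 4] == "completed", 6.0, 0.0))
    # Math, reading and writing scores with independent noise, clipped to the 0-100 range.
    scores = np.clip(base[:, None] + rng.normal(0.0, 13.0, size=(n, 3)), 0.0, 100.0)
    # Target: mean of the three scores; the scores are NOT returned as inputs (avoids leakage).
    return X, scores.mean(axis=1)


def evaluate(model, X_test, y_test):
    """Compute MAE, RMSE and R^2 on the held-out split."""
    pred = model.predict(X_test)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": r2_score(y_test, pred),
    }


def run_comparison(seed=42, n=1000):
    rng = np.random.default_rng(seed)
    X, y = make_synthetic_data(n, rng)
    # Hypothetical 80/20 hold-out split; the abstract does not state the split ratio.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)

    # One-hot encoding lives inside each pipeline so it is refit within every CV fold.
    models = {
        "Linear Regression": make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                                           LinearRegression()),
        "Random Forest (default)": make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                                                 RandomForestRegressor(random_state=seed)),
    }
    for model in models.values():
        model.fit(X_train, y_train)

    # Illustrative grid (assumption), searched with 5-fold cross-validation.
    param_grid = {
        "randomforestregressor__n_estimators": [50, 100],
        "randomforestregressor__max_depth": [None, 5],
        "randomforestregressor__min_samples_leaf": [1, 10],
    }
    search = GridSearchCV(
        make_pipeline(OneHotEncoder(handle_unknown="ignore"), RandomForestRegressor(random_state=seed)),
        param_grid, cv=5, scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    # best_estimator_ is refit on the full training split (GridSearchCV default refit=True).
    models["Random Forest (tuned)"] = search.best_estimator_

    results = {name: evaluate(m, X_test, y_test) for name, m in models.items()}
    return results, search.best_params_


results, best_params = run_comparison()
for name, r in results.items():
    print(f"{name:<24} MAE={r['MAE']:.2f}  RMSE={r['RMSE']:.2f}  R2={r['R2']:.3f}")
print("Best RF parameters:", best_params)
```

Keeping the encoder and the regressor in one pipeline means `GridSearchCV` refits the preprocessing on each training fold, so the cross-validated scores are not computed with information from the validation fold. The numbers this sketch prints come from synthetic data and are not expected to reproduce the values reported above.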
License
Copyright (c) 2026 Agustinus Sembiring, Handri Santoso

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.