Hybrid Deep Learning and USE Algorithm for Essay Scoring: Accuracy and Performance Analysis
DOI: 10.33395/sinkron.v9i2.14784
Keywords: Automated Essay Assessment; Cosine Similarity; Universal Sentence Encoder; Online Learning.
Abstract
A central challenge in digital education, particularly in the automatic assessment of essay answers in online learning systems, is the complexity of the natural language understanding and semantic evaluation needed to match the precision of human judgment. This study develops a hybrid model that combines deep learning with a semantic similarity-based approach to essay auto-grading, and analyzes its performance. The method comprises collecting essay answers from various disciplines, processing the text to extract semantic representations, and calculating the degree of similarity between each participant's answer and the answer key using a similarity measure. The model was evaluated by comparing its automatic assessments with manual assessments by teachers. The model achieved its highest accuracy, 90.22%, at a threshold of 0.8, with a precision of 84.63%, a recall of 100%, and an F1 score of 91.68%. To strengthen the reliability of the findings, statistical validation was carried out using error evaluation metrics: the RMSE is 0.32 and the RMAE is 0.19. These findings indicate that the model can mimic human judgment reliably and consistently, and can distinguish the linguistic variations that arise in different types of essay questions. The system offers an effective solution for automating assessment in an online learning environment while maintaining the integrity and objectivity of the evaluation.
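The similarity-and-threshold scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `>=` comparison at the threshold, and the reading of the metrics as a binary correct/incorrect decision are all assumptions made for illustration. The embedding vectors would come from a sentence encoder such as the Universal Sentence Encoder, and are taken here as plain lists of floats. The abstract does not define RMAE, so the sketch computes plain mean absolute error as a stand-in.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(similarities, threshold=0.8):
    """Mark an answer as correct (1) when its similarity reaches the threshold.

    Assumption: the comparison is inclusive (>=); the source does not say.
    """
    return [1 if s >= threshold else 0 for s in similarities]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(predicted, actual):
    """Accuracy, precision, recall, and F1 for 0/1 labels (teacher labels as `actual`)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


def rmse(predicted, actual):
    """Root mean squared error between model scores and teacher scores."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error (stand-in; the source's RMAE formula is not given)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

As a consistency check on the reported figures, `f1_score(0.8463, 1.0)` evaluates to about 0.9168, which matches the F1 score of 91.68% stated alongside a precision of 84.63% and a recall of 100%.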
Copyright (c) 2025 Agus Sriyanto, Kusrini

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

