Enhancing Entity Extraction in E-Government Complaint Data using LDA-Assisted NER
DOI:
10.33395/sinkron.v9i4.15292
Keywords:
Named Entity Recognition, Latent Dirichlet Allocation, Text Mining, Public Services, E-Government
Abstract
With the rapid development of information technology, governments are increasingly challenged to provide digital channels that enhance public participation in governance. LaporGub, an official platform managed by the Central Java Provincial Government, accommodates citizens' aspirations and complaints but faces challenges in processing large volumes of unstructured text. Manual analysis is time-consuming and error-prone, resulting in delayed responses and decreased service quality. Conventional Named Entity Recognition (NER) models struggle to handle informal Indonesian-language text, while transformer-based approaches require substantial computing resources that are not widely available in local government environments. This study therefore develops a lightweight NER approach that integrates Latent Dirichlet Allocation (LDA) as a semantic pre-annotation tool to improve the accuracy of entity extraction in Indonesian e-government complaint data. A dataset of 53,858 complaint reports from the LaporGub platform (2022–2025) was processed using LDA topic modeling (k=10) to provide semantic context during annotation. The enriched dataset was then used to train a spaCy-based NER model targeting three entity types (LOCATION, ORGANIZATION, and PERSON), with a 70:15:15 training-validation-test split using stratified sampling. In the evaluation, the proposed NER+LDA model achieved a precision of 90.03%, a recall of 81.86%, and an F1-score of 85.75%, improvements of 5.78, 2.55, and 4.04 percentage points, respectively, over the baseline NER model (F1-score: 81.71%). The most significant improvements occurred in the detection of ORGANIZATION and PERSON entities.
These findings confirm that integrating LDA as a pre-annotation strategy effectively improves NER performance on informal Indonesian complaint texts, offering a practical and resource-efficient alternative to transformer-based methods for e-government applications.
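As a quick consistency check on the scores reported in the abstract, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only and is not the study's evaluation code: the function name `f1_score` is hypothetical, and the baseline precision and recall are not stated directly in the abstract but are derived here by subtracting the reported improvements from the NER+LDA scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported for the proposed NER+LDA model.
lda_precision, lda_recall = 90.03, 81.86

# Assumption: baseline scores derived as (NER+LDA score - reported improvement).
base_precision = lda_precision - 5.78  # 84.25
base_recall = lda_recall - 2.55        # 79.31

lda_f1 = f1_score(lda_precision, lda_recall)
base_f1 = f1_score(base_precision, base_recall)

print(f"NER+LDA F1:  {lda_f1:.2f}")             # 85.75
print(f"Baseline F1: {base_f1:.2f}")            # 81.71
print(f"Improvement: {lda_f1 - base_f1:.2f}")   # 4.05 unrounded; 4.04 from the rounded scores
```

Both recomputed values agree with the reported F1-scores (85.75% and 81.71%) after rounding to two decimals, so the reported precision, recall, and F1 figures are mutually consistent.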
License
Copyright (c) 2025 Ahmad Khotibul Umam, Farrikh Alzami, Ramadhan Rakhmat Sani, Asih Rohmani, Dwi Puji Prabowo, Dewi Pergiwati, Rama Aria Megantara, Iswahyudi Iswahyudi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.