Enhancing Entity Extraction in E-Government Complaint Data using LDA-Assisted NER

Authors

  • Ahmad Khotibul Umam, Universitas Dian Nuswantoro
  • Farrikh Alzami, Universitas Dian Nuswantoro
  • Ramadhan Rakhmat Sani, Universitas Dian Nuswantoro
  • Asih Rohmani, Universitas Dian Nuswantoro
  • Dwi Puji Prabowo, Universitas Dian Nuswantoro
  • Dewi Pergiwati, Universitas Dian Nuswantoro
  • Rama Aria Megantara, Universitas Dian Nuswantoro
  • Iswahyudi Iswahyudi, Dinas Komunikasi dan Informatika Provinsi Jawa Tengah

DOI:

10.33395/sinkron.v9i4.15292

Keywords:

Named Entity Recognition, Latent Dirichlet Allocation, Text Mining, Public Services, E-Government

Abstract

With the rapid development of information technology, governments are increasingly expected to provide digital channels that enhance public participation in governance. LaporGub, an official platform managed by the Central Java Provincial Government, collects citizens' aspirations and complaints, but processing its large volume of unstructured text remains difficult. Manual analysis is time-consuming and error-prone, which delays responses and lowers service quality. Conventional Named Entity Recognition (NER) models struggle with informal Indonesian-language text, while transformer-based approaches require substantial computing resources that are not widely available in local government environments. This study therefore develops a lightweight NER approach that integrates Latent Dirichlet Allocation (LDA) as a semantic pre-annotation tool to improve the accuracy of entity extraction in Indonesian e-government complaint data. A dataset of 53,858 complaint reports from the LaporGub platform (2022–2025) was processed with LDA topic modeling (k=10) to provide semantic context during annotation. The enriched dataset was then used to train a spaCy-based NER model targeting three entity types (LOCATION, ORGANIZATION, and PERSON), with a 70:15:15 training-validation-test split obtained by stratified sampling. In the evaluation, the proposed NER+LDA model achieved a precision of 90.03%, a recall of 81.86%, and an F1-score of 85.75%, improvements of 5.78, 2.55, and 4.04 percentage points, respectively, over the baseline NER model (F1-score: 81.71%). The largest improvements occurred in the detection of ORGANIZATION and PERSON entities. These findings confirm that integrating LDA as a pre-annotation strategy effectively improves NER performance on informal Indonesian complaint texts, offering a practical and resource-efficient alternative to transformer-based methods for e-government applications.
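
The reported scores can be cross-checked with a short calculation. The sketch below recomputes the F1-score as the harmonic mean of precision and recall from the figures in the abstract. The baseline precision and recall are not stated directly; here they are inferred by subtracting the reported gains, which is an assumption, and the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
lda_precision, lda_recall = 90.03, 81.86
reported_gains = {"precision": 5.78, "recall": 2.55, "f1": 4.04}

# Assumption: baseline precision and recall are inferred by subtracting
# the reported gains; the abstract only states the baseline F1 (81.71%).
base_precision = round(lda_precision - reported_gains["precision"], 2)
base_recall = round(lda_recall - reported_gains["recall"], 2)

lda_f1 = round(f1_score(lda_precision, lda_recall), 2)
base_f1 = round(f1_score(base_precision, base_recall), 2)
f1_gain = round(lda_f1 - base_f1, 2)

print(f"NER+LDA F1:  {lda_f1:.2f}%")
print(f"Baseline F1: {base_f1:.2f}%")
print(f"F1 gain:     {f1_gain:.2f} percentage points")
```

Under that assumption, the recomputed values agree with the reported F1-scores and F1 gain once rounded to two decimals, which suggests the abstract's figures are internally consistent.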

How to Cite

Umam, A. K., Alzami, F., Sani, R. R., Rohmani, A., Prabowo, D. P., Pergiwati, D., Megantara, R. A., & Iswahyudi, I. (2025). Enhancing Entity Extraction in E-Government Complaint Data using LDA-Assisted NER. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(4), 1878–1888. https://doi.org/10.33395/sinkron.v9i4.15292
