Fairer Public Complaint Classification on LaporGub: Integrating XLM-RoBERTa with Focal Loss for Imbalance Data
DOI: 10.33395/sinkron.v9i4.15260
Keywords: RoBERTa, Focal Loss, Text Classification
Abstract
The advancement of digital technology has given governments an opportunity to improve the quality of public services through citizen complaint channels. One such channel in Indonesia is LaporGub, managed by the Dinas Komunikasi dan Informasi Provinsi Jawa Tengah (Communication and Information Agency of Central Java Province). The platform receives thousands of complaints daily on topics ranging from infrastructure and social issues to illegal levies. However, the large volume of data and the imbalanced distribution of categories pose significant challenges for both manual and automated processing. This study classifies citizen complaint texts using XLM-RoBERTa combined with Focal Loss to handle the class imbalance. The dataset consists of 53,774 complaints after data cleaning and text preprocessing. The data were divided with a stratified split (72% training, 18% validation, 10% testing), and the model was fine-tuned for 10 epochs. Performance was evaluated using accuracy, precision, recall, and macro F1-score. Without Focal Loss, the model achieved 78.1% accuracy and a macro F1-score of 0.606; with Focal Loss, the macro F1-score rose to 0.625 and accuracy to 78.5%. These findings indicate that Focal Loss improves the model's ability to recognize minority categories without reducing performance on majority classes. The combination of XLM-RoBERTa and Focal Loss therefore offers an effective way to support faster, fairer, and more transparent management of public complaints.
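The abstract does not spell out the Focal Loss formulation or its hyperparameters. As a minimal sketch, assuming the standard multi-class form FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the softmax probability assigned to the true category, the arithmetic can be written in plain Python. The value γ = 2 and the optional per-class weights α are illustrative assumptions, not values reported for this study, and the helper names are hypothetical. Setting γ = 0 without α recovers ordinary cross-entropy, presumably the loss used by the model without Focal Loss.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
) -> float:
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    ASSUMPTION: gamma=2.0 and the optional per-class weights `alpha` are
    illustrative defaults, not values reported for the LaporGub model.
    """
    losses = []
    for logits, y in zip(batch_logits, targets):
        p_t = softmax(logits)[y]  # predicted probability of the true class
        alpha_t = 1.0 if alpha is None else alpha[y]  # optional class weight
        # (1 - p_t)**gamma shrinks the loss of easy, confidently correct examples
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


def cross_entropy(batch_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Ordinary cross-entropy: focal loss with gamma=0 and no class weights."""
    return focal_loss(batch_logits, targets, gamma=0.0)


if __name__ == "__main__":
    easy = [[4.0, 0.0, 0.0]]  # confident and correct for class 0
    hard = [[0.0, 0.0, 0.0]]  # uniform, uncertain prediction
    for name, logits in (("easy", easy), ("hard", hard)):
        ratio = focal_loss(logits, [0]) / cross_entropy(logits, [0])
        print(f"{name}: focal/CE ratio = {ratio:.4f}")
```

On the confidently correct example the (1 - p_t)^γ factor cuts the loss to roughly a thousandth of its cross-entropy value, while the uncertain example keeps about 44% of it. This is how Focal Loss shifts training effort toward hard examples, which in imbalanced data are often minority-category complaints. In an actual XLM-RoBERTa fine-tuning loop the same expression would be computed with the framework's tensor operations so that gradients flow through it.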
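The abstract reports only the proportions of the stratified split. One way to obtain 72/18/10 proportions, sketched here as an assumption rather than the authors' documented procedure, is a two-stage split: hold out 10% of each category for testing, then take 20% of the remainder (0.9 × 0.2 = 18% of the total) for validation. Stratifying per category keeps rare complaint types present in every split in the same proportion as in the full dataset. The function name, seed, and category labels below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    test_frac: float = 0.10,
    val_frac_of_rest: float = 0.20,
    seed: int = 42,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train_idx, val_idx, test_idx), preserving per-category proportions.

    ASSUMPTION: a two-stage split. With the defaults this yields
    0.9 * 0.8 = 72% train, 0.9 * 0.2 = 18% validation, and 10% test.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # stage 1: test hold-out
        rest = idx[n_test:]
        n_val = round(len(rest) * val_frac_of_rest)  # stage 2: validation
        test.extend(idx[:n_test])
        val.extend(rest[:n_val])
        train.extend(rest[n_val:])
    return train, val, test


if __name__ == "__main__":
    # Hypothetical toy labels: one majority and one minority category.
    labels = ["infrastructure"] * 900 + ["illegal_levies"] * 100
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))
```

With 900 examples of a majority category and 100 of a minority one, the sketch returns 720, 180, and 100 indices, and the minority category makes up exactly 10% of each split. In scikit-learn the same effect is usually achieved by calling `train_test_split` twice with its `stratify` argument.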
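The comparison in the abstract rests on macro F1, which averages the per-category F1 scores with equal weight, so a category with a few dozen complaints counts as much as one with thousands; accuracy, by contrast, is dominated by the majority categories. The plain-Python sketch below computes both on hypothetical toy data to show how a classifier that ignores minority categories can still score high accuracy. The helper names are illustrative, not taken from the study.

```python
from typing import Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one category, treating it as the positive class (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-category F1: every category counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of correct predictions; dominated by the majority categories."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 8 majority-category complaints and two rare ones.
    y_true = ["A"] * 8 + ["B", "C"]
    y_pred = ["A"] * 10  # a classifier that always predicts the majority class
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 0.800
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # 0.296
```

Here accuracy is 0.8 while macro F1 is below 0.3, because the two rare categories each contribute an F1 of zero. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same unweighted mean; by default it treats undefined per-class scores as 0 and emits a warning.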
License
Copyright (c) 2025 Azzula Cerliana Zahro, Farrikh Alzami, Ramadhan Rakhmat Sani, Amiq Fahmi, Rama Aria Megantara, Muhammad Naufal, Harun Al Azies, Iswahyudi Iswahyudi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.