Fairer Public Complaint Classification on LaporGub: Integrating XLM-RoBERTa with Focal Loss for Imbalance Data

Authors

  • Azzula Cerliana Zahro, Universitas Dian Nuswantoro
  • Farrikh Alzami, Universitas Dian Nuswantoro
  • Ramadhan Rakhmat Sani, Universitas Dian Nuswantoro
  • Amiq Fahmi, Universitas Dian Nuswantoro
  • Rama Aria Megantara, Universitas Dian Nuswantoro
  • Muhammad Naufal, Universitas Dian Nuswantoro
  • Harun Al Azies, Universitas Dian Nuswantoro
  • Iswahyudi Iswahyudi, Dinas Komunikasi dan Informatika Provinsi Jawa Tengah

DOI:

10.33395/sinkron.v9i4.15260

Keywords:

RoBERTa, Focal Loss, Text Classification

Abstract

Advances in digital technology give governments an opportunity to improve the quality of public services through citizen complaint channels. One example in Indonesia is LaporGub, managed by the Dinas Komunikasi dan Informatika Provinsi Jawa Tengah (Communication and Informatics Agency of Central Java Province). The platform receives thousands of complaints daily, on topics ranging from infrastructure and social issues to illegal levies. However, the large volume of data and the imbalanced distribution of categories pose significant challenges for both manual and automated processing. This study classifies citizen complaint texts using XLM-RoBERTa combined with Focal Loss to handle class imbalance. The dataset consists of 53,774 complaints after data cleaning and text preprocessing. Training used a stratified split (72% training, 18% validation, 10% testing) and fine-tuning for 10 epochs. Model performance was evaluated using accuracy, precision, recall, and macro F1-score. The model without Focal Loss achieved 78.1% accuracy with a macro F1-score of 0.606, while the model with Focal Loss reached 78.5% accuracy with a macro F1-score of 0.625. These findings indicate that Focal Loss improves the model's ability to recognize minority categories without reducing performance on majority classes. The combination of XLM-RoBERTa and Focal Loss therefore offers an effective way to support faster, fairer, and more transparent public complaint management.
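The two quantities the abstract relies on, Focal Loss and the macro F1-score, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the focusing parameter `gamma=2.0`, and the optional per-class weights `alpha` are assumptions chosen for illustration, since the abstract does not report these settings.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative assumptions, not values from the paper.
    With gamma=0 and alpha=None this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# An easy example (confident, correct) versus a hard one (low probability on the true class).
easy = focal_loss([4.0, 0.0, 0.0], target=0)
hard = focal_loss([0.0, 2.0, 0.0], target=0)
print(easy, hard)
```

Because the factor `(1 - p_t)**gamma` shrinks the loss of well-classified examples much more than that of misclassified ones, training gradients are dominated by hard examples, which in an imbalanced dataset tend to come from minority categories. Macro F1 is the natural companion metric because it averages per-class F1 without weighting by class frequency.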

How to Cite

Zahro, A. C., Alzami, F., Sani, R. R., Fahmi, A., Megantara, R. A., Naufal, M., Azies, H. A., & Iswahyudi, I. (2025). Fairer Public Complaint Classification on LaporGub: Integrating XLM-RoBERTa with Focal Loss for Imbalance Data. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 9(4), 1850–1862. https://doi.org/10.33395/sinkron.v9i4.15260
