Sarcasm Detection in Indonesian YouTube Comments using Fine-Tuned IndoBERT with Class Imbalance Handling

Authors

  • Ahmad Muhlis Fanani, Magister Teknologi Informasi, Fakultas Teknologi Komunikasi dan Informatika, Universitas Nasional, Jakarta, Indonesia
  • Moh. Iwan Wahyuddin, Magister Teknologi Informasi, Fakultas Teknologi Komunikasi dan Informatika, Universitas Nasional, Jakarta, Indonesia

DOI:

10.33395/sinkron.v10i1.15607

Keywords:

BERT, Class Imbalance, GPT-4o, IndoBERT, Indonesian Language, Natural Language Processing, Sarcasm Detection, YouTube Comments

Abstract

Sarcasm detection in Indonesian social media is a challenging natural language processing task because sarcastic meaning is implicit and labeled datasets are scarce. YouTube, with 143 million users in Indonesia, remains a largely unexplored source of sarcastic expressions. This study develops an automatic sarcasm detection system for Indonesian YouTube comments using fine-tuned IndoBERT and compares the performance of two IndoBERT variants. A dataset of 5,291 YouTube comments was collected and automatically labeled using GPT-4o with structured prompts based on linguistic indicators of sarcasm. Two IndoBERT variants (IndoNLU and IndoLEM) were fine-tuned under three class imbalance handling strategies: no mitigation (the original imbalanced distribution), under-sampling, and class weighting. Zero-shot evaluation served as a baseline for measuring the effect of fine-tuning, and models were evaluated using accuracy, precision, recall, and F1-score. Without fine-tuning, the pre-trained models showed very limited sarcasm detection capability, with F1-scores of 0.1613 for IndoNLU and 0.3519 for IndoLEM. Fine-tuning with under-sampling raised the F1-scores to 0.6499 for IndoNLU and 0.6568 for IndoLEM, an improvement of up to 303%. IndoBERT-IndoNLU delivered the more balanced performance with an accuracy of 0.6424, while IndoLEM achieved the higher sarcasm recall of 0.7639. These results show that fine-tuning IndoBERT is effective for detecting sarcasm in Indonesian YouTube comments. The study contributes a new labeled dataset, demonstrates the effectiveness of automatic labeling with large language models, and provides empirical evidence of the value of fine-tuning for Indonesian sarcasm detection.
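For illustration only, the sketch below shows one way the class-weighting strategy described in the abstract could be implemented: fine-tuning an IndoBERT checkpoint with a cross-entropy loss weighted inversely to class frequency. It assumes the public Hugging Face checkpoints indobenchmark/indobert-base-p1 (IndoNLU) and indolem/indobert-base-uncased (IndoLEM), hypothetical train.csv and test.csv files with "text" and "label" columns, and generic hyperparameters; these details are assumptions, not taken from the paper.

import numpy as np
import torch
from torch.nn import CrossEntropyLoss
from datasets import load_dataset
from sklearn.utils.class_weight import compute_class_weight
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "indobenchmark/indobert-base-p1"  # IndoNLU variant; swap for "indolem/indobert-base-uncased"

# Hypothetical CSV files with "text" and "label" columns (1 = sarcastic, 0 = non-sarcastic).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Class weighting: weight each class inversely to its frequency in the training split,
# so errors on the minority (sarcastic) class cost more during fine-tuning.
train_labels = np.array(dataset["train"]["label"])
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=train_labels)
class_weights = torch.tensor(weights, dtype=torch.float)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

class WeightedTrainer(Trainer):
    # Override the loss so cross-entropy uses the per-class weights computed above.
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = CrossEntropyLoss(weight=class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(output_dir="indobert-sarcasm", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)

trainer = WeightedTrainer(model=model, args=args,
                          train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()

The under-sampling strategy compared in the paper could instead be approximated by balancing the training split before tokenization, in which case the standard unweighted Trainer suffices.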


References

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” arXiv. https://arxiv.org/abs/1810.04805

Gole, Montgomery, Williams-Paul Nwadiugwu, and Andriy Miranskyy. 2024. “On Sarcasm Detection with OpenAI GPT-Based Models.” Pp. 1–6 in 2024 34th International Conference on Collaborative Advances in Software and COmputiNg (CASCON). doi:10.1109/CASCON62161.2024.10837875.

Grice, Herbert Paul. 1975. Logic and Conversation. New York: Academic Press.

Islam, Saiful, Mahmuda Ferdusi, and Tanjim Taharat Aurpa. 2025. “Words of War: A Hybrid BERT-CNN Approach for Topic-Wise Sentiment Analysis on The Russia-Ukraine War.” Expert Systems with Applications 284. doi:10.1016/j.eswa.2025.127759.

Jia, Mengzhao, Can Xie, and Liqiang Jing. 2024. “Debiasing Multimodal Sarcasm Detection with Contrastive Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 38(16):18354–62. doi:10.1609/aaai.v38i16.29795.

Koto, Fajri, Afshin Rahimi, Jey Han Lau, and Timothy Baldwin. 2020. “IndoLEM and IndoBERT: A Benchmark Dataset and Pre-Trained Language Model for Indonesian NLP.” Pp. 757–70 in Proceedings of the 28th International Conference on Computational Linguistics, edited by D. Scott, N. Bel, and C. Zong. Barcelona, Spain (Online): International Committee on Computational Linguistics.

Ma’aly, Ahmad Nahid, Dita Pramesti, Ariadani Dwi Fathurahman, and Hanif Fakhrurroja. 2024. “Exploring Sentiment Analysis for the Indonesian Presidential Election Through Online Reviews Using Multi-Label Classification with a Deep Learning Algorithm.” Information 15(11):705. doi:10.3390/info15110705.

Mandhasiya, Dwi Guna, Hendri Murfi, and Alhadi Bustamam. 2024. “The Hybrid of BERT and Deep Learning Models for Indonesian Sentiment Analysis.” Indonesian Journal of Electrical Engineering and Computer Science 33(1):591. doi:10.11591/ijeecs.v33.i1.pp591-602.

Qin, Zhenkai, Qining Luo, Zhidong Zang, and Hongpeng Fu. 2025. “Detecting Sarcasm in User-Generated Content Integrating Transformers and Gated Graph Neural Networks.” PeerJ Computer Science 11:e2817. doi:10.7717/peerj-cs.2817.

Ranti, Kiefer Stefano, and Abba Suganda Girsang. 2020. “Indonesian Sarcasm Detection Using Convolutional Neural Network.” International Journal of Emerging Trends in Engineering Research 8(9):4952–55. doi:10.30534/ijeter/2020/10892020.

Razali, Md Saifullah, Alfian Abdul Halin, Lei Ye, Shyamala Doraisamy, and Noris Mohd Norowi. 2021. “Sarcasm Detection Using Deep Learning With Contextual Features.” IEEE Access 9:68609–18. doi:10.1109/ACCESS.2021.3076789.

Sharma, Dilip Kumar, Bhuvanesh Singh, Saurabh Agarwal, Hyunsung Kim, and Raj Sharma. 2022. “Sarcasm Detection over Social Media Platforms Using Hybrid Auto-Encoder-Based Model.” Electronics 11(18):2844. doi:10.3390/electronics11182844.

Suhartono, Derwin, Wilson Wongso, and Alif Tri Handoyo. 2024. “IdSarcasm: Benchmarking and Evaluating Language Models for Indonesian Sarcasm Detection.” IEEE Access 12:87323–32. doi:10.1109/ACCESS.2024.3416955.

We Are Social, and Meltwater. 2025. Digital 2025 Global Overview Report. Research Report. 2. London: We Are Social. https://wearesocial.com/wp-content/uploads/2025/02/GDR-2025-v2.pdf.

Wilie, Bryan, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, and Ayu Purwarianti. 2020. “IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding.” doi:10.48550/arXiv.2009.05387.

How to Cite

Fanani, A. M., & Wahyuddin, M. I. (2026). Sarcasm Detection in Indonesian YouTube Comments using Fine-Tuned IndoBERT with Class Imbalance Handling. Sinkron: Jurnal dan Penelitian Teknik Informatika, 10(1). https://doi.org/10.33395/sinkron.v10i1.15607