IndoBERT-Based Pediatric Disease Classification and Symptom-Based Traditional Medicine Recommendation from Lontar Usada Rare

Authors

  • I Putu Erick Prawira Winata (Fakultas Teknologi dan Informatika, Program Studi Teknik Informatika, Institut Bisnis dan Teknologi Indonesia, Bali, Indonesia)
  • I Gede Iwan Sudipa (Fakultas Teknologi dan Informatika, Program Studi Teknik Informatika, Institut Bisnis dan Teknologi Indonesia, Bali, Indonesia)
  • Ni Putu Suci Meinarni (Fakultas Teknologi dan Informatika, Program Studi Teknik Informatika, Institut Bisnis dan Teknologi Indonesia, Bali, Indonesia)
  • Dewa Ayu Putri Wulandari (Fakultas Teknologi dan Informatika, Program Studi Teknik Informatika, Institut Bisnis dan Teknologi Indonesia, Bali, Indonesia)
  • Christina Purnama Yanti (Fakultas Teknologi dan Informatika, Program Studi Teknik Informatika, Institut Bisnis dan Teknologi Indonesia, Bali, Indonesia)

DOI:

10.33395/sinkron.v10i1.15507

Keywords:

IndoBERT Method, Lontar Usada Rare, pediatric disease classification, traditional Balinese medicine

Abstract

This study develops a pediatric disease classification model for Balinese traditional medical text by fine-tuning IndoBERT on the Lontar Usada Rare dataset. The dataset comprises 422 entries covering disease symptoms, disease types, medicinal ingredients, and treatment procedures, obtained from transliterated lontar manuscripts and interviews with traditional medicine experts. Preprocessing consisted of case folding, cleansing, and normalization, followed by label encoding of 35 disease classes. IndoBERT was fine-tuned with the AdamW optimizer at a learning rate of 5e-5, a batch size of 8, and 15 epochs. In evaluation, the model achieved 90.59% accuracy, 94.71% precision, 90.59% recall, and a 90.99% F1-score, which suggests that it captures the linguistic context of traditional medical text well. The recommendation system combines the model's prediction with TF-IDF-based cosine similarity to return the most relevant treatment recommendations for the symptoms a user enters. The work contributes to the digitization and preservation of Balinese traditional medical knowledge by building a structured, widely accessible digital knowledge base.
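
The abstract does not say how the preprocessing or the TF-IDF cosine similarity step is implemented, so the following is only a minimal sketch under stated assumptions. It uses the Python standard library alone, a smoothed IDF variant, and whitespace collapsing as a stand-in for the normalization step. The names `preprocess`, `fit_tfidf`, `recommend`, and `ENTRIES` are hypothetical, the three toy entries are invented placeholders rather than rows from the Lontar Usada Rare dataset, and filtering candidates by the predicted class is just one plausible way of combining the classifier output with similarity ranking.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Case folding, cleansing, and (assumed) whitespace normalization."""
    text = text.lower()                       # case folding
    text = re.sub(r"[^a-z\s]", " ", text)     # cleansing: keep letters only
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


def fit_tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [preprocess(d).split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF; one common variant, assumed here.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, entries, predicted_label=None, top_k=1):
    """Rank entries by similarity to the query, optionally within one class."""
    candidates = [e for e in entries
                  if predicted_label is None or e["disease"] == predicted_label]
    vectors, idf = fit_tfidf([e["symptoms"] for e in candidates])
    counts = Counter(preprocess(query).split())
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(cosine(query_vec, v), e) for v, e in zip(vectors, candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder entries, not taken from the dataset.
ENTRIES = [
    {"symptoms": "demam tinggi dan batuk", "disease": "penyakit_a", "treatment": "ramuan A"},
    {"symptoms": "perut kembung dan mual", "disease": "penyakit_b", "treatment": "ramuan B"},
    {"symptoms": "demam ringan dan pilek", "disease": "penyakit_a", "treatment": "ramuan C"},
]

for score, entry in recommend("Demam, batuk!", ENTRIES, top_k=2):
    print(round(score, 3), entry["disease"], entry["treatment"])
```

In a full pipeline, `predicted_label` would come from the fine-tuned classifier; passing `None` ranks all entries, which can serve as a fallback when the predicted class has no stored treatments.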

How to Cite

Winata, I. P. E. P., Sudipa, I. G. I., Meinarni, N. P. S., Wulandari, D. A. P., & Yanti, C. P. (2026). IndoBERT-Based Pediatric Disease Classification and Symptom-Based Traditional Medicine Recommendation from Lontar Usada Rare. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 10(1), 496–511. https://doi.org/10.33395/sinkron.v10i1.15507
