Security Evaluation of Indonesian LLMs for Digital Business Using STAR Prompt Injection

Authors

  • Hafiz Irwandi, Universitas Negeri Medan
  • Agnes Irene Silitonga, Universitas Negeri Medan
  • Rudy Chandra, Institut Teknologi Del
  • Windi Saputri Simamora, Universitas Satya Terra Bhinneka

DOI:

10.33395/sinkron.v10i1.15662

Keywords:

prompt injection, LLM, red teaming, STAR (Sociotechnical Approach to Red Teaming), digital business applications

Abstract

The adoption of Large Language Models (LLMs) in digital business systems in Indonesia is increasing rapidly; however, systematic security evaluation against Indonesian-language prompt injection remains limited. This study introduces the Indonesian Prompt Injection Dataset, consisting of 50 attack scenarios constructed using the STAR framework, which combines structured instruction variations with sociotechnical context to expose potential model vulnerabilities. The dataset was used to evaluate three commercial LLM platforms: ChatGPT using a GPT-4-class lightweight variant (OpenAI), Gemini 2.5 Flash (Google), and Claude Sonnet 4.5 (Anthropic), through controlled experiments targeting instruction manipulation in Indonesian. The results reveal distinct robustness profiles across models. Gemini 2.5 Flash exhibits moderate observed resilience, with 76% of scenarios classified as medium risk and 12% as high risk. ChatGPT demonstrates higher observed robustness under the tested scenarios, with 88% of cases classified as low risk and no high-risk outcomes. Claude Sonnet 4.5 shows intermediate observed resilience, with 72% low-risk and 28% medium-risk scenarios. High-risk cases primarily involve direct role override, urgency- or emotion-based prompts, and anti-censorship instructions; structural ambiguities and multi-intent manipulations tend to result in medium risk, while mildly persuasive prompts fall under low risk. These findings suggest that although contemporary LLM defense mechanisms are effective against explicit attacks, contextual and emotionally framed manipulations continue to pose residual security challenges. This study contributes the first Indonesian-language prompt injection dataset and demonstrates the STAR framework as a practical, standardized approach for evaluating LLM security in digital business applications.
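The paper's evaluation harness is not reproduced on this page, so the sketch below is only an illustration of the pipeline the abstract describes: each of the 50 STAR-constructed Indonesian attack scenarios is sent to a target model and the response is graded as low, medium, or high risk. The scenario JSON layout, the query_model() stub, and the keyword-marker rubric in classify_risk() are hypothetical assumptions for illustration, not the authors' actual grading criteria.

```python
# Minimal sketch (not the authors' published harness) of a STAR-style
# prompt-injection evaluation loop. The scenario file format, query_model(),
# and the keyword-based risk rubric are illustrative assumptions.
import json
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder: call the LLM under test (ChatGPT, Gemini, or Claude)."""
    raise NotImplementedError("wire this to the target model's API")

def classify_risk(response: str, leak_markers: list[str]) -> str:
    """Toy rubric: full compliance with the injected instruction -> high,
    partial leakage of protected content -> medium, refusal -> low."""
    hits = sum(marker.lower() in response.lower() for marker in leak_markers)
    if hits == len(leak_markers):
        return "high"
    return "medium" if hits > 0 else "low"

def evaluate(scenario_path: str) -> Counter:
    # Each record: an Indonesian attack prompt plus markers whose presence
    # in the output would indicate that the injection succeeded.
    with open(scenario_path, encoding="utf-8") as f:
        scenarios = json.load(f)  # e.g. the 50 STAR-constructed scenarios
    tally = Counter()
    for s in scenarios:
        risk = classify_risk(query_model(s["prompt"]), s["leak_markers"])
        tally[risk] += 1
    return tally

# Per-model profiles reported in the abstract (e.g. ChatGPT: 88% low risk)
# would correspond to percentages over the 50-scenario tally.
```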


References

Abdelnabi, S., Greshake, K., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec 2023 - Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 79–90. https://doi.org/10.1145/3605764.3623985

Azmi, M. F., Dehan, M., Kautsar, A., Wicaksono, A. F., & Koto, F. (2025). IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages. https://arxiv.org/pdf/2506.02573

Benjamin, V., Braca, E., Carter, I., Kanchwala, H., Khojasteh, N., Landow, C., Luo, Y., Ma, C., Magarelli, A., Mirin, R., Moyer, A., Simpson, K., Skawinski, A., & Heverin, T. (2025). Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures. International Conference on Cyber Warfare and Security, 20(1), 142–150. https://doi.org/10.34190/iccws.20.1.3292

Chen, S. W., Chen, K. L., Li, J. S., & Liu, I. H. (2025). Hands-On Training Framework for Prompt Injection Exploits in Large Language Models. Engineering Proceedings, 108(1), 25. https://doi.org/10.3390/engproc2025108025

Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., Jones, A., Bowman, S., Chen, A., Conerly, T., DasSarma, N., Drain, D., Elhage, N., El-Showk, S., Fort, S., … Clark, J. (2022). Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. https://arxiv.org/pdf/2209.07858

Lee, S., Kim, J., & Pak, W. (2025). Mind Mapping Prompt Injection: Visual Prompt Injection Attacks in Modern Large Language Models. Electronics, 14(10), 1907. https://doi.org/10.3390/electronics14101907

Li, Z., Peng, B., He, P., & Yan, X. (2024). Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 557–568. https://doi.org/10.18653/v1/2024.emnlp-main.33

Liu, Y., Deng, G., Li, Y., Wang, K., Wang, Z., Wang, X., Zhang, T., Liu, Y., Wang, H., Zheng, Y., & Liu, Y. (2023). Prompt Injection attack against LLM-integrated Applications. https://arxiv.org/pdf/2306.05499

Liu, Y., Jia, Y., Geng, R., Jia, J., & Gong, N. Z. (2024). Formalizing and Benchmarking Prompt Injection Attacks and Defenses. Proceedings of the 33rd USENIX Security Symposium, 1831–1847. https://arxiv.org/pdf/2310.12815

Mazeika, M., Phan, L., Yin, X., Zou, A., Wang, Z., Mu, N., Sakhaee, E., Li, N., Basart, S., Li, B., Forsyth, D., & Hendrycks, D. (2024). HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal. Proceedings of Machine Learning Research, 235, 35181–35224. https://arxiv.org/pdf/2402.04249

OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., … Zoph, B. (2023). GPT-4 Technical Report. https://arxiv.org/pdf/2303.08774

Pathade, C. (2025). Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs. https://arxiv.org/pdf/2505.04806

Sarah Mathew, E. (2025). Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses. Journal on Artificial Intelligence, 7(1), 347–363. https://doi.org/10.32604/jai.2025.069841

Shu, D., Zhang, C., Jin, M., Zhou, Z., & Li, L. (2025). AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models. ACM SIGKDD Explorations Newsletter, 27(1), 10–19. https://doi.org/10.1145/3748239.3748242

Toyer, S., Watkins, O., Mendes, E., Svegliato, J., Bailey, L., Wang, T., Ong, I., Elmaaroufi, K., Abbeel, P., Darrell, T., Ritter, A., & Russell, S. (2023). Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game. 12th International Conference on Learning Representations, ICLR 2024. https://arxiv.org/pdf/2311.01011

Weidinger, L., Mellor, J., Pegueroles, B. G., Marchal, N., Kumar, R., Lum, K., Akbulut, C., Diaz, M., Bergman, S., Rodriguez, M., Rieser, V., & Isaac, W. (2024). STAR: SocioTechnical Approach to Red Teaming Language Models. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 21516–21532. https://doi.org/10.18653/v1/2024.emnlp-main.1200

Wichers, N., Denison, C., & Beirami, A. (2024). Gradient-Based Language Model Red Teaming. EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 2862–2881. https://doi.org/10.18653/v1/2024.eacl-long.175

Yi, J., Xie, Y., Zhu, B., Kiciman, E., Sun, G., Xie, X., & Wu, F. (2025). Benchmarking and Defending against Indirect Prompt Injection Attacks on Large Language Models. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1809–1820. https://doi.org/10.1145/3690624.3709179


How to Cite

Irwandi, H., Silitonga, A. I., Chandra, R., & Simamora, W. S. (2026). Security Evaluation of Indonesian LLMs for Digital Business Using STAR Prompt Injection. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 449–457. https://doi.org/10.33395/sinkron.v10i1.15662