Security Evaluation of Indonesian LLMs for Digital Business Using STAR Prompt Injection
DOI:
10.33395/sinkron.v10i1.15662Keywords:
prompt injection, LLM, red teaming, STAR (Sociotechnical Approach to Red Teaming), digital business applicationsAbstract
The adoption of Large Language Models (LLMs) in digital business systems in Indonesia is rapidly increasing; however, systematic security evaluation against Indonesian language prompt injection remains limited. This study introduces the Indonesian Prompt Injection Dataset, consisting of 50 attack scenarios constructed using the STAR framework, which combines structured instruction variations with sociotechnical context to expose potential model vulnerabilities. The dataset was used to evaluate three commercial LLM platforms ChatGPT using a GPT-4 class lightweight variant (OpenAI), Gemini 2.5 Flash (Google), and Claude Sonnet 4.5 (Anthropic) through controlled experiments targeting instruction manipulation in Indonesian. The results reveal distinct robustness profiles across models. Gemini 2.5 Flash exhibits moderate observed resilience, with 76% of scenarios classified as medium risk and 12% as high risk. ChatGPT demonstrates higher observed robustness under the tested scenarios, with 88% of cases classified as low risk and no high-risk outcomes. Claude Sonnet 4.5 shows intermediate observed resilience, with 72% low-risk and 28% medium-risk scenarios. High-risk cases primarily involve direct role override, urgency- or emotion-based prompts, and anti-censorship instructions, while structural ambiguities and multi-intent manipulations tend to result in medium risk, and mildly persuasive prompts fall under low risk. These findings suggest that while contemporary LLM defense mechanisms are effective against explicit attacks, contextual and emotionally framed manipulations continue to pose residual security challenges. This study contributes the first Indonesian-language prompt injection dataset and demonstrates the STAR framework as a practical and standardized approach for evaluating LLM security in digital business applications.
Downloads
References
Abdelnabi, S., Greshake, K., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec 2023 - Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 79–90. https://doi.org/10.1145/3605764.3623985
Azmi, M. F., Dehan, M., Kautsar, A., Wicaksono, A. F., & Koto, F. (2025). IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages. https://arxiv.org/pdf/2506.02573
Benjamin, V., Braca, E., Carter, I., Kanchwala, H., Khojasteh, N., Landow, C., Luo, Y., Ma, C., Magarelli, A., Mirin, R., Moyer, A., Simpson, K., Skawinski, A., & Heverin, T. (2024). Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures. International Conference on Cyber Warfare and Security, 20(1), 142–150. https://doi.org/10.34190/iccws.20.1.3292
Chen, S. W., Chen, K. L., Li, J. S., & Liu, I. H. (2025). Hands-On Training Framework for Prompt Injection Exploits in Large Language Models. Engineering Proceedings 2025, Vol. 108, Page 25, 108(1), 25. https://doi.org/10.3390/ENGPROC2025108025
Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., Jones, A., Bowman, S., Chen, A., Conerly, T., DasSarma, N., Drain, D., Elhage, N., El-Showk, S., Fort, S., … Clark, J. (2022). Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. https://arxiv.org/pdf/2209.07858
Lee, S., Kim, J., & Pak, W. (2025). Mind Mapping Prompt Injection: Visual Prompt Injection Attacks in Modern Large Language Models. Electronics 2025, Vol. 14, Page 1907, 14(10), 1907. https://doi.org/10.3390/ELECTRONICS14101907
Li, Z., Peng, B., He, P., & Yan, X. (2023). Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 557–568. https://doi.org/10.18653/v1/2024.emnlp-main.33
Liu, Y., Deng, G., Li, Y., Wang, K., Wang, Z., Wang, X., Zhang, T., Liu, Y., Wang, H., Zheng, Y., & Liu, Y. (2023). Prompt Injection attack against LLM-integrated Applications. https://arxiv.org/pdf/2306.05499
Liu, Y., Jia, Y., Geng, R., Jia, J., & Gong, N. Z. (2023). Formalizing and Benchmarking Prompt Injection Attacks and Defenses. Proceedings of the 33rd USENIX Security Symposium, 1831–1847. https://arxiv.org/pdf/2310.12815
Mazeika, M., Phan, L., Yin, X., Zou, A., Wang, Z., Mu, N., Sakhaee, E., Li, N., Basart, S., Li, B., Forsyth, D., & Hendrycks, D. (2024). HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal. Proceedings of Machine Learning Research, 235, 35181–35224. https://arxiv.org/pdf/2402.04249
OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., … Zoph, B. (2023). GPT-4 Technical Report. https://arxiv.org/pdf/2303.08774
Pathade, C. (2025). Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs. https://arxiv.org/pdf/2505.04806
Sarah Mathew, E., & Author, C. (2025). Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses. Journal on Artificial Intelligence, 7(1), 347–363. https://doi.org/10.32604/JAI.2025.069841
Shu, D., Zhang, C., Jin, M., Zhou, Z., & Li, L. (2025). AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models. ACM SIGKDD Explorations Newsletter, 27(1), 10–19. https://doi.org/10.1145/3748239.3748242;WGROUP:STRING:ACM
Toyer, S., Watkins, O., Mendes, E., Svegliato, J., Bailey, L., Wang, T., Ong, I., Elmaaroufi, K., Abbeel, P., Darrell, T., Ritter, A., & Russell, S. (2023). Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game. 12th International Conference on Learning Representations, ICLR 2024. https://arxiv.org/pdf/2311.01011
Weidinger, L., Mellor, J., Pegueroles, B. G., Marchal, N., Kumar, R., Lum, K., Akbulut, C., Diaz, M., Bergman, S., Rodriguez, M., Rieser, V., & Isaac, W. (2024). STAR: SocioTechnical Approach to Red Teaming Language Models. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 21516–21532. https://doi.org/10.18653/V1/2024.EMNLP-MAIN.1200
Wichers Google Research, N., Denison Anthropic, C., & Beirami Google Research, A. (2024). Gradient-Based Language Model Red Teaming. 2862–2881. https://doi.org/10.18653/v1/2024.eacl-long.175
Yi, J., Xie, Y., Zhu, B., Kiciman, E., Sun, G., Xie, X., & Wu, F. (2025). Benchmarking and Defending against Indirect Prompt Injection Attacks on Large Language Models. Proceedings of the International Conference on Knowledge Discovery and Data Mining, 1, 1809–1820. https://doi.org/10.1145/3690624.3709179
Downloads
How to Cite
Issue
Section
License
Copyright (c) 2026 Hafiz Irwandi, Agnes Irene Silitonga, Rudy Chandra, Windi Saputri Simamora

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Moraref
PKP Index
Indonesia OneSearch
OCLC Worldcat
Index Copernicus
Scilit




















