Implementation of LSA for Topic Modeling on Tweets with the Keyword ‘Kemenkeu’
DOI:
10.33395/sinkron.v9i1.14309
Keywords:
Topic Modeling, Latent Semantic Analysis, TruncatedSVD, Ministry of Finance, Sentiment Analysis
Abstract
This research explores public discourse on financial policies by analyzing tweets mentioning the keyword 'Kemenkeu' (Ministry of Finance). Using Latent Semantic Analysis (LSA), the study examined 10,099 tweets to uncover key topics that reflect public sentiment toward the Ministry’s policies. Preprocessing steps, such as stopword removal and stemming with Sastrawi, were essential to ensure the effectiveness of the analysis. The results revealed three main topics: Finance and Budget, Salaries and Employee Welfare, and Excise and Customs Regulations. These insights provide a better understanding of public opinion on financial issues and highlight the importance of proper text preprocessing in topic modeling. This approach demonstrates how LSA can be used as a tool for analyzing large-scale social media data, offering valuable input for policymakers. Future research could expand on this by using more advanced models or larger datasets to gain deeper insights.
License
Copyright (c) 2025 Shofiyatul Khariroh, Farrikh Alzami, Heni Indrayani, Ika Novita Dewi, Aris Marjuni, Mira Riezky Adriani, Moh Hadi Subowo

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.