Implementation of LSA for Topic Modeling on Tweets with the Keyword ‘Kemenkeu’

Authors

  • Shofiyatul Khariroh, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Farrikh Alzami, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Heni Indrayani, Universitas Dian Nuswantoro
  • Ika Novita Dewi, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Aris Marjuni, Information Systems, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Mira Riezky Adriani, Kementrian Keuangan Republik Indonesia
  • Moh Hadi Subowo, UIN Walisongo, Semarang, Indonesia

DOI:

10.33395/sinkron.v9i1.14309

Keywords:

Topic Modeling, Latent Semantic Analysis, TruncatedSVD, Ministry of Finance, Sentiment Analysis

Abstract

This research explores public discourse on financial policy by analyzing tweets that mention the keyword ‘Kemenkeu’ (Ministry of Finance). Using Latent Semantic Analysis (LSA), the study examined 10,099 tweets to uncover the key topics that reflect public sentiment toward the Ministry’s policies. Preprocessing steps such as stopword removal and stemming with Sastrawi were essential to the effectiveness of the analysis. The results revealed three main topics: Finance and Budget, Salaries and Employee Welfare, and Excise and Customs Regulations. These findings give a clearer picture of public opinion on financial issues and highlight the importance of proper text preprocessing in topic modeling. The approach demonstrates how LSA can serve as a tool for analyzing large-scale social media data and offer useful input for policymakers. Future research could extend this work with more advanced models or larger datasets.
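
To make the pipeline named in the abstract concrete, the following is a minimal, dependency-free sketch of LSA on a handful of tweets. It is not the authors' code: the toy tweets, the stopword list, and the helper names (`tokenize`, `tfidf`, `top_components`) are hypothetical, stemming (done with Sastrawi in the study) is omitted, and a plain power iteration stands in for the TruncatedSVD step listed in the keywords. The choice of three components simply mirrors the three topics reported.

```python
import math
import re
from collections import Counter

# Hypothetical toy tweets (English stand-ins); not data from the study.
TWEETS = [
    "kemenkeu budget finance budget",
    "finance budget kemenkeu",
    "salary employee welfare",
    "employee salary raise kemenkeu",
    "excise customs regulation excise",
    "customs excise tariff",
]
# Assumed stopword list: the search keyword itself carries no topical signal.
STOPWORDS = {"kemenkeu"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return (vocab, matrix) with L2-normalised TF-IDF rows, one per document."""
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    matrix = []
    for d in docs:
        tf = Counter(d)
        row = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        matrix.append([x / norm for x in row])
    return vocab, matrix


def top_components(a, k, iters=500):
    """Approximate the top-k right singular vectors of `a` by power iteration."""
    n_docs, n_terms = len(a), len(a[0])
    found = []
    for _component in range(k):
        v = [1.0] * n_terms
        for _step in range(iters):
            # Deflate: project out the components that were already found.
            for u in found:
                dot = sum(x * y for x, y in zip(v, u))
                v = [x - dot * y for x, y in zip(v, u)]
            # One step of power iteration on A^T A: v <- A^T (A v), normalised.
            av = [sum(row[j] * v[j] for j in range(n_terms)) for row in a]
            v = [sum(a[i][j] * av[i] for i in range(n_docs)) for j in range(n_terms)]
            norm = math.sqrt(sum(x * x for x in v))
            v = [x / norm for x in v]
        found.append(v)
    return found


vocab, matrix = tfidf([tokenize(t) for t in TWEETS])
topics = []
for comp in top_components(matrix, k=3):
    # Rank terms by the magnitude of their weight in this latent component.
    ranked = sorted(zip(vocab, comp), key=lambda p: abs(p[1]), reverse=True)
    topics.append([term for term, _weight in ranked[:2]])
print(topics)
```

Each inner list holds the two highest-weighted terms of one latent component. On a real corpus a library SVD would be the practical choice, and the components would need manual inspection before being given labels such as those in the abstract.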

How to Cite

Khariroh, S., Alzami, F., Indrayani, H., Dewi, I. N., Marjuni, A., Adriani, M. R., & Subowo, M. H. (2025). Implementation of LSA for Topic Modeling on Tweets with the Keyword ‘Kemenkeu’. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(1), 129–148. https://doi.org/10.33395/sinkron.v9i1.14309
