Enhancing Sentiment Analysis Accuracy Using SVM and Slang Word Normalization on YouTube Comments
DOI: 10.33395/sinkron.v9i2.14613
Keywords: Sentiment Analysis, Support Vector Machine, Slang Word Normalization, YouTube Comments, Text Processing
Abstract
Sentiment analysis is a crucial technique for understanding public opinion, particularly on social media platforms such as YouTube. However, informal language, including slang words, poses significant challenges to accurate sentiment classification. This study aims to enhance sentiment analysis by implementing a Support Vector Machine (SVM) classifier combined with the SMOTEENN data balancing technique to address class imbalance. The research collects 3,375 YouTube comments on the movie Pengabdi Setan 2: Communion using the YouTube Data API. The preprocessing steps include text cleaning, tokenization, stopword removal, stemming, and slang word normalization using kamusalay.csv to standardize informal expressions. The extracted features are represented using TF-IDF, and sentiment labeling is performed using VADER. Experimental results show that the SVM model achieves an accuracy of 98% but struggles to detect negative sentiment, as indicated by lower recall (38%) and F1-score (53%) for the negative class. Although applying SMOTEENN improves data balance, further refinements, such as adjusting classification thresholds and integrating deep learning techniques, are necessary to improve sentiment detection, particularly for informal and emotionally nuanced language. This study contributes to improving sentiment analysis models by demonstrating the effectiveness of slang word normalization in handling non-standard language variations. Future work will explore more sophisticated language models to improve sentiment classification accuracy.
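The slang word normalization step mentioned in the abstract can be pictured as a dictionary lookup applied token by token. The following is a minimal sketch, not the authors' implementation: it assumes the lexicon is a two-column CSV (slang form, standard form), and the sample rows, the comment text, and the function names `load_slang_lexicon` and `normalize_slang` are hypothetical, chosen only for illustration.

```python
import csv
import io
import re

# Hypothetical sample in the ASSUMED two-column layout (slang, standard form).
# These rows are illustrative and are not copied from kamusalay.csv.
SAMPLE_LEXICON_CSV = """gk,tidak
bgt,banget
yg,yang
"""


def load_slang_lexicon(handle):
    """Read (slang, standard) rows into a dict, skipping malformed rows."""
    lexicon = {}
    for row in csv.reader(handle):
        if len(row) >= 2 and row[0].strip():
            lexicon[row[0].strip().lower()] = row[1].strip().lower()
    return lexicon


def normalize_slang(text, lexicon):
    """Lowercase the text, split it into word tokens, and replace known slang."""
    tokens = re.findall(r"\w+", text.lower())
    # Tokens missing from the lexicon pass through unchanged.
    return " ".join(lexicon.get(tok, tok) for tok in tokens)


lexicon = load_slang_lexicon(io.StringIO(SAMPLE_LEXICON_CSV))
normalized = normalize_slang("Filmnya serem bgt, gk nyesel nonton!", lexicon)
print(normalized)
```

With a real lexicon file, the `io.StringIO` sample would be replaced by an opened file handle. In a pipeline like the one described, the normalized text would then pass to the remaining preprocessing steps and to TF-IDF feature extraction.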
License
Copyright (c) 2025 Alfin Nur Aziz Saputra, Rujianto Eko Saputro, Dhanar Intan Surya Saputra

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.