Comparison of Sentiment Analysis Methods on Topic Haram of Music In Youtube
DOI:
10.33395/sinkron.v8i4.12776
Keywords:
Analysis Sentiment, Classification, Imbalance Data, Machine Learning, Natural Language Processing, Word Embedding
Abstract
Sentiment analysis of YouTube video lectures that discuss whether music is haram offers a way to gauge public opinion on the topic. This study aims to identify the factors that affect model accuracy in sentiment analysis, particularly for video lecture content on YouTube. The data consist of comments on three video lectures that discuss the haram status of music, each labelled as positive or negative. Two datasets are used: primary data, comprising 2099 comments that have not been normalized, and secondary data, comprising 1001 comments that have been normalized. The experiments show that data validity, data labelling, the amount of data, and preprocessing are essential to building a good sentiment classification model. On the primary data, imbalance techniques such as SMOTE, the word embeddings Word2Vec and FastText, and the SVM and KNN classification algorithms do not reach their best accuracy. On the secondary data, imbalance handling such as oversampling, combined with the SVM and KNN classification algorithms, gives better model accuracy. With SVM, the highest accuracy is 58.35% on primary data and 72.23% on secondary data. With KNN, the highest accuracy is 53.54% on primary data and 72.81% on secondary data. These results indicate that the data must be valid and relevant to the case under study, and that labelling must be done carefully. Preprocessing must also be done correctly: when it is done poorly, imbalance techniques such as oversampling have little effect on accuracy, whereas when it is done well, accuracy increases.
The choice of word embedding also affects accuracy. Many experiments are needed to select an algorithm that suits the available data, because an appropriate algorithm gives the best model accuracy.
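The interaction between class imbalance and a KNN classifier described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: it assumes random oversampling in place of SMOTE, bag-of-words counts with cosine similarity in place of Word2Vec or FastText, and a handful of invented English comments in place of the YouTube data. All function names and the toy comments are hypothetical.

```python
import math
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class has the
    same size (plain random oversampling, a simpler stand-in for SMOTE)."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_label.values())
    out_x, out_y = [], []
    for y, xs in by_label.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train_x, train_y, text, k=3):
    """Majority vote among the k training comments most similar to `text`."""
    query = bow(text)
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: cosine(bow(pair[0]), query),
        reverse=True,
    )
    votes = Counter(y for _, y in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-written comments: three positive, one negative.
comments = [
    "great lecture thank you",
    "very clear explanation thank you",
    "helpful lecture",
    "i disagree with this",
]
sentiments = ["pos", "pos", "pos", "neg"]

balanced_x, balanced_y = oversample(comments, sentiments)

# Compare the prediction for the same query before and after oversampling.
before = knn_predict(comments, sentiments, "i disagree")
after = knn_predict(balanced_x, balanced_y, "i disagree")
print("imbalanced:", before, "| oversampled:", after)
```

In this toy setting the single negative comment is outvoted by its neighbours until it is duplicated, which is the kind of effect oversampling is meant to counter. It says nothing about the accuracy figures reported above, which depend on the real data, labelling, and preprocessing.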
License
Copyright (c) 2023 Rahmat Saudi Al Fathir As, Ema Utami, Anggit Dwi Hartono
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.