Whoosh User Sentiment Analysis on Social Media Using Word2Vec and the Best Naïve Bayes Probability Model
DOI: 10.33395/sinkron.v8i3.13742
Keywords: high-speed train, naïve bayes, sentiment analysis, twitter, word2vec
Abstract
Twitter's microblogging format lets users post short, character-limited tweets that express their thoughts and opinions on a topic. One topic Twitter users have responded to is Whoosh, a high-speed train that is Indonesia's newest mode of transportation. Its launch has prompted Indonesians to share opinions publicly across various media, including social media. Sentiment analysis of these social media posts can therefore help businesses and companies understand public opinion about high-speed train transportation and guide future service improvements. This research performs sentiment analysis of high-speed train users on Twitter, using Word2Vec for feature extraction and Naïve Bayes for classification. It also compares several Naïve Bayes models to identify the best Naïve Bayes probability model. Word2Vec was chosen as the feature extraction method because it can improve model performance and increase the accuracy of sentiment classification. The results show that the Word2Vec Skip-Gram model outperformed the Word2Vec CBOW model. The best configuration combined Gaussian Naïve Bayes with Word2Vec Skip-Gram, achieving an accuracy of 77.18%, precision of 70.35%, recall of 76.09%, and an F1-score of 73.10%.
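The pipeline described in the abstract (word vectors as features, Gaussian Naïve Bayes as classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional `TOY_VECTORS`, the toy tweets, the averaging of word vectors into one tweet vector, and the `var_smoothing` value are all assumptions made for illustration. In practice the vectors would come from a Word2Vec Skip-Gram model trained on the tweet corpus.

```python
import math
from collections import defaultdict

# Hypothetical 2-dimensional word vectors standing in for trained
# Word2Vec Skip-Gram embeddings (assumption, not data from the paper).
TOY_VECTORS = {
    "fast": (0.9, 0.1),
    "comfortable": (0.8, 0.2),
    "clean": (0.7, 0.3),
    "expensive": (0.1, 0.9),
    "late": (0.2, 0.8),
    "crowded": (0.3, 0.7),
}


def tweet_vector(tokens):
    """Average the vectors of known tokens into one feature vector per tweet."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return (0.0, 0.0)
    dims = len(known[0])
    return tuple(sum(v[d] for v in known) / len(known) for d in range(dims))


def fit_gaussian_nb(features, labels, var_smoothing=1e-3):
    """Estimate a class prior plus per-dimension mean and variance per class."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # var_smoothing keeps variances strictly positive on tiny samples.
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / n + var_smoothing
            for d in range(dims)
        ]
        model[y] = (n / len(features), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log prior plus Gaussian log-likelihood."""
    best_label, best_score = None, -math.inf
    for y, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xd, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xd - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Toy labeled tweets (already tokenized); purely illustrative.
train = [
    (["fast", "comfortable"], "positive"),
    (["clean", "fast"], "positive"),
    (["expensive", "late"], "negative"),
    (["crowded", "late"], "negative"),
]
model = fit_gaussian_nb(
    [tweet_vector(tokens) for tokens, _ in train],
    [label for _, label in train],
)
print(predict(model, tweet_vector(["comfortable", "clean"])))
print(predict(model, tweet_vector(["expensive", "crowded"])))
```

Gaussian Naïve Bayes suits this kind of feature because word-embedding averages are real-valued and can be negative, whereas count-based Naïve Bayes variants expect non-negative frequencies.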
License
Copyright (c) 2024 Muhammad Dinan Islamanda, Yuliant Sibaroni
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.