Whoosh User Sentiment Analysis on Social Media Using Word2Vec and the Best Naïve Bayes Probability Model

Authors

  • Muhammad Dinan Islamanda School of Computing, Telkom University, Indonesia
  • Yuliant Sibaroni School of Computing, Telkom University, Indonesia

DOI:

10.33395/sinkron.v8i3.13742

Keywords:

high-speed train, naïve bayes, sentiment analysis, twitter, word2vec

Abstract

Twitter's microblogging format lets users post short, character-limited tweets that express their thoughts and opinions on a topic. One topic that has drawn responses from Twitter users is Whoosh, a high-speed train and Indonesia's newest mode of transportation. Its launch has prompted Indonesians to share their opinions publicly across various media, including social media. Sentiment analysis of these posts can help businesses and operators understand public opinion and guide future service improvements. This research performs sentiment analysis of high-speed train users on Twitter, using Word2Vec for feature extraction and Naïve Bayes for classification. It also compares several Naïve Bayes models to identify the best Naïve Bayes probability model. Word2Vec was chosen as the feature extraction method because it can improve model performance and increase the accuracy of sentiment classification. The results show that the Word2Vec Skip-Gram model outperformed the Word2Vec CBOW model. The best configuration combined Gaussian Naïve Bayes with Word2Vec Skip-Gram, achieving an accuracy of 77.18%, precision of 70.35%, recall of 76.09%, and F1-score of 73.10%.
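The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each tweet is represented by the average of its word vectors and that a Gaussian Naïve Bayes model is fitted on those averages. The two-dimensional `TOY_VECTORS`, the sample tokens, the labels, and the `var_floor` smoothing value are all hypothetical stand-ins. In practice the vectors would come from a trained Word2Vec Skip-Gram or CBOW model.

```python
import math
from collections import defaultdict

# Hypothetical 2-dimensional word vectors standing in for trained Word2Vec
# embeddings (assumption: real vectors would come from a Skip-Gram or CBOW model).
TOY_VECTORS = {
    "fast": (0.9, 0.1),
    "comfortable": (0.8, 0.2),
    "clean": (0.7, 0.3),
    "expensive": (0.1, 0.9),
    "late": (0.2, 0.8),
    "crowded": (0.3, 0.7),
}


def tweet_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return (0.0, 0.0)
    dims = len(known[0])
    return tuple(sum(v[d] for v in known) / len(known) for d in range(dims))


class GaussianNaiveBayes:
    """Gaussian Naive Bayes with one mean and variance per class and feature."""

    def __init__(self, var_floor=1e-3):
        # var_floor is an illustrative smoothing constant that keeps variances > 0.
        self.var_floor = var_floor
        self.stats = {}

    def fit(self, X, y):
        grouped = defaultdict(list)
        for x, label in zip(X, y):
            grouped[label].append(x)
        total = len(X)
        for label, rows in grouped.items():
            dims = len(rows[0])
            means = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
            variances = [
                sum((r[d] - means[d]) ** 2 for r in rows) / len(rows) + self.var_floor
                for d in range(dims)
            ]
            log_prior = math.log(len(rows) / total)
            self.stats[label] = (log_prior, means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            # Sum of per-feature Gaussian log-densities (the "naive" independence step).
            log_likelihood = sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances)
            )
            return log_prior + log_likelihood

        return max(self.stats, key=log_posterior)


# Hypothetical labelled tweets, already tokenised.
train = [
    (["fast", "comfortable"], "positive"),
    (["clean", "fast"], "positive"),
    (["comfortable", "clean"], "positive"),
    (["expensive", "late"], "negative"),
    (["crowded", "late"], "negative"),
    (["expensive", "crowded"], "negative"),
]

X_train = [tweet_vector(tokens) for tokens, _ in train]
y_train = [label for _, label in train]
model = GaussianNaiveBayes().fit(X_train, y_train)

print(model.predict(tweet_vector(["fast", "clean"])))
print(model.predict(tweet_vector(["late", "expensive"])))
```

Gaussian Naïve Bayes suits this setting because averaged word embeddings are continuous and can be negative, whereas count-based Naïve Bayes variants expect non-negative frequency features.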



How to Cite

Islamanda, M. D., & Sibaroni, Y. (2024). Whoosh User Sentiment Analysis on Social Media Using Word2Vec and the Best Naïve Bayes Probability Model. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 8(3), 1558–1568. https://doi.org/10.33395/sinkron.v8i3.13742