Sentiment Analysis of Tokopedia App Reviews using Machine Learning and Word Embeddings
DOI:
10.33395/sinkron.v9i1.14278
Keywords:
FastText, Word2Vec, Naive Bayes, Random Forest, Support Vector Machine, Sentiment Analysis, Tokopedia
Abstract
Tokopedia, a prominent e-commerce platform in Indonesia, generates vast amounts of user feedback that can reveal customer satisfaction through sentiment analysis. However, sentiment analysis of Tokopedia app reviews specifically remains underexplored. Sentiment analysis, also known as opinion mining, classifies user opinions as positive or negative, exposing user preferences and service shortcomings. Traditional text representations such as TF-IDF are widely used for sentiment analysis, but they lack the semantic richness of word embeddings such as Word2Vec and FastText, which capture relationships between words from the contexts in which they appear. These embeddings have shown potential to improve the performance of machine learning models on sentiment analysis tasks. This study evaluates three machine learning algorithms, Support Vector Machine (SVM), Random Forest (RF), and Gaussian Naïve Bayes (NB), each combined with Word2Vec and FastText feature extraction. A dataset of 59,811 Tokopedia app reviews was scraped from the Google Play Store, preprocessed, and converted to features using Word2Vec and FastText. SVM achieved the best performance, with an accuracy of 89.06% using FastText and 89.02% using Word2Vec. RF ranked second, with accuracies of 88.07% for FastText and 88.14% for Word2Vec. NB performed worst, achieving 84.26% with Word2Vec and 83.73% with FastText. Differences between Word2Vec and FastText were minimal for every algorithm (at most 0.53 percentage points, for NB), indicating comparable effectiveness. These results show that SVM was the most accurate classifier under both embeddings.
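The abstract does not say how per-word embeddings were turned into one feature vector per review. A common approach, assumed here purely for illustration, is to average the vectors of the words in each review. The sketch below uses a made-up three-dimensional embedding table (`TOY_EMBEDDINGS`) in place of a trained Word2Vec or FastText model; the helper name `document_vector`, the example tokens, and all numeric values are hypothetical.

```python
from statistics import fmean

# Hypothetical embedding table standing in for a trained Word2Vec or
# FastText model. The vectors are invented for illustration only.
EMBEDDING_DIM = 3
TOY_EMBEDDINGS = {
    "bagus": [0.9, 0.1, 0.0],     # "good"
    "cepat": [0.8, 0.2, 0.1],     # "fast"
    "lambat": [-0.7, 0.3, 0.1],   # "slow"
    "error": [-0.9, 0.1, 0.2],
    "aplikasi": [0.0, 0.5, 0.5],  # "application"
}


def document_vector(tokens, embeddings=TOY_EMBEDDINGS, dim=EMBEDDING_DIM):
    """Average the vectors of known tokens (assumed pooling strategy).

    Tokens missing from the table are skipped; a review with no known
    tokens maps to the zero vector so every review has the same length.
    """
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the values of each dimension across all tokens.
    return [fmean(column) for column in zip(*known)]


# Tokenized, preprocessed reviews (hypothetical examples).
reviews = [
    ["aplikasi", "bagus", "cepat"],
    ["aplikasi", "lambat", "error"],
    ["mantap"],  # not in the toy table, so it maps to zeros
]

features = [document_vector(tokens) for tokens in reviews]
for tokens, vector in zip(reviews, features):
    print(tokens, [round(value, 3) for value in vector])
```

Each row of `features` is a fixed-length vector that could then be passed to a classifier such as SVM, RF, or Gaussian NB. With a real embedding model, the lookup table would come from training on the review corpus, and a FastText model could also produce vectors for unseen words from their character n-grams instead of falling back to zeros.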
License
Copyright (c) 2025 Muhammad Idris, Ahmad Rifai, Ken Ditha Tania

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.