Sentiment Analysis of Tokopedia App Reviews using Machine Learning and Word Embeddings

Authors

  • Muhammad Idris, Universitas Sriwijaya
  • Ahmad Rifai, Universitas Sriwijaya
  • Ken Ditha Tania, Universitas Sriwijaya

DOI: 10.33395/sinkron.v9i1.14278

Keywords: FastText, Word2Vec, Naive Bayes, Random Forest, Support Vector Machine, Sentiment Analysis, Tokopedia

Abstract

Tokopedia, a prominent e-commerce platform in Indonesia, generates vast amounts of user feedback that can reveal customer satisfaction through sentiment analysis. However, sentiment analysis of Tokopedia app reviews specifically remains underexplored. Sentiment analysis, also known as opinion mining, classifies user opinions as positive or negative, exposing user preferences and service shortcomings. Traditional text representations such as TF-IDF are widely used for sentiment analysis, but they lack the semantic richness of word embeddings such as Word2Vec and FastText, which capture contextual relationships between words. These embeddings have shown potential to improve the performance of machine learning models on sentiment analysis tasks. This study evaluates three machine learning algorithms, Support Vector Machine (SVM), Random Forest (RF), and Gaussian Naïve Bayes (NB), each combined with Word2Vec and FastText feature extraction. A dataset of 59,811 Tokopedia app reviews was scraped from the Google Play Store, preprocessed, and converted to features using Word2Vec and FastText. SVM achieved the best performance, with an accuracy of 89.06% using FastText and 89.02% using Word2Vec. RF ranked second, with accuracies of 88.07% for FastText and 88.14% for Word2Vec. NB performed worst, achieving 84.26% with Word2Vec and 83.73% with FastText. The accuracy gap between Word2Vec and FastText was small for every algorithm (at most 0.53 percentage points, for NB), indicating comparable effectiveness. These results show that SVM was the most accurate classifier under both embeddings.
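
Classifiers such as SVM, RF, and Gaussian NB expect one fixed-length vector per review, while Word2Vec and FastText produce one vector per word. The abstract does not say how word vectors were combined, so the sketch below assumes a common choice: averaging the vectors of the words in a review. It also illustrates the character n-grams (wrapped in `<` and `>` boundary markers) that FastText uses to represent subword information. The function names `char_ngrams` and `mean_pool` and the two-dimensional `toy_vectors` table are hypothetical and purely illustrative; they are not trained embeddings and not the authors' code.

```python
from typing import Dict, List, Sequence


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return the character n-grams of a word, FastText style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from n-grams in the middle of a word.
    """
    wrapped = f"<{word}>"
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def mean_pool(tokens: Sequence[str],
              table: Dict[str, List[float]],
              dim: int) -> List[float]:
    """Average the vectors of the tokens found in `table`.

    Tokens missing from the table are skipped; a review with no known
    tokens maps to the zero vector. This pooling step is an assumption,
    not something stated in the abstract.
    """
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Made-up 2-dimensional vectors for three Indonesian words
# ("bagus" = good, "cepat" = fast, "lambat" = slow). Illustrative only.
toy_vectors: Dict[str, List[float]] = {
    "bagus": [1.0, 0.0],
    "cepat": [0.5, 0.5],
    "lambat": [0.0, 1.0],
}

# "aplikasi" is not in the toy table, so only two tokens are averaged.
review_vec = mean_pool(["aplikasi", "bagus", "cepat"], toy_vectors, dim=2)

# Trigrams only, to keep the output short.
grams = char_ngrams("bagus", n_min=3, n_max=3)
```

In a real pipeline the lookup table would come from a trained Word2Vec or FastText model, and the pooled vectors would be passed to the classifier. The subword n-grams are the main structural difference between the two embeddings: they let FastText build a vector for a misspelled or unseen word, whereas a plain Word2Vec lookup has no entry for it.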

How to Cite

Idris, M., Rifai, A., & Tania, K. D. (2025). Sentiment Analysis of Tokopedia App Reviews using Machine Learning and Word Embeddings. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 9(1), 210–219. https://doi.org/10.33395/sinkron.v9i1.14278