Comparison of Feature Extraction Methods on Sentiment Analysis in Hotel Reviews
DOI: 10.33395/sinkron.v7i4.11706

Abstract
Technological development means that activities that once required meeting in person or visiting a place can now be done by viewing information on gadgets or websites. Nowadays, information about a place that provides accommodation for a vacation or business visit, such as a hotel, can be found on social media by reading reviews from visitors who have stayed there. Reviews written by hotel visitors are seen as more credible than information obtained from advertisements. However, so many reviews circulate on social media that analyzing them takes considerable time. This study aims to analyze hotel reviews using sentiment analysis with a Support Vector Machine (SVM) approach. Sentiment analysis can process the opinions of a large number of hotel visitors and usually classifies them as positive, negative, or neutral. Before classification with the SVM algorithm, three feature extraction methods, namely Bag of Words, TF-IDF, and improved TF-IDF, are used to obtain the weight of each word. These three methods were chosen to take into account the influence of the same word features appearing across reviews. In this comparison, TF-IDF was found to be the best feature extraction method, with 71.75% accuracy, 78.66% precision, 71.91% recall, and a 70.08% F1-score. The results indicate that word features influence the analysis of the hotel review data.
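The difference between the first two feature extraction methods can be illustrated with a small example. The sketch below is plain Python. The helper names (`tokenize`, `bag_of_words`, `tf_idf`) and the three toy reviews are hypothetical. It computes raw term counts (Bag of Words) and textbook TF-IDF weights with idf(t) = log(N / df(t)). It is not the authors' implementation. The abstract does not specify the preprocessing, the exact TF-IDF variant, or the improved TF-IDF formula, so the improved variant is omitted here. Libraries such as scikit-learn also default to a smoothed idf rather than this textbook form.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Minimal tokenizer (assumption): lowercase and split on whitespace only.
    # Real preprocessing would typically also handle punctuation and stopwords.
    return text.lower().split()


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return a sorted vocabulary and a raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Weight each raw count by idf(t) = log(N / df(t)) (textbook, unsmoothed)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # df(t): number of documents that contain term t at least once
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    weights = [[tf * idf[j] for j, tf in enumerate(row)] for row in counts]
    return vocab, weights


# Hypothetical toy reviews, not data from the study
reviews = [
    "clean room friendly staff",
    "dirty room rude staff",
    "room was okay",
]
vocab, bow = bag_of_words(reviews)
_, tfidf = tf_idf(reviews)
```

Because "room" occurs in every toy review, TF-IDF gives it a weight of zero while Bag of Words still counts it. This shows how a word feature shared across reviews can change the resulting vectors.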
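The four reported metrics can be computed from true and predicted sentiment labels, as sketched below. The abstract does not say how precision, recall, and F1-score were averaged over the positive, negative, and neutral classes. This sketch assumes macro averaging, which takes the unweighted mean of per-class scores. The function name `classification_scores` and the label lists are hypothetical toy data, not the study's results.

```python
def classification_scores(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # instances of label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1, not F1 of the macro means
    }


# Hypothetical toy labels for the three sentiment classes
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
scores = classification_scores(y_true, y_pred, ["positive", "negative", "neutral"])
```

With these toy labels, macro F1 (about 0.656) is lower than both macro precision (about 0.722) and macro recall (about 0.667). This happens because F1 is averaged per class instead of being computed from the averaged precision and recall. Under this assumption, an F1-score below both precision and recall, as in the reported figures, would not be contradictory.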
License
Copyright (c) 2022 Arie Satia Dharma
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.