Performance Comparison of K-Means and DBScan Algorithms for Text Clustering Product Reviews
Keywords: K-Means, DBScan, Text Clustering, Product Review, RapidMiner
The purpose of this study was to compare the accuracy of the K-Means and DBScan algorithms in clustering product reviews and to determine which of the two performs better. The two algorithms were chosen because they cluster in different ways: K-Means is centroid-based, while DBScan is density-based. Text clustering results can be applied on e-commerce platforms, marketplaces, or product review platforms, where they can help customers decide which product to buy. Customers often find this decision difficult because of the number of reviews each product has, which makes it hard to summarize the advantages of each product and match them to their needs or desires. With text clustering, a customer can determine more easily and quickly whether a product is worth buying based on the reviews they read. The data set used in this study consists of reviews of the Cetaphil Facial Wash product from the Female Daily website. First, the data set goes through a text pre-processing stage; it is then clustered using the two algorithms, K-Means and DBScan. After that, the accuracy of each algorithm's clustering results is calculated and compared. The study concludes that, for clustering reviews of Cetaphil Facial Wash, DBScan achieves 99.80% accuracy, which is higher than the 99.50% achieved by K-Means.
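The pipeline summarized above (pre-processing, then centroid-based and density-based clustering) can be illustrated with a minimal sketch. This is not the authors' RapidMiner workflow: the toy reviews, the stop-word list, the term-frequency vectors, the fixed initial centroids, and the `eps` and `min_pts` values are all assumptions chosen only to keep the example small and deterministic, and the study's accuracy evaluation is not reproduced.

```python
import math
import re
from collections import Counter

# Hypothetical toy reviews (NOT the Female Daily data set used in the study).
REVIEWS = [
    "gentle cleanser, skin feels soft",
    "soft skin and gentle on my face",
    "gentle and soft, skin feels clean",
    "too expensive, price is high",
    "price is too high and expensive",
    "expensive price, too high for me",
]
# Assumed stop-word list, for illustration only.
STOPWORDS = {"and", "on", "my", "is", "for", "me", "the"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def vectorize(docs):
    """Build L2-normalized term-frequency vectors over a shared vocabulary."""
    tokens = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    vectors = []
    for doc in tokens:
        counts = Counter(doc)
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, init_idx, iters=20):
    """Centroid-based: assign each point to its nearest centroid, then re-average."""
    centroids = [points[i][:] for i in init_idx]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dbscan(points, eps, min_pts):
    """Density-based: grow clusters outward from core points; -1 marks noise."""
    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:  # j is itself a core point, so keep expanding
                queue.extend(nearby)
    return labels


vectors = vectorize(REVIEWS)
print("K-Means:", kmeans(vectors, init_idx=(0, 3)))
print("DBScan: ", dbscan(vectors, eps=0.9, min_pts=2))
```

The sketch highlights the practical difference between the two methods: K-Means needs the number of clusters up front (here implied by the two initial centroids), whereas DBScan derives the clusters from the neighborhood radius and minimum neighbor count and can label isolated reviews as noise.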
Copyright (c) 2022 Fitri Andriyani, Yan Puspitarani
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.