Performance Comparison of K-Means and DBScan Algorithms for Text Clustering Product Reviews

Fitri Andriyani; Yan Puspitarani

doi:10.33395/sinkron.v7i3.11569

Authors

Fitri Andriyani, Widyatama University
Yan Puspitarani, Universitas Widyatama, Indonesia

Keywords:

K-Means, DBScan, Text Clustering, Product Review, RapidMiner

Abstract

This study compares the accuracy of the K-Means and DBScan algorithms in clustering product reviews, in order to determine which algorithm is more accurate. The two algorithms were chosen because they cluster in different ways: K-Means is centroid-based, while DBScan is density-based. Text clustering results can be applied on e-commerce platforms, marketplaces, or product review platforms, where they can help customers decide which product to buy. Two factors make this decision difficult for customers: the number of reviews each product has, and the difficulty of summarizing each product's advantages and matching them to their needs or wishes. With text clustering, customers can determine more easily and quickly whether a product is worth buying based on the reviews they read. The data set used in this study consists of reviews of the Cetaphil Facial Wash product from the Female Daily website. The data set first goes through a text pre-processing stage and is then clustered using the two algorithms, K-Means and DBScan. The accuracy of each algorithm's clustering results is then calculated. The study concludes that, in clustering reviews of Cetaphil Facial Wash, DBScan achieves 99.80% accuracy, which is higher than K-Means at 99.50%.
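
The pipeline described above (text pre-processing, then clustering with a centroid-based and a density-based algorithm) can be illustrated with a minimal sketch. It is not the authors' implementation: the keywords indicate the study used RapidMiner, whereas this sketch assumes scikit-learn. The toy reviews, the function name `cluster_reviews`, and the parameter values (`n_clusters=2`, `eps=0.7`, `min_samples=2`, cosine distance) are hypothetical choices made for illustration only, and the sketch does not reproduce the study's accuracy calculation.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy reviews, invented for illustration (not the study's data).
reviews = [
    "gentle on my skin and does not dry it out",
    "very gentle, my skin does not feel dry",
    "gentle cleanser, skin feels soft and not dry",
    "too expensive for such a small bottle",
    "the bottle is small and too expensive",
    "expensive, and the small bottle runs out fast",
]


def cluster_reviews(texts, n_clusters=2, eps=0.7, min_samples=2):
    """Cluster texts with K-Means and DBSCAN; all parameters are assumptions."""
    # Simple pre-processing: lowercase, drop English stop words, TF-IDF weights.
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    vectors = vectorizer.fit_transform(texts)

    # Centroid-based: the number of clusters must be chosen in advance.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    kmeans_labels = kmeans.fit_predict(vectors)

    # Density-based: clusters grow from points that have at least
    # `min_samples` neighbours within cosine distance `eps`;
    # points that belong to no dense region receive the label -1 (noise).
    dbscan = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine")
    dbscan_labels = dbscan.fit_predict(vectors)

    return kmeans_labels, dbscan_labels


if __name__ == "__main__":
    km_labels, db_labels = cluster_reviews(reviews)
    for text, km, db in zip(reviews, km_labels, db_labels):
        print(f"kmeans={km} dbscan={db}  {text}")
```

The contrast in the sketch mirrors the contrast named in the abstract: K-Means needs the cluster count up front and assigns every review to a cluster, while DBSCAN infers the number of clusters from the density parameters and may mark isolated reviews as noise.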

