Clustering Analysis of Tweets About COVID-19 Using the K-Means Algorithm

Authors

  • Andi, STMIK TIME, Indonesia
  • Carles Juliandy, STMIK TIME, Indonesia
  • David, STMIK TIME, Indonesia

DOI:

10.33395/sinkron.v8i1.12145

Keywords:

Clustering Analysis, Twitter, COVID-19, Elbow Method, K-Means

Abstract

Tweets about Coronavirus Disease 2019 (COVID-19) were among the trending topics on Twitter from 2020 to 2022. The large volume of COVID-19 tweets is mixed and not grouped, which makes it difficult for Twitter users to read and sort them according to the information they want. One way to address this problem is to cluster tweets about COVID-19. This study takes a quantitative approach and uses K-Means, a clustering method for grouping data. The data come from the Kaggle dataset Omicron-Covid-19 Variant Tweets and from scraping with Bright Data, for a total of 4,103 tweets. Applying the Elbow method to the dataset indicated that the best number of clusters was k = 5. Grouping the tweets with K-Means at k = 5 produced the following cluster sizes, from largest to smallest: cluster 4 with 1,185 tweets, cluster 1 with 1,047 tweets, cluster 2 with 757 tweets, cluster 3 with 744 tweets, and cluster 5 with 370 tweets.
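The two techniques named in the abstract, K-Means and the Elbow method, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how tweets were preprocessed or vectorized, so the example below assumes that each item has already been turned into a numeric vector and uses hypothetical 2-D toy points in place of tweet features. The function names (`squared_distance`, `assign`, `kmeans`) and all parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's K-Means; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k sampled points as initial centroids
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                updated.append(centroids[c])
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    labels = assign(points, centroids)
    # Inertia: within-cluster sum of squared distances, used by the Elbow method.
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical toy data: three blobs standing in for vectorized tweets.
data_rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(30)
]

# Elbow method: compute inertia for a range of k (best of a few restarts)
# and look for the k after which the decrease flattens out.
inertias = {}
for k in range(1, 7):
    inertias[k] = min(kmeans(points, k, seed=s)[2] for s in range(5))

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")

# Cluster sizes for one chosen k, analogous to the per-cluster counts reported.
labels, centroids, _ = kmeans(points, 3)
cluster_sizes = Counter(labels)
print(dict(cluster_sizes))
```

In this sketch the elbow is read off by eye from the printed inertia values. The study reports k = 5 for its own dataset; the k = 3 used here only reflects how the toy data were generated.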

How to Cite

Andi, A., Juliandy, C., & David, D. (2023). Clustering Analysis of Tweets About COVID-19 Using the K-Means Algorithm. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 7(1), 543-533. https://doi.org/10.33395/sinkron.v8i1.12145