Performance of Various Naïve Bayes Using GridSearch Approach In Phishing Email Dataset

Authors

  • Rizki Rahman, Universitas Amikom Yogyakarta
  • Ferian Fauzi Abdulloh, Universitas Amikom Yogyakarta

DOI:

10.33395/sinkron.v8i4.12958

Keywords:

Bernoulli, Gaussian, GridSearch, Naive Bayes, Phishing Email

Abstract

Phishing attacks are a growing cybersecurity threat that can harm both individuals and organizations. This research compares the performance of four Naive Bayes variants (Gaussian, Multinomial, Complement, and Bernoulli) in classifying phishing emails. In the data pre-processing stage, phishing emails are collected, cleaned, and converted into appropriate numerical features. The GridSearch approach is then used to find the best parameters. Performance on the phishing detection task is evaluated using accuracy, precision, recall, and F1-score. Before hyperparameter tuning, Bernoulli achieved the best accuracy at 97.34%; after tuning, Bernoulli remained the best-performing variant, with accuracy rising to 97.38%. The results are intended to provide insight into how effective each Naive Bayes variant is on phishing email datasets and to help researchers select the most suitable variant for phishing detection tasks. In addition, the GridSearch method applied here can serve as a guide for finding the best parameters for Naive Bayes models in other contexts.
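
The workflow summarized in the abstract (binary features, a Bernoulli Naive Bayes model, a parameter search, and evaluation by accuracy, precision, recall, and F1-score) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation. The abstract does not say which hyperparameters were searched or how the emails were encoded, so the smoothing parameter alpha, the alpha grid, the three feature names, and the toy data below are all assumptions made for illustration. A single held-out validation split stands in for the cross-validation that a grid search would normally use.

```python
import math

def fit_bernoulli_nb(X, y, alpha):
    """Fit a Bernoulli Naive Bayes model with additive smoothing (alpha must be > 0)."""
    model = {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        # Smoothed estimate of P(feature_j = 1 | class c)
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[c] = (log_prior, probs)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for one binary feature vector."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(p if v else 1.0 - p)
                               for v, p in zip(x, probs))
    return max(model, key=score)

def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def grid_search(X_train, y_train, X_val, y_val, alphas):
    """Try every alpha and keep the one with the best validation accuracy."""
    best = None
    for alpha in alphas:
        model = fit_bernoulli_nb(X_train, y_train, alpha)
        preds = [predict(model, x) for x in X_val]
        metrics = evaluate(y_val, preds)
        if best is None or metrics["accuracy"] > best[1]["accuracy"]:
            best = (alpha, metrics)
    return best

# Hypothetical binary features: [has_link, mentions_password, has_urgent_word]
# Labels: 1 = phishing, 0 = legitimate (toy data, not the paper's dataset)
X_train = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 0, 1], [1, 0, 0]]
y_train = [1, 1, 1, 0, 0, 0]
X_val = [[1, 1, 1], [0, 0, 0], [0, 1, 1], [1, 0, 0]]
y_val = [1, 0, 1, 0]

alphas = [0.1, 0.5, 1.0]  # assumed search grid
best_alpha, best_metrics = grid_search(X_train, y_train, X_val, y_val, alphas)
print("best alpha:", best_alpha)
print("validation metrics:", best_metrics)
```

On a toy set this small, several alpha values can easily tie, so the selected value carries no meaning here. The sketch only shows the shape of the search loop and how the four reported metrics are derived from the predictions.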

How to Cite

Rahman, R., & Fauzi Abdulloh, F. (2023). Performance of Various Naïve Bayes Using GridSearch Approach In Phishing Email Dataset. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 7(4), 2336–2344. https://doi.org/10.33395/sinkron.v8i4.12958