Diabetes Disease Detection Classification Using Light Gradient Boosting (LightGBM) With Hyperparameter Tuning
DOI:
10.33395/sinkron.v8i2.13530
Keywords:
Detection Disease, Light Gradient Boosting, GridSearchCV, Diabetes, RandomSearchCV, SMOTE, Hyperparameter Tuning
Abstract
Diabetes is a condition in which the pancreas produces less insulin than the body needs, which raises blood sugar concentration. This study seeks the best classification performance on a diabetes dataset using the LightGBM method. The dataset consists of 768 rows and 9 columns, with a binary target (0 or 1). SMOTE resampling is applied to address class imbalance, and hyperparameters are optimized with GridSearchCV and RandomSearchCV. Models are evaluated with a confusion matrix and the accuracy, recall, precision, and F1-score metrics. Several tests were conducted. In the hyperparameter optimization tests, LightGBM performed well with both GridSearchCV and RandomSearchCV. In the tests with data resampling, LightGBM tuned with GridSearchCV achieved the highest accuracy, 84%, while LightGBM tuned with RandomSearchCV reached 82%.
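The evaluation step named in the abstract (confusion matrix, accuracy, recall, precision, F1-score) can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names `confusion_counts` and `classification_metrics` and the toy labels are hypothetical, and class 1 is assumed to be the positive (diabetic) class.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count confusion-matrix cells for binary labels; 1 is assumed positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1      # positive case correctly flagged
        elif t == 0 and p == 1:
            counts["fp"] += 1      # negative case wrongly flagged
        elif t == 0 and p == 0:
            counts["tn"] += 1      # negative case correctly cleared
        elif t == 1 and p == 0:
            counts["fn"] += 1      # positive case missed
        else:
            raise ValueError("labels must be 0 or 1")
    return counts


def classification_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(classification_metrics(confusion_counts(y_true, y_pred)))
```

On an imbalanced dataset, accuracy alone can look good while positive cases are missed, which is why recall, precision, and F1-score are reported alongside it.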
License
Copyright (c) 2024 Elisa Ramadanti, Devi Aprilya Dinathi, christianskaditya, Didih Rizki Chandranegara
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.