Diabetes Disease Detection Classification Using Light Gradient Boosting (LightGBM) With Hyperparameter Tuning

Authors

  • Elisa Ramadanti Universitas Muhammadiyah Malang
  • Devi Aprilya Dinathi Universitas Muhammadiyah Malang
  • christianskaditya Universitas Muhammadiyah Malang
  • Didih Rizki Chandranegara Universitas Muhammadiyah Malang

DOI:

10.33395/sinkron.v8i2.13530

Keywords:

Disease Detection, Light Gradient Boosting, GridSearchCV, Diabetes, RandomSearchCV, SMOTE, Hyperparameter Tuning

Abstract

Diabetes is a condition in which the pancreas does not produce enough insulin to meet the body's needs, which raises blood sugar concentration. This study aims to find the best classification performance on a diabetes dataset using the LightGBM method. The dataset consists of 768 rows and 9 columns, with a binary target taking the values 0 and 1. SMOTE resampling is applied to address class imbalance, and hyperparameters are optimized with GridSearchCV and RandomSearchCV. Models are evaluated using a confusion matrix and metrics such as accuracy, recall, precision, and F1-score. Several tests were conducted. In the hyperparameter optimization tests, LightGBM performed well with both GridSearchCV and RandomSearchCV. In the tests that also applied data resampling, LightGBM with GridSearchCV achieved the highest accuracy at 84%, while LightGBM with RandomSearchCV reached 82%.
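The evaluation metrics named in the abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, assuming that label 1 marks the positive (diabetic) class. The function names and the sample labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    total = tp + fp + tn + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only; not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```

On an imbalanced dataset, accuracy alone can look strong while recall on the minority class stays low, which is one reason to report precision, recall, and F1-score alongside it.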

How to Cite

Ramadanti, E., Aprilya Dinathi, D., christianskaditya, & Chandranegara, D. R. (2024). Diabetes Disease Detection Classification Using Light Gradient Boosting (LightGBM) With Hyperparameter Tuning. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 8(2), 956–963. https://doi.org/10.33395/sinkron.v8i2.13530