Thyroid Disease Prediction Using Random Forest with KNNImputer for Missing Values

Authors

  • Raffy Nicandra Putra Pratama, Information System, Dian Nuswantoro University, Semarang, Indonesia
  • Sri Winarno, Information System, Dian Nuswantoro University, Semarang, Indonesia
  • Tan Nicholas Octavian Wijaya, Information System, Dian Nuswantoro University, Semarang, Indonesia

DOI:

10.33395/sinkron.v9i1.14334

Keywords:

Thyroid, Classification, Random Forest, KNNImputer

Abstract

Thyroid disease is a disorder of thyroid function that requires prompt and accurate diagnosis. This research designs a classification model based on the Random Forest algorithm to detect the type of thyroid disease, using data from the UCI Repository. In the data processing stage, KNNImputer handles missing data by replacing each missing value with the average of that feature across the nearest neighbors, selected by Euclidean distance, which improves data quality for model training. The model was evaluated using the confusion matrix and reached an accuracy of 98%, with weighted-average precision, recall, and F1 score also at 98%. These results indicate that the proposed model is reliable in distinguishing thyroid conditions such as Negative, Hypothyroid, and Hyperthyroid. This research contributes to the application of data mining technology for medical diagnosis and indicates that careful data processing with methods such as KNNImputer can improve model performance.
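The imputation step described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code, and it does not call scikit-learn's KNNImputer. It only reproduces the idea of filling a missing value with the mean of that feature over the k nearest rows. The function names `nan_euclidean` and `knn_impute`, the choice k=2, and the toy rows are all assumptions made for illustration. Distances are computed over the features observed in both rows and rescaled for the missing ones, which is one common way to apply Euclidean distance to incomplete data.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over features present in both rows.

    The squared sum is rescaled by (total features / shared features) so that
    rows with fewer shared features are not unfairly favoured.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared features, so the rows are not comparable
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(squared * len(a) / len(pairs))


def knn_impute(rows, k=2):
    """Return a copy of rows where each None is replaced by the mean of that
    feature over the k nearest rows that have the feature observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row with feature j observed.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical toy data: three numeric features per row, None marks a gap.
rows = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [1.0, 2.5, 5.0],
    [9.0, 9.0, 9.0],
]
filled = knn_impute(rows, k=2)
print(filled[1])  # [1.0, 2.0, 4.0]
```

Here the second row's gap is filled with 4.0, the mean of 3.0 and 5.0 taken from its two closest rows. The distant fourth row does not contribute.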

How to Cite

Pratama, R. N. P., Winarno, S., & Wijaya, T. N. O. (2025). Thyroid Disease Prediction Using Random Forest with KNNImputer for Missing Values. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(1), 160–166. https://doi.org/10.33395/sinkron.v9i1.14334