Decision Trees in Predicting Loan Default Risk in Customer Relationships within the Financial Sector

Authors

  • Yohanni Syahra, Universitas Muhammadiyah Sumatera Utara
  • Yuni Franciska Br. Tarigan, Akademi Manajemen Informatika dan Komputer Polibisnis
  • Karina Andriani, STMIK Triguna Dharma
  • Hevlie Winda Nazry S, Universitas Muhammadiyah Sumatera Utara
  • Roziyani Setik, Faculty of Communication Visual Art and Computing, Universitas Selangor, Selangor, Malaysia

DOI:

10.33395/sinkron.v9i2.14672

Keywords:

Loan Default Prediction, Credit Risk Analysis, Decision Trees, C4.5

Abstract

Loan default prediction is an important aspect of risk management in financial institutions. Accurate prediction models enable banks and lending organizations to mitigate risk, allocate resources effectively, and optimize decision-making. This study investigates the application of decision tree algorithms to predicting loan default risk in the financial sector. Decision trees are valued for their interpretability, their adaptability to non-linear data, and their ability to handle missing values, which makes them a useful tool in credit risk analysis. Using a dataset of borrower profiles, credit scores, income levels, and payment history, the model identifies the key predictors that influence default outcomes. The study applies the C4.5 decision tree algorithm to demonstrate that decision trees can achieve high prediction accuracy while offering a transparent decision-making framework, which enhances their applicability in real-world scenarios. The paper also discusses the implications of these findings for financial institutions, emphasizing the scalability and cost-effectiveness of the model. By integrating decision tree-based models into existing risk assessment systems, lenders can proactively manage loan portfolios and reduce default rates. Future research directions are proposed to explore hybrid approaches that combine decision trees with advanced ensemble methods to enhance predictive capability. Overall, the study highlights the potential of decision tree algorithms to transform credit risk assessment and to support more accurate, data-driven financial decision-making.
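
For readers unfamiliar with C4.5, its split criterion (the gain ratio) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, the attribute names `payment_history` and `income_level`, and the `default` labels are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on `attribute` with respect to `target`."""
    labels = [row[target] for row in rows]

    # Group the target labels by each value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])

    # Information gain: entropy before the split minus the weighted
    # entropy of the partitions after the split.
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder

    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, for illustration only.
toy_rows = [
    {"payment_history": "late", "income_level": "high", "default": "yes"},
    {"payment_history": "late", "income_level": "low", "default": "yes"},
    {"payment_history": "on_time", "income_level": "high", "default": "no"},
    {"payment_history": "on_time", "income_level": "low", "default": "no"},
]

# C4.5 would split on the attribute with the highest gain ratio.
best_attribute = max(["payment_history", "income_level"],
                     key=lambda a: gain_ratio(toy_rows, a, "default"))
```

In this toy data `payment_history` separates the two classes perfectly while `income_level` carries no information, so the former is selected as the first split. A full C4.5 implementation would additionally handle continuous attributes, missing values, and pruning, none of which are shown here.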

How to Cite

Syahra, Y., Br. Tarigan, Y. F., Andriani, K., Nazry S, H. W., & Setik, R. (2025). Decision Trees in Predicting Loan Default Risk in Customer Relationships within the Financial Sector. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(2), 734-745. https://doi.org/10.33395/sinkron.v9i2.14672