Decision Tree for Predicting the Mortality in Hemodialysis Patient with Diabetes

Hemodialysis patients with diabetes face a significantly higher risk of mortality compared to those without diabetes. Accurate prediction of mortality in this patient population is crucial for guiding clinical decision-making, improving patient outcomes, and optimizing resource allocation. Hemodialysis is a procedure for cleaning the blood from the waste products of the body's metabolism. this is one of modality to treat end stage kidney disease. Diabetes mellitus is a significant contributor to the global burden of chronic kidney disease (CKD), and patients with diabetes undergoing hemodialysis are at a higher risk of mortality compared to those without diabetes. Identifying factors that influence mortality risk in this population can aid in clinical decision-making and improve patient outcomes. Dialysis is performed on patients with kidney failure, both acute kidney failure and chronic kidney failure. This study is aimed to predict the mortality risk of hemodialysis patients with diabetes. The Taiwanese hemodialysis center enrolled a total of 665 hemodialysis patients. The prediction is based on Decision Tree. Compared with K-Nearest Neighbor, linear discriminant, Logistic Regression, and Ensemble, Decission Tree performed better. As for related medical variables like parathyroid surgery, urea reduction ratio, etc., they play a much smaller role in mortality risk factors than diabetes and cardiovascular disease.


INTRODUCTION
Chronic kidney disease (CKD) is a complex and prevalent condition that affects millions of individuals worldwide.Among CKD patients, those with diabetes mellitus face a considerably higher risk of mortality.Hemodialysis is a commonly utilized renal replacement therapy for patients with CKD, but identifying the factors that contribute to mortality in hemodialysis patients with diabetes remains a challenge.Accurate prediction of mortality risk can significantly impact clinical decision-making, allowing for timely interventions and improved patient outcomes(X.Liu et al. 2021).
One promising approach for mortality prediction in healthcare is the use of decision tree algorithms.Decision trees are powerful predictive models that use a hierarchical structure of binary decisions to classify data based on input variables (Borisagar, Barad, and Raval 2017).They offer interpretability, simplicity, and the ability to handle both categorical and continuous data.In the context of predicting mortality in hemodialysis patients with diabetes, decision trees can effectively identify the key risk factors and facilitate clinical decision-making (Yeh, Wu, and Tsao 2011).
This paper aims to explore the application of decision tree algorithms for predicting mortality in hemodialysis patients with diabetes.By examining relevant literature and studies, we will analyze the strengths and limitations of decision trees in this specific healthcare context.Additionally, we will evaluate the performance of decision tree models against other predictive models, such as logistic regression or neural networks, to determine their efficacy in mortality prediction (Liu et al. 2018).
The utilization of decision trees for mortality prediction in hemodialysis patients with diabetes offers several advantages.Firstly, decision trees provide a transparent and interpretable framework, allowing clinicians to understand the decision-making process and identify critical factors contributing to mortality risk (Huda and Ardi 2021).This interpretability is crucial in medical settings, where trust and understanding of the model's predictions are paramount.Furthermore, decision trees can handle missing values and outliers, making them robust in realworld clinical scenarios where data quality may vary(Ardi, Setiawan, and Bharata Adji 2019).
To achieve the objectives of this paper, we will review and analyze relevant studies that have employed decision tree algorithms for mortality prediction in hemodialysis patients with diabetes.We will explore the different input variables considered in these models, such as demographic information, laboratory values, comorbidities, and dialysis-related parameters.Moreover, we will assess the performance metrics, including accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC), to evaluate the predictive capabilities of decision tree models(Y.S. Liu et al. 2021).
In conclusion, predicting mortality in hemodialysis patients with diabetes is a critical task that can aid in clinical decision-making and improve patient care.Decision tree algorithms offer a promising approach due to their interpretability and ability to handle complex healthcare datasets.By examining the strengths and limitations of decision trees in this context, this paper aims to contribute to the existing body of knowledge and inform future research and clinical practice in mortality prediction for hemodialysis patients with diabetes.

METHODOLOGY Kidney Failure
Kidney failure, also known as renal failure, is a condition in which the kidneys are no longer able to function properly.The kidneys are a pair of organs located in the abdomen that play a vital role in filtering waste products and excess fluid from the blood, regulating electrolyte balance, and producing hormones that help regulate blood pressure and stimulate red blood cell production (Mirinejad and Inanc 2015).
There are two main types of kidney failure: acute kidney injury (AKI) and chronic kidney disease (CKD).AKI is a sudden and rapid loss of kidney function that typically occurs over a few hours or days.It can be caused by a variety of factors, such as dehydration, low blood volume, infections, medications, toxins, and kidney damage from a traumatic injury or surgery.Symptoms of AKI may include decreased urine output, swelling in the legs or feet, fatigue, confusion, nausea, and shortness of breath(X.Liu et al. 2021).Treatment for AKI may include managing the underlying cause, such as stopping a medication or treating an infection, and providing supportive care, such as dialysis, to help the kidneys recover (Panwong and Iam-On 2016).
CKD is a progressive loss of kidney function that occurs over a period of months or years.It is typically caused by underlying health conditions, such as diabetes, high blood pressure, and autoimmune disorders, that damage the kidneys over time.Symptoms of CKD may not appear until the disease is in advanced stages, and may include fatigue, itching, swelling in the legs or feet, loss of appetite, and changes in urine output or color (Drüeke et al. 2006).Treatment for CKD may include managing the underlying health conditions, such as controlling blood sugar and blood pressure, and lifestyle changes, such as a healthy diet and regular exercise.In advanced stages of CKD, dialysis or kidney transplantation may be necessary (Patil 2017).
Kidney failure can lead to a number of serious complications, such as anemia, bone disease, high blood pressure, heart disease, and nerve damage.It can also affect the body's ability to regulate electrolyte balance, which can cause imbalances in potassium, sodium, and calcium levels.In severe cases, kidney failure can be life-threatening and may require hospitalization and intensive treatment (Garcia-Montemayor et al. 2021).
Prevention of kidney failure involves managing underlying health conditions that can damage the kidneys, such as diabetes, high blood pressure, and autoimmune disorders.Adopting a healthy lifestyle, such as a balanced diet and regular exercise, can also help prevent kidney damage (Somji, Ruggajo, and Moledina 2020).It is important to avoid medications and substances that can be toxic to the kidneys, such as certain pain medications, and to have regular check-ups with a healthcare provider to monitor kidney function (Escandell-Montero et al. 2014).

Hemodialisys
Hemodialysis is a medical procedure that is used to remove waste products and excess fluid from the blood when the kidneys are no longer able to do so.It is a form of renal replacement therapy and is one of the most common treatments for end-stage renal disease (ESRD), a condition in which the kidneys have failed and can no longer function properly (Sung et al. 2014).
During hemodialysis, blood is pumped out of the patient's body through a vascular access point, such as an arteriovenous fistula or graft, or a central venous catheter.The blood is then filtered through a dialyzer, a machine that contains a semipermeable membrane that allows small molecules such as waste products and excess fluids to pass through, while retaining larger molecules such as proteins and blood cells (Escandell-Montero et al. 2014).
The dialyzer is connected to the patient's bloodstream through two needles or catheters, one to draw blood out of the body and another to return the filtered blood to the patient.The blood flows through the dialyzer in one direction, while a dialysis solution, also called dialysate, flows in the opposite direction on the other side of the membrane.The dialysis solution contains electrolytes, such as sodium and potassium, and is used to maintain the proper balance of these substances in the patient's blood (Lacson 2008).
During the hemodialysis session, the patient's blood is continuously filtered through the dialyzer for several hours, usually three times a week, with each session lasting between 3 and 5 hours, depending on the patient's condition.The duration of the session can vary depending on the amount of waste producs and excess fluids that need to be removed from the patient's blood (Kim et al. 2021).
Hemodialysis is a complex procedure that requires specialized equipment and trained healthcare professionals, such as nurses and technicians.The process requires careful monitoring of the patient's vital signs, such as blood pressure, heart rate, and oxygen levels, to ensure that the procedure is safe and effective(M.Kumar 2016).
In addition to removing waste products and excess fluids from the blood, hemodialysis can also be used to treat certain medical conditions, such as acute kidney injury, drug overdose, and certain types of poisoning.However, hemodialysis is not a cure for kidney disease and does not restore the kidneys to normal function.It is a lifelong treatment for ESRD and requires ongoing management and monitoring by a healthcare team(S.Ramya 2016).

Decision Tree Algorithm
The decision tree algorithm is a machine learning technique used for classification and regression tasks.4. Recursion: Repeat the splitting process recursively on each branch, considering the remaining attributes and subset of instances at each node.This process continues until a stopping condition is met, such as reaching a predefined depth limit, having a minimum number of instances in a node, or achieving a pure subset (all instances have the same label).5. Leaf Node Assignment: Assign a class or outcome label to each leaf node.For classification tasks, the label can be the majority class in the corresponding subset.For regression tasks, the label can be the mean or median value of the target variable in the subset.6. Pruning (optional): After building the initial decision tree, pruning can be applied to reduce overfitting.Pruning involves removing unnecessary branches or nodes from the tree based on performance metrics, such as validation accuracy or error rates.7. Prediction: To make predictions for new instances, follow the decision path from the root node to a leaf node based on the attribute values of the instance.The predicted outcome is the label assigned to the leaf node.8.The decision tree algorithm offers several advantages.It is easily interpretable and provides insights into the decision-making process.Decision trees can handle both categorical and continuous features, as well as missing data.They are also relatively fast in terms of training and prediction.
However, decision trees are prone to overfitting and can be sensitive to small changes in the data.They may not always generalize well to unseen instances.To address these limitations, techniques like pruning, setting depth limits, or using ensemble methods like random forests or gradient boosting can be employed.Overall, the decision tree algorithm is widely used in various domains due to its simplicity, nterpretability, and ability to handle different types of data (Deshwal, Sangwan, and Kumar 2020).

Prediction Results of Dataset A
Based on the results of Decision Tree (DT) testing, it was observed that Dataset A had a correct rate (acc) training of 0.833 for a period of 334.51 msec.In comparison with KNN, Logistic Regression, and linear discriminant (LD), the Decision Tress execution time is in top three just below KNN.A linear discriminant and ensemble logic algorithm achieved a short training time of 318.69 msec, and a long training time of 695.13 msec for dataset A respectively.Dataset A is listed in Table 1 along with the complete training comparison results.An analysis of the data using Decision Tree yielded a correct rate of 0.833.The Decision Three Has average performance than Linear Discriminant, Logistic Regression, Ensemble (En), and KNN.In Table 1, we present the detail predictions based on dataset A. SVM presents the highest accuracy in predicting mortality for Dataset A, 0.939, followed by Linear Discriminant 0.924, Ensemble and KKN 0.894, Logistic Regression 0.879, and Decision Tree 0.833.

Table 1. Prediction Result Comparison of Dataset A
Table 1 shows that the highest correct rate of prediction for Dataset A is 0,960 for Logistic Regression, followed by 0.955 for Linear Discrimination, 0.924 for Decision Tree, 0.894 for Ensemble, and 0.893 for KKN.

Prediction Results of Dataset B
Of the total 354 patients Dataset B, Decision Tree successfully predicted with precision reaching 0.9270 and up to 0.944 respectively for training and test.Decision Tree has the best performance compared to other methods.2, it is evident that the Decision Tree classifier shows the highest correct rate in predicting the mortality of hemodialysis patients with diabetes, i.e. at the rate of 0.944, followed by Linear Discriminant of 0.901, Logistic Regression and Ensemble of 0.893, and the last one is KKN of 0.880.

Prediction Results of Dataset C
After training on 170 hemodialysis patients in Dataset C, it was found that Decision Tree yielded accuracy of 0.771 achieved for 819.06 msec.Logistic Regression gives accuracy 0.818.KNN and Ensemble each provide accuracy of 0.765, and 0.641, respectively.Figure 5 and figure 6 shows the training result of Dataset C, whereas Table 3 shows the performances comparison results of Dataset C.

CONCLUSION
In this paper, we explored the application of decision tree algorithms for predicting mortality in hemodialysis patients with diabetes.The use of decision trees in healthcare has gained significant attention due to their interpretability and ability to handle complex datasets.Our analysis revealed that decision trees offer a transparent and interpretable framework for mortality prediction, enabling clinicians to understand the factors influencing patient outcomes.The hierarchical structure of decision trees allows for the identification of important risk factors and their interactions, leading to actionable insights for clinical decision-making.Moreover, decision trees can handle both categorical and continuous variables, making them suitable for integrating diverse patient characteristics and biomarkers.However, it is important to acknowledge the limitations of decision trees.Overfitting remains a concern, as decision trees can capture noise or irrelevant patterns in the data.Pruning techniques and ensemble methods, such as random forests, can be employed to mitigate overfitting and enhance generalization.Additionally, the interpretability of decision trees may come at the expense of predictive power compared to more complex models like neural networks.Overall, decision trees provide a valuable tool in the realm of predictive medicine, offering insights into mortality risk assessment and contributing to personalized patient care for hemodialysis patients with diabetes.

REFERENCE
It builds a tree-like model of decisions and their possible consequences by recursively splitting the dataset based on different features or attributes(Potharaju and Sreedevi 2016)(Ardi and Isnayanti 2020).The decision tree algorithm works, with the following steps: 1.Data Preparation: Start with a labeled dataset, where each instance has a set of input features and corresponding target labels or outcomes.2. Attribute Selection: Choose the attribute that best separates the data based on a certain criterion.Common criteria include Gini impurity and information gain.The attribute with the highest impurity reduction or information gain is selected for splitting.3. Splitting: Split the dataset based on the chosen attribute into subsets or branches.Each branch represents a unique value or category of the attribute.Instances are allocated to the appropriate branch based on their attribute values.Jurnal Minfo Polgan Volume 12, Nomor 1, Maret 2023 DOI : https://doi.org/10.33395/jmp.v12i1.12412e-ISSN : 2797-3298 p-ISSN : 2089-9424

Fig 1 .
Fig 1. Training comparison of Dataset A -Accuracy Fig 2. Training comparison of Dataset Atime

Fig 3 .
Fig 3. Training comparison of Dataset B -Accuracy

Fig 5 .
Fig 5. Training comparison of Dataset C -Accurac Fig 6.Training comparison of Dataset C -Time

Table 2 .
Prediction Result Comparison of Dataset B Using Dataset B, as shown in Table