Abstract
Heart transplantation is a life-saving procedure for patients with end-stage heart failure. The United Network for Organ Sharing (UNOS), which administers the US organ allocation system, substantially expanded the number of clinical and demographic variables collected in its database in 2004. This study examines whether these newly added variables improve the ability to predict survival outcomes for patients on the heart transplant waiting list. An information-gain-based feature selection approach, supported by an extensive review of prior studies, was combined with survival analysis and regularized regression to identify the most influential predictors. Using the selected variables, several classification models, including tree-augmented Naïve Bayes, logistic regression, support vector machines, decision trees, and random forests, were developed. Class imbalance was addressed through random under-sampling and cost-sensitive modeling. The results show that prediction accuracy for short-term, medium-term, and long-term survival (one month, one year, and five years) does not improve substantially when the new variables are included. The findings suggest that the expanded data collection introduced in 2004 adds limited incremental value for predicting survival among patients awaiting heart transplantation.
• Evaluate how expanded data collection impacts the accuracy of survival prediction in heart transplantation.
• Identify essential variables through structured feature selection procedures.
• Compare several predictive models to assess the gains from incorporating additional variables.
• Address data imbalance using sampling and cost-sensitive strategies.
• Show that newly collected variables provide limited improvement across all prediction horizons.
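
As an illustration only, two of the steps named above, information-gain-based feature ranking and random under-sampling, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the study's implementation: the toy labels and the feature names `feature_a` and `feature_b` are hypothetical and do not correspond to UNOS variables, features are assumed to be discrete, and under-sampling is assumed to reduce every class to the size of the smallest one.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their conditional entropy given a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until each class matches the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    sampled_rows, sampled_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Hypothetical toy data: 1 = survived the horizon, 0 = did not (imbalanced 6 vs 2).
labels = [1, 1, 1, 1, 1, 1, 0, 0]
feature_a = ["low"] * 6 + ["high"] * 2             # separates the classes perfectly
feature_b = ["x", "y", "x", "y", "x", "y", "x", "y"]  # carries no class information

gain_a = information_gain(feature_a, labels)
gain_b = information_gain(feature_b, labels)

# Balance the classes by under-sampling the majority class.
balanced_rows, balanced_labels = random_under_sample(list(range(len(labels))), labels)
```

In this toy setting, a feature that separates the classes perfectly receives a gain equal to the label entropy, while an uninformative feature receives a gain of zero; ranking candidate variables by this score is one way to decide which to keep. Under-sampling discards majority-class records, so in practice it trades some information for balanced classes.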