Abstract
Background: Predicting the prognosis of patients with acute myocardial infarction (AMI) and diabetes mellitus (DM) is crucial because their high in-hospital mortality rate poses a significant challenge to physicians' treatment decisions. Consequently, this study develops and validates an in-hospital mortality risk prediction model for patients with AMI and DM using an interpretable machine learning (ML) approach.
Methods: The data used in this study were sourced from the Medical Information Mart for Intensive Care database (MIMIC-IV, v2.2). Least absolute shrinkage and selection operator (LASSO) regression was used to screen predictors, and the dataset was divided into a training set and a validation set in a 75%:25% ratio. Models were constructed on the training set and evaluated on the validation set. Predictive models were developed using seven ML algorithms, and the optimal algorithm was selected by comparing the accuracy and area under the curve (AUC) values of these algorithms on the training and validation sets. Furthermore, the predictions of the models were interpreted using two ML interpretability methods, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME).
Results: This study included 2,828 patients with AMI and DM, with a median age of 71.4 years (interquartile range 63.7–79.5 years); 64.5% were men and 35.5% were women. LASSO regression selected 25 of the 73 clinical indicators as predictors strongly associated with in-hospital mortality. Among the seven algorithms, the Extreme Gradient Boosting (XGBoost) model performed best on both the training set (AUC: 1.000, accuracy: 1.000) and the validation set (AUC: 0.905, accuracy: 0.907). Feature importance analysis indicated that the six most influential features in the XGBoost model were intensive care unit length of stay, hospital length of stay, multiple scoring systems (e.g., APSIII, GCS, Charlson), and mean respiratory rate (resp_rate_mean).
Conclusion: This study confirms the potential of ML methods for predicting the risk of in-hospital death in patients with AMI combined with DM, and it demonstrates that the SHAP and LIME approaches effectively improve the interpretability of the models.
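The evaluation step described in the Methods (a 75%:25% split followed by a comparison of accuracy and AUC) can be illustrated with a minimal, standard-library sketch. It is not the study's code: the function names, the 0.5 classification threshold, the random seeds, and the synthetic labels and scores are all assumptions made for illustration, and no LASSO screening or XGBoost training is performed here.

```python
import random


def split_train_validation(rows, validation_fraction=0.25, seed=0):
    """Shuffle rows and split them into (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the 0/1 label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic (label, score) pairs standing in for in-hospital death (1) or
# survival (0) and a model's predicted risk. These are NOT study data.
rng = random.Random(42)
rows = []
for i in range(400):
    died = 1 if i % 7 == 0 else 0
    risk = min(1.0, max(0.0, rng.gauss(0.7 if died else 0.3, 0.2)))
    rows.append((died, risk))

train, validation = split_train_validation(rows)
val_labels = [y for y, _ in validation]
val_scores = [s for _, s in validation]

print("training cases:", len(train), "validation cases:", len(validation))
print("validation AUC:", roc_auc(val_labels, val_scores))
print("validation accuracy:", accuracy(val_labels, val_scores))
```

Reporting both metrics on the held-out set matters because a model can fit its training data perfectly (as with the training AUC of 1.000 reported above) while performing less well on unseen patients; the validation figures are the more informative estimate of generalization.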