Dataset Open Access

Development and validation of a machine learning model for use as an automated artificial intelligence tool to predict mortality risk in patients with COVID-19

Anna Stachel


New York City quickly became an epicenter of the COVID-19 pandemic. Because of a sudden and massive increase in patients during the pandemic, healthcare providers faced an exponential increase in workload, which strained staff and limited resources. Because COVID-19 is a new infection, predictors of morbidity and mortality are not well characterized.


We developed a model to predict which patients are at risk of mortality, using only laboratory, vital sign and demographic information readily available in the electronic health record, from more than 3000 hospital admissions with COVID-19. A variable importance algorithm was used to aid interpretability and to understand model performance and predictors.
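The record does not say which variable importance algorithm was used. As an illustration of the general idea only, the following is a minimal sketch of permutation importance, one common approach: shuffle one feature at a time and measure how much accuracy drops. The toy model, the oximetry threshold of 90 and the synthetic rows are all hypothetical and are not taken from the study.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in model: flags high risk when oximetry is below 90."""
    oximetry, _unused = row
    return 1 if oximetry < 90 else 0


# Synthetic rows: (oximetry, irrelevant feature). Not real patient data.
rows = [(85 + i, i % 3) for i in range(10)]
labels = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

In this toy setup the feature the model ignores gets an importance of exactly zero, while shuffling oximetry lowers accuracy, so it ranks first. A real analysis would apply the same idea to the fitted model and held-out admissions.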


We built a model with 84-97% accuracy to identify predictors and patients at high risk of mortality, and developed an automated artificial intelligence (AI) notification tool that does not require manual calculation by the busy clinician. Oximetry, respirations, blood urea nitrogen, lymphocyte percent, calcium, troponin and neutrophil percentage were important features, and key ranges were identified that contributed to a 50% increase in patients’ mortality prediction score. With a negative predictive value (NPV) that starts at 0.90 after the second day of admission and increases thereafter, we are able to identify likely survivors more confidently. This study serves as a use case of a model with visualizations to aid clinicians in better understanding the model and the predictors of mortality. Additionally, an example of the operationalization of the model via an AI notification tool is illustrated.
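For readers unfamiliar with the metric, NPV is the share of patients predicted to survive who actually survive, TN / (TN + FN). The snippet below is a small sketch of that calculation; the function name and the label lists are made up for illustration and do not come from the dataset.

```python
def negative_predictive_value(y_true, y_pred):
    """NPV = TN / (TN + FN), where 1 means death and 0 means survival."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tn + fn == 0:
        raise ValueError("NPV is undefined when no patient is predicted to survive")
    return tn / (tn + fn)


# Illustrative labels only: 10 patients predicted to survive, 9 of whom did.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

npv = negative_predictive_value(y_true, y_pred)
print(round(npv, 2))  # 0.9
```

An NPV of 0.90 means that, of every ten patients the model classifies as low risk, about nine survive.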

Files (1.9 MB): two files, 452.0 kB and 1.5 MB

