Dataset Open Access

Development and validation of a machine learning model for use as an automated artificial intelligence tool to predict mortality risk in patients with COVID-19

Anna Stachel

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="DOI">10.5281/zenodo.3893846</identifier>
      <creatorName>Anna Stachel</creatorName>
      <affiliation>NYU Langone Health</affiliation>
    <title>Development and validation of a machine learning model for use as an automated artificial intelligence tool to predict mortality risk in patients with COVID-19</title>
    <subject>Prediction model</subject>
    <subject>Machine Learning</subject>
    <subject>Feature Engineering</subject>
    <subject>AI tool</subject>
    <subject>Covid-19 mortality</subject>
    <date dateType="Issued">2020-06-14</date>
  <resourceType resourceTypeGeneral="Dataset"/>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.3893845</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf"></relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf"></relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;New York City quickly became an epicenter of the COVID-19 pandemic. Because of a sudden and massive increase in patients during the pandemic, healthcare providers faced an exponential increase in workload, which created a strain on staff and on limited resources. As this is a new infection, predictors of morbidity and mortality are not well characterized.&lt;/p&gt;


&lt;p&gt;We developed a model to predict which patients are at risk of mortality, using only laboratory, vital sign and demographic information readily available in the electronic health record for more than 3000 hospital admissions with COVID-19. A variable importance algorithm was used for interpretability and understanding of performance and predictors.&lt;/p&gt;


&lt;p&gt;We built a model with 84-97% accuracy to identify predictors and patients with high risk of mortality, and developed an automated artificial intelligence (AI) notification tool that does not require manual calculation by the busy clinician. Oximetry, respirations, blood urea nitrogen, lymphocyte percent, calcium, troponin and neutrophil percentage were important features, and key ranges were identified that contributed to a 50% increase in patients&amp;rsquo; mortality prediction score. With an increasing negative predictive value (NPV) starting at 0.90 after the second day of admission, we are able to identify likely survivors more confidently. This study serves as a use case of a model with visualizations that give clinicians a better understanding of the model and of the predictors of mortality. Additionally, an example of the operationalization of the model via an AI notification tool is illustrated.&lt;/p&gt;</description>
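The abstract reports a negative predictive value (NPV) starting at 0.90. As a reminder of what that metric measures, the following is a minimal sketch of the standard definition, NPV = TN / (TN + FN), where a "negative" prediction means the model predicts survival. The function name, the 1 = died / 0 = survived encoding, and the example labels are assumptions made up for illustration. They are not taken from the dataset or from the authors' code.

```python
def negative_predictive_value(y_true, y_pred):
    """Return NPV = TN / (TN + FN).

    Assumed encoding (hypothetical, not from the dataset):
    1 = patient died, 0 = patient survived.
    """
    # True negatives: predicted survival, patient survived.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # False negatives: predicted survival, patient died.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tn + fn == 0:
        raise ValueError("NPV is undefined when there are no negative predictions")
    return tn / (tn + fn)


# Made-up labels for illustration only: 10 predicted survivors,
# of whom 9 actually survived, plus 2 correctly predicted deaths.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

npv = negative_predictive_value(y_true, y_pred)
print(f"NPV = {npv:.2f}")
```

Read this way, an NPV of 0.90 means that 9 of every 10 patients the model flags as likely survivors do survive, which is why a rising NPV supports more confident identification of survivors.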