Dataset Open Access

Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests

Cabitza, Federico; Campagner, Andrea; Ferrari, Davide; Di Resta, Chiara; Ceriotti, Daniele; Sabetta, Eleonora; Colombini, Alessandra; De Vecchi, Elena; Banfi, Giuseppe; Locatelli, Massimo; Carobene, Anna


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="URL">https://zenodo.org/record/4081318</identifier>
  <creators>
    <creator>
      <creatorName>Cabitza, Federico</creatorName>
      <givenName>Federico</givenName>
      <familyName>Cabitza</familyName>
      <affiliation>DISCo, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milano, 20126, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Campagner, Andrea</creatorName>
      <givenName>Andrea</givenName>
      <familyName>Campagner</familyName>
      <affiliation>IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Ferrari, Davide</creatorName>
      <givenName>Davide</givenName>
      <familyName>Ferrari</familyName>
      <affiliation>SCVSA Department, University of Parma, Parco Area delle Scienze 11/a, 43124, Parma, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Di Resta, Chiara</creatorName>
      <givenName>Chiara</givenName>
      <familyName>Di Resta</familyName>
      <affiliation>Vita-Salute San Raffaele University; Unit of Genomics for Human Disease Diagnosis, Division of Genetics and Cell Biology, Via Olgettina 58, 20132, Milan, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Ceriotti, Daniele</creatorName>
      <givenName>Daniele</givenName>
      <familyName>Ceriotti</familyName>
      <affiliation>Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Sabetta, Eleonora</creatorName>
      <givenName>Eleonora</givenName>
      <familyName>Sabetta</familyName>
      <affiliation>Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Colombini, Alessandra</creatorName>
      <givenName>Alessandra</givenName>
      <familyName>Colombini</familyName>
      <affiliation>IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>De Vecchi, Elena</creatorName>
      <givenName>Elena</givenName>
      <familyName>De Vecchi</familyName>
      <affiliation>IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Banfi, Giuseppe</creatorName>
      <givenName>Giuseppe</givenName>
      <familyName>Banfi</familyName>
      <affiliation>IRCCS Istituto Ortopedico Galeazzi, Orthopaedic Biotechnology Lab, Via Riccardo Galeazzi, 4, 20161, Milano, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Locatelli, Massimo</creatorName>
      <givenName>Massimo</givenName>
      <familyName>Locatelli</familyName>
      <affiliation>Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy</affiliation>
    </creator>
    <creator>
      <creatorName>Carobene, Anna</creatorName>
      <givenName>Anna</givenName>
      <familyName>Carobene</familyName>
      <affiliation>Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2020</publicationYear>
  <dates>
    <date dateType="Issued">2020-10-12</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/4081318</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.1515/cclm-2020-1294</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/covid-19</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;The .xlsx dataset includes all patients used for training, internal-external validation, and external validation. The three groups can be distinguished by the ID in the first column: IDs in the format Axxxx-&amp;lt;Date&amp;gt; were used for training, IDs in the format 20xx were used for internal-external validation, and the remaining records were used for external validation.&lt;/p&gt;

&lt;p&gt;Feature encoding: for the Target feature, 1 stands for &amp;quot;Positive to COVID-19&amp;quot; and 0 stands for &amp;quot;Negative to COVID-19&amp;quot;; for the Sex feature, 1 stands for &amp;quot;Male&amp;quot; and 0 stands for &amp;quot;Female&amp;quot;.&lt;/p&gt;

&lt;p&gt;The full article is available at: https://www.degruyter.com/view/journals/cclm/ahead-of-print/article-10.1515-cclm-2020-1294/article-10.1515-cclm-2020-1294.xml.&lt;/p&gt;

&lt;p&gt;A preprint version of the article is also available on medRxiv: https://www.medrxiv.org/content/10.1101/2020.10.02.20205070v1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ABSTRACT&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt; The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), presents with known shortcomings, such as long turnaround time, potential shortage of reagents, false-negative rates around 15&amp;ndash;20%, and expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Methods&lt;/strong&gt; Three different training data sets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted at San Raffaele Hospital (OSR) from February to May 2020, were used for developing machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, hemogasanalysis and CO-Oxymetry values, age, sex and specific symptoms at triage) and two sub-datasets (COVID-specific and CBC dataset, 32 and 21 features respectively). 58 cases (50% COVID-19 positive) from another hospital, and 54 negative patients collected in 2018 at OSR, were used for internal-external and external validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt; We developed five ML models: for the complete OSR dataset, the area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.83 to 0.90; for the COVID-specific dataset from 0.83 to 0.87; and for the CBC dataset from 0.74 to 0.86. The validations also achieved good results: respectively, AUC from 0.75 to 0.78; and specificity from 0.92 to 0.96.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusions&lt;/strong&gt; ML can be applied to blood tests as both an adjunct and alternative method to rRT-PCR for the fast and cost-effective identification of COVID-19-positive patients. This is especially useful in developing countries, or in countries facing an increase in contagions.&lt;/p&gt;</description>
  </descriptions>
</resource>
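
The description above says that the cohort of each record can be read off its ID and that Target and Sex are coded as 0/1. The following is a minimal Python sketch of that decoding, not code from the authors. The regular expressions are assumptions about what "Axxxx-<Date>" and "20xx" look like in the actual .xlsx file (a leading "A" followed by a hyphen and a date, and a four-digit number starting with "20"), and the names cohort_from_id and decode_row are hypothetical helpers introduced only for illustration.

```python
import re

# ASSUMPTION: concrete patterns inferred from the record description only;
# they have not been checked against the actual .xlsx file.
#   "Axxxx-<Date>" -> training
#   "20xx"         -> internal-external validation
#   anything else  -> external validation
TRAINING_ID = re.compile(r"^A[^-]+-.+$")
INTERNAL_EXTERNAL_ID = re.compile(r"^20\d{2}$")

# Encodings stated in the record description.
TARGET_LABELS = {1: "Positive to COVID-19", 0: "Negative to COVID-19"}
SEX_LABELS = {1: "Male", 0: "Female"}


def cohort_from_id(patient_id):
    """Return the cohort implied by a patient ID (hypothetical helper)."""
    pid = str(patient_id).strip()
    if TRAINING_ID.match(pid):
        return "training"
    if INTERNAL_EXTERNAL_ID.match(pid):
        return "internal-external validation"
    return "external validation"


def decode_row(patient_id, target, sex):
    """Decode one row's ID, Target, and Sex values into readable labels."""
    return {
        "cohort": cohort_from_id(patient_id),
        "target": TARGET_LABELS[int(target)],
        "sex": SEX_LABELS[int(sex)],
    }
```

If the spreadsheet is loaded with a library that turns numeric IDs into floats (for example 2015.0), the ID would need to be converted to an integer before calling cohort_from_id, since the "20xx" pattern here expects exactly four digits.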
Views 1,066
Downloads 474
Data volume 144.0 MB
Unique views 962
Unique downloads 394