There is a newer version of the record available.

Published May 23, 2022 | Version v75
Dataset Open

CoronaCentral

Creators

  • 1. Stanford University

Description

This record describes the output file of the CoronaCentral project. The scripts used to create it are hosted in the corona-ml GitHub repository. The source documents, before CoronaCentral processing, come from PubMed and CORD-19.

The file is a gzipped JSON document containing one record per document. Each document has at least one of: a PubMed ID, a CORD-19 ID (cord_uid), a DOI or a URL.
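As a rough illustration of reading such a file, the sketch below loads a gzipped JSON file and checks the "at least one identifier" rule. It assumes the top-level JSON value is a list of record objects, which this description does not confirm. The helper names (`load_records`, `has_identifier`), the sample records, and the file name `sample.json.gz` are made up for the example.

```python
import gzip
import json
import os
import tempfile

# Identifier fields named in the description; a record should carry at least one.
ID_FIELDS = ("pubmed_id", "cord_uid", "doi", "url")


def load_records(path):
    """Load all records from a gzipped JSON file.

    Assumption: the top-level JSON value is a list of record objects.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def has_identifier(record):
    """Return True if the record has at least one non-empty identifier."""
    return any(record.get(field) for field in ID_FIELDS)


# Round-trip demo with invented records (not real CoronaCentral data).
sample = [
    {"pubmed_id": "123", "title": "Example A"},
    {"title": "Example B"},
]
demo_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    json.dump(sample, handle)

records = load_records(demo_path)
print(len(records), [has_identifier(r) for r in records])
```

The decompressed file is likely much larger than its 232.4 MB download size, so a streaming JSON parser may be preferable to `json.load` on machines with limited memory.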

Each document should have the following fields:

  • pubmed_id: PubMed identifier (optional)
  • pmcid: PubMed Central identifier (optional)
  • doi: Digital object identifier (optional)
  • cord_uid: CORD-19 identifier (optional)
  • url: URL
  • journal: Journal/preprint server
  • publish_year: Year of publication (optional)
  • publish_month: Month of publication (optional)
  • publish_day: Day of publication (optional)
  • title: Title of article
  • abstract: Abstract of article (optional)
  • is_preprint: Whether the article is a preprint
  • topics: Predicted topics for article
  • articletypes: Predicted article types for article
  • entities: Extracted entities (e.g. drugs) with identifiers and locations within text
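The field list above could be used along these lines. This is a minimal sketch: it treats every field not marked optional as expected, and it assumes the date parts are integers or numeric strings, which the description does not specify. The helper names `missing_fields` and `publication_date` are hypothetical.

```python
# Fields not marked "(optional)" in the list above.
EXPECTED_FIELDS = (
    "url",
    "journal",
    "title",
    "is_preprint",
    "topics",
    "articletypes",
    "entities",
)


def missing_fields(record):
    """Return the expected fields that are absent from a record."""
    return [field for field in EXPECTED_FIELDS if field not in record]


def publication_date(record):
    """Build a partial ISO date string from the optional date fields.

    Returns "YYYY", "YYYY-MM" or "YYYY-MM-DD" depending on which parts
    are present, or None if the year is missing.
    Assumption: values are integers or numeric strings.
    """
    parts = []
    for field, width in (
        ("publish_year", 4),
        ("publish_month", 2),
        ("publish_day", 2),
    ):
        value = record.get(field)
        if value is None or value == "":
            break  # stop at the first missing part
        parts.append(str(int(value)).zfill(width))
    return "-".join(parts) if parts else None


# Invented record for illustration only.
example = {"url": "https://example.org", "title": "Example",
           "publish_year": 2020, "publish_month": 5}
print(missing_fields(example))
print(publication_date(example))
```

Returning a partial date avoids inventing a month or day for records that only carry a year.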

Please report issues on the corona-ml GitHub issues page.

Files

Size: 232.4 MB
Checksum: md5:ee5c9790d1eef62e8664df8fb33f7345
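A downloaded copy can be checked against the MD5 checksum listed for this record. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that it does not have to fit in memory. The function names are hypothetical, and the local path is whatever name the file was saved under.

```python
import hashlib

# MD5 checksum listed for this version of the record.
EXPECTED_MD5 = "ee5c9790d1eef62e8664df8fb33f7345"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

A call such as `verify_download("path/to/downloaded/file")` returning `False` would suggest a corrupted or incomplete download, or a file from a different version of the record.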