Published August 22, 2021 | Version v36 | Dataset | Open access
CoronaCentral
Description
This page describes the output file for the CoronaCentral data. The scripts used to create it are hosted in the corona-ml GitHub repository. The documents come from PubMed and CORD-19 and are then processed for CoronaCentral.
The file is a gzipped JSON file containing one record per document. Each record has at least one of: a PubMed ID, a CORD-19 ID (cord_uid), a DOI, or a URL.
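A minimal loading sketch follows. It assumes the file holds a single JSON array of records (if it is newline-delimited JSON instead, parse it line by line), and the local filename `coronacentral.json.gz` is hypothetical. `has_identifier` checks the "at least one identifier" guarantee described above.

```python
import gzip
import json

# Hypothetical local filename; the actual download name may differ.
PATH = "coronacentral.json.gz"

# Identifier fields; each record should have at least one of them.
ID_FIELDS = ("pubmed_id", "cord_uid", "doi", "url")


def load_records(path):
    """Load all records, assuming the file is one gzipped JSON array."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def has_identifier(record):
    """Return True if any identifier field has a non-empty value."""
    return any(record.get(field) for field in ID_FIELDS)
```

For example, `[r for r in load_records(PATH) if not has_identifier(r)]` should come back empty if the guarantee holds.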
Each record can contain the following fields:
- pubmed_id: PubMed identifier (optional)
- pmcid: PubMed Central identifier (optional)
- doi: Digital object identifier (optional)
- cord_uid: CORD-19 identifier (optional)
- url: URL of the article
- journal: Journal/preprint server
- publish_year: Year of publication (optional)
- publish_month: Month of publication (optional)
- publish_day: Day of publication (optional)
- title: Title of article
- abstract: Abstract of article (optional)
- is_preprint: Whether the article is a preprint
- topics: Predicted topics for article
- articletypes: Predicted article types for article
- entities: Extracted entities (e.g. drugs) with identifiers and locations within text
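Because the date parts are optional and topics are predictions, a reader will often need to assemble a partial date and filter by topic. The sketch below is illustrative only: it assumes the `publish_*` fields are integers (or numeric strings) and that `topics` is a list of strings; the topic label used in any call is up to the reader.

```python
from typing import Iterable, Iterator, Optional


def publication_date(record: dict) -> Optional[str]:
    """Build "YYYY", "YYYY-MM" or "YYYY-MM-DD" from the optional
    publish_year/publish_month/publish_day fields; None if no year."""
    year = record.get("publish_year")
    if not year:
        return None
    parts = [f"{int(year):04d}"]
    month = record.get("publish_month")
    if month:
        parts.append(f"{int(month):02d}")
        day = record.get("publish_day")
        if day:  # a day is only meaningful when the month is known
            parts.append(f"{int(day):02d}")
    return "-".join(parts)


def with_topic(records: Iterable[dict], topic: str) -> Iterator[dict]:
    """Yield records whose predicted topics include `topic`.
    Missing or null `topics` are treated as an empty list."""
    for record in records:
        if topic in (record.get("topics") or []):
            yield record
```

Returning the coarsest available precision, rather than padding missing parts with defaults, avoids implying a publication day or month the record does not state.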
Please report issues on the corona-ml GitHub issues page.
Files
- One gzipped JSON file, 118.1 MB, md5:0463e4413c9651dcc80453e37f5fcb15
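The MD5 checksum listed for the file can be used to confirm that a download is complete. A minimal sketch using Python's standard `hashlib`, reading in chunks so the 118.1 MB file is never held in memory at once; the path passed in is whatever local name the file was saved under.

```python
import hashlib

# Checksum listed for the file on this page.
EXPECTED_MD5 = "0463e4413c9651dcc80453e37f5fcb15"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Return True if the file's MD5 matches the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```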