CoronaCentral

doi:10.5281/zenodo.5232554

Published August 22, 2021 | Version v36

Dataset | Open access

Jake Lever¹

1. Stanford University

This record describes the CoronaCentral output file. The scripts used to create it are hosted in the corona-ml GitHub repository. The documents are drawn from PubMed and CORD-19 before being processed for CoronaCentral.

The file is a gzipped JSON document containing one record per article. Each record has at least one of the following identifiers: a PubMed ID, a CORD-19 ID (cord_uid), a DOI, or a URL.
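The layout inside the archive is not spelled out above, so the following Python sketch assumes the decompressed file is a single JSON array of records and falls back to one JSON object per line (JSON Lines) if that parse fails. The function name `load_coronacentral` is hypothetical and not part of corona-ml.

```python
import gzip
import json
from typing import Any, Dict, List


def load_coronacentral(path: str) -> List[Dict[str, Any]]:
    """Load all records from a gzipped CoronaCentral JSON file.

    Assumption: the file holds one JSON array of records. If parsing it as a
    single JSON value fails, fall back to one JSON object per line.
    """
    # Decompress and decode the whole file as UTF-8 text.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fallback assumption: JSON Lines layout, skipping blank lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a lone object so callers always get a list of records.
    return data if isinstance(data, list) else [data]
```

For example, `records = load_coronacentral("coronacentral.json.gz")` followed by `len(records)` gives the number of records. Reading everything into memory is the simplest approach; since the decompressed file is considerably larger than the 118.1 MB archive, a streaming JSON parser may be preferable on machines with limited memory.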

The fields in each record are:

pubmed_id: PubMed identifier (optional)
pmcid: PubMed Central identifier (optional)
doi: Digital object identifier (optional)
cord_uid: CORD-19 identifier (optional)
url: URL
journal: Journal/preprint server
publish_year: Year of publication (optional)
publish_month: Month of publication (optional)
publish_day: Day of publication (optional)
title: Title of article
abstract: Abstract of article (optional)
is_preprint: Whether the article is a preprint
topics: Predicted topics for article
articletypes: Predicted article types for article
entities: Extracted entities (e.g. drugs) with identifiers and locations within the text
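Because many of these fields are optional, code that consumes the records should not assume that any single identifier or date part is present. The Python sketch below shows one way to pick an identifier, format a partial publication date and filter records by topic. It assumes `topics` is a list of topic names and that the date parts are integers or numeric strings; the helper names and the identifier preference order are hypothetical choices for illustration.

```python
from typing import Any, Dict, List, Optional

# Identifier fields in an arbitrary preference order chosen for this sketch.
ID_FIELDS = ("pubmed_id", "cord_uid", "doi", "url")


def primary_id(record: Dict[str, Any]) -> Optional[str]:
    """Return 'field:value' for the first non-empty identifier, else None."""
    for field in ID_FIELDS:
        value = record.get(field)
        if value:
            return f"{field}:{value}"
    return None


def publish_date(record: Dict[str, Any]) -> str:
    """Format the optional date parts as YYYY, YYYY-MM or YYYY-MM-DD ('' if no year)."""
    year = record.get("publish_year")
    if not year:
        return ""
    parts = [str(year)]
    month = record.get("publish_month")
    if month:
        parts.append(f"{int(month):02d}")
        day = record.get("publish_day")
        if day:
            parts.append(f"{int(day):02d}")
    return "-".join(parts)


def select(records: List[Dict[str, Any]], topic: str,
           include_preprints: bool = False) -> List[Dict[str, Any]]:
    """Keep records predicted to have `topic`, optionally dropping preprints."""
    return [
        r for r in records
        if topic in (r.get("topics") or [])
        and (include_preprints or not r.get("is_preprint"))
    ]
```

Treating empty values (`None`, `""`, empty lists) the same as missing keys keeps the helpers working whether optional fields are omitted or present but blank.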

Please report issues on the corona-ml GitHub issues page.

Files

coronacentral.json.gz (118.1 MB), md5:0463e4413c9651dcc80453e37f5fcb15
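To check that a download is intact, its MD5 digest can be compared with the checksum listed above. A minimal Python sketch using the standard `hashlib` module; the function name `md5sum` is hypothetical:

```python
import hashlib

# Checksum listed for coronacentral.json.gz in this version (v36).
EXPECTED_MD5 = "0463e4413c9651dcc80453e37f5fcb15"


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For an intact copy of this version, `md5sum("coronacentral.json.gz") == EXPECTED_MD5` should be `True`. Reading in chunks keeps memory use constant regardless of file size.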
