Published September 27, 2021 | Version 1.0
Dataset Open

COVID-19++: A Citation-Aware Covid-19 Dataset for the Analysis of Research Dynamics


COVID-19++ is a citation-aware COVID-19 dataset for the analysis of research dynamics. In addition to primary COVID-19-related articles and preprints from 2020, it includes citations and the metadata of first-order cited work. All publications are annotated with MeSH terms, taken from ground-truth annotations where available and otherwise assigned via ConceptMapper.

The data is organized in CSV files:

- Paper metadata (paper_id, publdate, title, data_source): paper.csv
- Annotation data, mapping paper_id to MeSH terms: annotation.csv
- Authorship data, mapping paper_id to author, optionally with ORCID: authorship.csv
- Paired DOIs of citing and cited papers: references.csv
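
As a rough sketch of how these files might be read together, the following Python snippet groups MeSH terms by paper_id using only the standard library. The paper.csv column names come from the list above; the annotation column name `mesh_term`, the presence of a header row in each file, and the inline sample rows are assumptions made for illustration and should be checked against the actual files.

```python
import csv
import io
from collections import defaultdict


def read_rows(handle):
    """Return the rows of a CSV file with a header line as a list of dicts."""
    return list(csv.DictReader(handle))


def terms_by_paper(annotation_rows, term_column="mesh_term"):
    """Group annotation rows into {paper_id: [term, ...]}.

    `term_column` is an assumed column name, not taken from the dataset.
    """
    grouped = defaultdict(list)
    for row in annotation_rows:
        grouped[row["paper_id"]].append(row[term_column])
    return dict(grouped)


# Invented placeholder rows standing in for paper.csv and annotation.csv.
SAMPLE_PAPER = io.StringIO(
    "paper_id,publdate,title,data_source\n"
    "p1,2020-03-01,Example title A,KE\n"
    "p2,2020-04-15,Example title B,PP\n"
)
SAMPLE_ANNOTATION = io.StringIO(
    "paper_id,mesh_term\n"
    "p1,Term X\n"
    "p1,Term Y\n"
    "p2,Term X\n"
)

# Index papers by paper_id, then look up the terms attached to each one.
papers = {row["paper_id"]: row for row in read_rows(SAMPLE_PAPER)}
terms = terms_by_paper(read_rows(SAMPLE_ANNOTATION))

for paper_id, paper in papers.items():
    print(paper_id, paper["title"], terms.get(paper_id, []))
```

To run this against the real files, the `io.StringIO` objects could be swapped for handles such as `open("paper.csv", newline="", encoding="utf-8")`, assuming the files are UTF-8 encoded.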

The data_source column in the paper metadata takes the value KE (for metadata from ZB MED KE), PP (for preprints), or CR (for cited resources from CrossRef).
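
The three source codes can be used to split the paper metadata into primary articles, preprints, and cited work. The snippet below is a minimal sketch of counting rows per code, assuming paper.csv has a header row with a `data_source` column; the sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Labels taken from the dataset description.
SOURCE_LABELS = {
    "KE": "metadata from ZB MED KE",
    "PP": "preprint",
    "CR": "cited resource from CrossRef",
}


def count_by_source(handle):
    """Count rows per data_source value in a paper.csv-like file."""
    reader = csv.DictReader(handle)
    return Counter(row["data_source"] for row in reader)


# Invented placeholder rows standing in for paper.csv.
sample = io.StringIO(
    "paper_id,publdate,title,data_source\n"
    "p1,2020-03-01,Example title A,KE\n"
    "p2,2020-04-15,Example title B,PP\n"
    "p3,2019-11-20,Example title C,CR\n"
    "p4,2018-06-02,Example title D,CR\n"
)

counts = count_by_source(sample)
for code, n in sorted(counts.items()):
    print(f"{code} ({SOURCE_LABELS.get(code, 'unknown code')}): {n}")
```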

This work was supported by BMBF within the programme "Quantitative Wissenschaftsforschung" under grant numbers 01PU17013A, 01PU17013B, 01PU17013C.


Files (78.7 MB)
