Published March 31, 2021 | Version v1
Dataset Open

CovidPubGraph: A FAIR Knowledge Graph of COVID-19 Publications

Description

The rapid generation of large amounts of information about the novel coronavirus SARS-CoV-2 and the disease COVID-19 makes it increasingly difficult to gain a comprehensive overview of current insights related to the disease. This is especially true of scientific research, where a growing number of publications provide insights that might support the development of a cure or better vaccines as well as the repurposing of medication. With this work, we aim to support rapid access to a comprehensive data source on COVID-19 targeted especially at researchers. Our dataset, COVIDPUBGRAPH, an RDF knowledge graph of scientific publications, abides by the Linked Data and FAIR principles. The base dataset for the extraction is CORD-19, a dataset of COVID-19-related publications, which is updated regularly. Consequently, COVIDPUBGRAPH is updated every two weeks. Our generation pipeline applies named entity recognition, entity linking and link discovery approaches to the original data. The current version of the resulting dataset contains 202,770,925 triples and is linked to 9 other datasets by over 1 million links. In our use case studies, we demonstrate the usefulness of our knowledge graph for different applications. COVIDPUBGRAPH can be accessed as an RDF dump, through a SPARQL endpoint, and via an HTML endpoint. All data we generated is available under the Creative Commons Attribution 4.0 International license. The software developed for the extraction is available under the GPL 3.0 license.
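
As a minimal sketch of how the SPARQL access route mentioned in the description might be used, the snippet below builds a standard SPARQL 1.1 Protocol GET request and reads a count out of a JSON results document. The endpoint URL is a hypothetical placeholder, because this record does not state the real address, and the helper names `build_sparql_request` and `extract_count` are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: the record does not give the endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Generic query that counts all triples; it assumes nothing about the schema.
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_count(results):
    """Read the ?triples binding from a SPARQL JSON results document."""
    return int(results["results"]["bindings"][0]["triples"]["value"])


request = build_sparql_request(ENDPOINT, COUNT_QUERY)
```

With the real endpoint address substituted, the request could be sent with `urllib.request.urlopen(request)` and the response decoded with `json.load` before passing it to `extract_count`. A full count over a graph of this size may be slow or rejected by the server, so a narrower query is probably a better first test.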

Files (18.0 GB)

Checksum Size
md5:8c1055da8d7bc453b356f01cab6deed6 4.8 MB
md5:c350193a6f2cc4d135fb49c98f2083f8 23.0 MB
md5:a06b008107f35a75976aca0c482d3226 17.7 GB
md5:c0ce0932096e8141450e5be8fce595cd 273.6 MB
md5:cf9e12c28d801bf3669582c3d97537e2 20.8 kB
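
The MD5 checksums in the file listing above can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library. The file names are not shown in the listing, so the caller supplies the local path, and the helper names `md5_of_file` and `matches_listing` are illustrative.

```python
import hashlib

# Checksums copied from the file listing; file names are not shown there.
LISTED_MD5 = {
    "8c1055da8d7bc453b356f01cab6deed6",
    "c350193a6f2cc4d135fb49c98f2083f8",
    "a06b008107f35a75976aca0c482d3226",
    "c0ce0932096e8141450e5be8fce595cd",
    "cf9e12c28d801bf3669582c3d97537e2",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

Reading in chunks keeps memory use flat, which matters for the 17.7 GB file. A stricter check would compare each download against the single checksum listed for it instead of the whole set.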