CovidPubGraph: A FAIR Knowledge Graph of COVID-19 Publications
Creators
- 1. Paderborn University
- 2. Leipzig University
Description
The rapid generation of large amounts of information about the novel coronavirus SARS-CoV-2 and the disease COVID-19 makes it increasingly difficult to gain a comprehensive overview of current insights related to the disease. This holds especially for scientific research, where a growing number of publications provide insights that might support the development of a cure or better vaccines as well as the repurposing of medication. With this work, we aim to support rapid access to a comprehensive data source on COVID-19, aimed primarily at researchers. Our dataset, COVIDPUBGRAPH, an RDF knowledge graph of scientific publications, abides by the Linked Data and FAIR principles. The base dataset for the extraction is CORD-19, a regularly updated dataset of COVID-19-related publications. Consequently, COVIDPUBGRAPH is updated every two weeks. Our generation pipeline applies named entity recognition, entity linking and link discovery approaches to the original data. The current version of the resulting dataset contains 202,770,925 triples and is linked to 9 other datasets by over 1 million links. In our use case studies, we demonstrate the usefulness of our knowledge graph for different applications. COVIDPUBGRAPH can be accessed as an RDF dump, via a SPARQL endpoint, and via an HTML endpoint. All data we generated is available under the Creative Commons Attribution 4.0 International license. The software developed for the extraction is available under the GPL 3.0 license.
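Because the description mentions access via a SPARQL endpoint, the following is a minimal sketch of how such an endpoint could be queried over the standard SPARQL 1.1 Protocol using only the Python standard library. The endpoint URL is a hypothetical placeholder (the real address is not given here), and the helper names `build_request`, `parse_count` and `count_triples` are illustrative, not part of the COVIDPUBGRAPH software. The query is deliberately vocabulary-agnostic: it only counts triples, so it assumes nothing about the graph's schema.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the actual SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Schema-independent query: counts all triples in the default graph.
COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_count(payload):
    """Extract the ?n binding from a SPARQL JSON results document."""
    doc = json.loads(payload)
    return int(doc["results"]["bindings"][0]["n"]["value"])


def count_triples(endpoint=ENDPOINT):
    """Send the count query and return the number of triples reported."""
    request = build_request(endpoint, COUNT_QUERY)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `count_triples()` with the real endpoint address would perform the network request; on a graph of this size a full count may be slow or subject to server-side limits, so a narrower query with a `LIMIT` clause is often a better first test.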