Published February 18, 2021 | Version 1.1
Software Open

csisc/WikidataCOVID19SPARQL: Data about Wikidata coverage of COVID-19

  • 1. Faculty of Medicine of Sfax, University of Sfax, Sfax, Tunisia
  • 2. Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
  • 3. La Trobe University, Melbourne, Victoria, Australia
  • 4. Computational Systems Biology Laboratory, University of São Paulo, São Paulo, Brazil
  • 5. Department of Management in Networked and Digital Societies, Kozminski University, Warsaw, Poland
  • 6. Web Semantics Oviedo (WESO) Research Group, University of Oviedo, Spain
  • 7. Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, CB #3270, Davie Hall, Chapel Hill, NC 27599-3270, United States of America
  • 8. Faculty of Medicine, Hashemite University, Zarqa, Jordan
  • 9. Institute of Child Health (ICH), Kolkata, India
  • 10. School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America

Description

In this research dataset, we investigate the ability of openly licensed knowledge graphs to represent COVID-19 information in a fully structured format and to visualize a synthesis of that information using SPARQL. Our work mainly evaluates this assumption for COVID-19 information in Wikidata. This repository holds the source data for "Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata" by Houcemeddine Turki et al. (2020) and contains two folders:

  • "Docs": This folder includes the source data of several figures and tables of the study
    • Table 3: Languages ranked according to various variables, based on Wikidata queries (as of August 11, 2020). The Medical Wikipedia query yields Wikipedia articles associated with Wikidata items that have a Disease Ontology ID (P699) or are in the tree of any of the following classes: medicine (Q11190), disease (Q12136), medical procedure (Q796194) or medication (Q12140). The Medical Wikidata labels query yields labels of Wikidata items that have a Disease Ontology ID (P699) or a MeSH Descriptor ID (P486) or are in the tree of any of the same four classes. The Wikidata users column provides a snapshot from the Wikidata dashboard that lists Wikidata users who also edit Wikipedia by number of such users per Wikipedia language. Style code: Italic for languages appearing in all four lists; bold for those appearing in only one.
    • Table 4: Languages ranked according to various COVID-19-related variables (as of August 13, 2020). The COVID Wikidata content query sorts languages by the number of labels of Wikidata items with a direct link to and/or from any of the core COVID-19 items - Q84263196 (COVID-19), Q81068910 (COVID-19 pandemic) and Q82069695 (SARS-CoV-2) - excluding items about humans (3131) or scholarly publications (40164). The COVID Wikipedia pages query filters those Wikidata items for associated Wikipedia articles and sorts languages by the number of such articles. The values in the COVID Wikipedia edits column represent the revision counts per Wikipedia language as taken from the dashboard listing Wikimedia projects by total number of revisions to COVID-19-related articles. The COVID-19 pandemic Wikipedia pageviews column represents daily average user traffic (averaged since January 1, 2020) to the article about the COVID-19 pandemic in the respective language. Style code: Italic for languages appearing in all four lists; bold for those appearing in only one.
    • Fig5Corr: Correlation analysis statistics for the variables shown in Tables 3 and 4
    • Tables 5 to 8: Lists of the most used external identifiers for each class of COVID-19-related Wikidata items.
    • Fig 8B: Co-occurrence of topics in publications that have one of the COVID-19-related items as a topic, with ribbon widths proportional to the number of publications sharing those topics (log scale). Topics are coloured by group as determined by Louvain clustering; topics shared in fewer than 5 publications are omitted. The figure can be viewed at https://csisc.github.io/WikidataCOVID19SPARQL/Fig8B.html.
    • Archive-URL: Internet archive links for the URLs cited by "Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata" are made available thanks to ArchiveNow
  • "Query": This folder contains sample SPARQL queries developed for the visualization of COVID-19 information in Wikidata. These SPARQL queries are visualized at https://speed.ieee.tn.
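As a rough illustration of the selection logic described for the Medical Wikipedia query in Table 3, the sketch below assembles a SPARQL query string in Python. It is not taken from the "Query" folder: the function name, the variable names, and the property path `wdt:P31?/wdt:P279*` (one possible reading of "in the tree of" a class) are assumptions, and the prefixes are those predefined by the Wikidata Query Service. The queries actually used in the study may differ.

```python
# Hypothetical sketch: builds (but does not run) a SPARQL query that counts
# Wikipedia articles per language for "medical" Wikidata items.
# Class IDs come from the Table 3 description; everything else is assumed.
MEDICAL_CLASSES = {
    "Q11190": "medicine",
    "Q12136": "disease",
    "Q796194": "medical procedure",
    "Q12140": "medication",
}


def build_medical_wikipedia_query(classes=MEDICAL_CLASSES):
    """Return a SPARQL query string for the Wikidata Query Service."""
    # Space-separated list of class IRIs for the VALUES clause.
    values = " ".join(f"wd:{qid}" for qid in classes)
    return f"""
SELECT ?lang (COUNT(DISTINCT ?article) AS ?articles) WHERE {{
  # Items with a Disease Ontology ID (P699) ...
  {{ ?item wdt:P699 [] . }}
  UNION
  # ... or items in the tree of one of the four classes (assumed path).
  {{ VALUES ?class {{ {values} }}
     ?item wdt:P31?/wdt:P279* ?class . }}
  # Keep only sitelinks that point to a Wikipedia edition.
  ?article schema:about ?item ;
           schema:inLanguage ?lang ;
           schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
}}
GROUP BY ?lang
ORDER BY DESC(?articles)
""".strip()


if __name__ == "__main__":
    print(build_medical_wikipedia_query())
```

The printed query can be pasted into the Wikidata Query Service; a query this broad may exceed the service's time limit, so it may need to be restricted to fewer classes in practice.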

Files

csisc/WikidataCOVID19SPARQL-1.1.zip (391.4 kB)
md5:4f2e3d645a349b57c280239591f746a0

Additional details