Dataset Open Access

A meta analysis of Wikipedia's coronavirus sources during the COVID-19 pandemic

Sobel, Jonathan; Benjakob, Omer; Aviram, Rona

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3901741", 
  "language": "eng", 
  "title": "A meta analysis of Wikipedia's coronavirus sources during the COVID-19 pandemic", 
  "issued": {
    "date-parts": [
  "abstract": "<p>At the height of the coronavirus pandemic, on the last day of March 2020, Wikipedia in all languages broke a record for most traffic in a single day. Since the outbreak of the COVID-19 pandemic at the start of January, tens if not hundreds of millions of people have come to Wikipedia to read - and in some cases also contribute - knowledge, information and data about the virus to an ever-growing pool of articles. Our study focuses on the scientific backbone behind the content people across the world read: which sources informed Wikipedia&rsquo;s coronavirus content, and how the scientific research in this field was represented on Wikipedia. Using citations as a readout, we try to map how COVID-19 related research was used in Wikipedia and analyse what happened to it before and during the pandemic. Understanding how scientific and medical information was integrated into Wikipedia, and which sources informed the COVID-19 content, is key to understanding the digital knowledge echosphere during the pandemic.</p>\n\n<p>To delimit the corpus of Wikipedia articles containing Digital Object Identifiers (DOIs), we applied two different strategies. First, we scraped every Wikipedia page from the COVID-19 Wikipedia project (about 3000 pages) and filtered them to keep only pages containing DOI citations. For our second strategy, we searched EuroPMC for COVID-19, SARS-CoV2 and SARS-nCoV19 (30&rsquo;000 scientific papers, reviews and preprints), selected scientific papers from 2019 onwards, and compared them to the citations extracted from the English Wikipedia dump of <strong>May 2020</strong> (2&rsquo;000&rsquo;000 DOIs). This search led to 231 Wikipedia articles that either contained at least one citation from the EuroPMC search or were part of the Wikipedia COVID-19 project pages containing DOIs. Next, from our corpus of 231 Wikipedia articles, we extracted DOIs, PMIDs, ISBNs, websites and URLs using a set of regular expressions. Subsequently, we computed several statistics for each Wikipedia article and retrieved Altmetric, CrossRef and EuroPMC information for each DOI. Finally, our method allowed us to produce tables of annotated citations and of information extracted from each Wikipedia article, such as books, websites and newspapers.</p>\n\n<p>Files used as input and extracted information on Wikipedia&#39;s COVID-19 sources are presented in this archive.</p>\n\n<p>See the <a href=\"\">WikiCitationHistoRy</a> GitHub repository for the R code and other bash/Python utility scripts related to this project.</p>", 
  "author": [
    {
      "family": "Sobel, Jonathan"
    }, 
    {
      "family": "Benjakob, Omer"
    }, 
    {
      "family": "Aviram, Rona"
    }
  ], 
  "note": "Analysis of Wikipedia sources during the first wave of the COVID-19 pandemic (up to May 2020)", 
  "version": "0.1", 
  "type": "dataset", 
  "id": "3901741"
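
The abstract above describes extracting DOIs, PMIDs, ISBNs and URLs from Wikipedia articles with a set of regular expressions, then comparing the extracted DOIs against a EuroPMC search. The record does not show those expressions, so the following Python sketch is only an illustration of the idea. The patterns, the function names `extract_identifiers` and `articles_citing`, and the assumption that identifiers appear as `|pmid=` / `|isbn=` template parameters in wikitext are all hypothetical and are not taken from the WikiCitationHistoRy code.

```python
import re

# Illustrative patterns only (assumptions, not the authors' expressions).
# A DOI starts with "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s|}\]<>"]+', re.IGNORECASE)
# PMID and ISBN are assumed to appear as citation-template parameters.
PMID_RE = re.compile(r'\bpmid\s*=\s*(\d+)', re.IGNORECASE)
ISBN_RE = re.compile(r'\bisbn\s*=\s*([0-9Xx][0-9Xx -]{8,16}[0-9Xx])', re.IGNORECASE)
URL_RE = re.compile(r'https?://[^\s|}\]<>"]+')


def extract_identifiers(wikitext):
    """Return de-duplicated, sorted identifiers found in one article's wikitext."""
    return {
        # DOIs are case-insensitive, so normalise to lower case.
        "doi": sorted({m.lower().rstrip(".,;") for m in DOI_RE.findall(wikitext)}),
        "pmid": sorted(set(PMID_RE.findall(wikitext))),
        # Drop hyphens and spaces so the same ISBN is counted once.
        "isbn": sorted({m.replace("-", "").replace(" ", "")
                        for m in ISBN_RE.findall(wikitext)}),
        "url": sorted({m.rstrip(".,;") for m in URL_RE.findall(wikitext)}),
    }


def articles_citing(article_dois, reference_dois):
    """Return titles of articles whose DOIs overlap a reference DOI list.

    article_dois maps an article title to its (lower-cased) DOIs;
    reference_dois stands in for the DOIs returned by a literature search.
    """
    reference = {d.lower() for d in reference_dois}
    return sorted(title for title, dois in article_dois.items()
                  if reference.intersection(dois))


sample = (
    "{{cite journal |title=X |doi=10.1000/abc123 |pmid=12345678}} "
    "{{cite book |isbn=978-3-16-148410-0}} "
    "[https://example.org/page Example]"
)
found = extract_identifiers(sample)
matched = articles_citing({"Article A": found["doi"], "Article B": []},
                          ["10.1000/ABC123"])
```

Lower-casing DOIs before comparison matters here because the same DOI can be written with different capitalisation in Wikipedia and in the literature database. A real pipeline would likely also need to handle URL-encoded DOIs and identifiers outside citation templates, which this sketch ignores.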
                  All versions   This version
Views             809            809
Downloads         731            731
Data volume       5.2 GB         5.2 GB
Unique views      763            763
Unique downloads  488            488

