Dataset Open Access

# A meta analysis of Wikipedia's coronavirus sources during the COVID-19 pandemic

Sobel, Jonathan; Benjakob, Omer; Aviram, Rona

### DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
<identifier identifierType="DOI">10.5281/zenodo.3901741</identifier>
<creators>
<creator>
<creatorName>Sobel, Jonathan</creatorName>
<givenName>Jonathan</givenName>
<familyName>Sobel</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-5111-4070</nameIdentifier>
<affiliation>Weizmann Institute of Science, Rehovot, Israel</affiliation>
</creator>
<creator>
<creatorName>Benjakob, Omer</creatorName>
<givenName>Omer</givenName>
<familyName>Benjakob</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-7179-3509</nameIdentifier>
<affiliation>The Cohn Institute for the History and Philosophy of Science and Ideas, Tel Aviv University, Tel Aviv, Israel</affiliation>
</creator>
<creator>
<creatorName>Aviram, Rona</creatorName>
<givenName>Rona</givenName>
<familyName>Aviram</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-5544-3552</nameIdentifier>
<affiliation>Weizmann Institute of Science</affiliation>
</creator>
</creators>
<titles>
<title>A meta analysis of Wikipedia's coronavirus sources during the COVID-19 pandemic</title>
</titles>
<publisher>Zenodo</publisher>
<publicationYear>2021</publicationYear>
<subjects>
<subject>Wikipedia</subject>
<subject>COVID-19</subject>
<subject>Sources history</subject>
<subject>Infodemics</subject>
</subjects>
<dates>
<date dateType="Issued">2021-03-15</date>
</dates>
<language>en</language>
<resourceType resourceTypeGeneral="Dataset"/>
<alternateIdentifiers>
<alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3901741</alternateIdentifier>
</alternateIdentifiers>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.3901740</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/covid-19</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/wikimedia</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/wikipedia_data</relatedIdentifier>
</relatedIdentifiers>
<version>0.1</version>
<rightsList>
<rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
</rightsList>
<descriptions>
<description descriptionType="Abstract">&lt;p&gt;At the height of the coronavirus pandemic, on the last day of March 2020, Wikipedia in all languages broke its record for the most traffic in a single day. Since the outbreak of the COVID-19 pandemic at the start of January, tens if not hundreds of millions of people have come to Wikipedia to read - and in some cases also contribute - knowledge, information and data about the virus in an ever-growing pool of articles. Our study focuses on the scientific backbone behind the content people across the world read: which sources informed Wikipedia&amp;rsquo;s coronavirus content, and how the scientific research in this field was represented on Wikipedia. Using citations as a readout, we map how COVID-19-related research was used on Wikipedia and analyse what happened to it before and during the pandemic. Understanding how scientific and medical information was integrated into Wikipedia, and which sources informed its COVID-19 content, is key to understanding the digital knowledge ecosystem during the pandemic.&lt;/p&gt;

&lt;p&gt;To delimit the corpus of Wikipedia articles containing Digital Object Identifiers (DOIs), we applied two different strategies. First, we scraped every Wikipedia page from the COVID-19 Wikipedia project (about 3,000 pages) and filtered them to keep only the pages containing DOI citations. For our second strategy, we searched EuroPMC for Covid-19, SARS-CoV2 and SARS-nCoV19 (about 30,000 scientific papers, reviews and preprints), selected the papers from 2019 onwards, and compared them to the citations extracted from the English Wikipedia dump of &lt;strong&gt;May 2020&lt;/strong&gt; (2,000,000 DOIs). This search yielded 231 Wikipedia articles containing at least one citation from the EuroPMC search, or belonging to the Wikipedia COVID-19 project pages containing DOIs. Next, from this corpus of 231 Wikipedia articles we extracted DOIs, PMIDs, ISBNs, websites and URLs using a set of regular expressions. Subsequently, we computed several statistics for each Wikipedia article and retrieved Altmetric, Crossref and EuroPMC information for each DOI. Finally, our method produced tables of annotated citations and of the information extracted from each Wikipedia article, such as books, websites and newspapers.&lt;/p&gt;

&lt;p&gt;Files used as input and extracted information on Wikipedia&amp;#39;s COVID-19 sources are presented in this archive.&lt;/p&gt;

&lt;p&gt;See the &lt;a href="https://github.com/jsobel1/WikiCitationHistoRy"&gt;WikiCitationHistoRy&lt;/a&gt; Github repository for the R codes, and other bash/python scripts utilities related to this project.&lt;/p&gt;</description>
<description descriptionType="Other">Analysis of Wikipedia's sources during the first wave of the COVID-19 pandemic (up to May 2020)</description>
</descriptions>
</resource>
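The corpus-delimitation step described in the abstract boils down to two operations: pulling DOIs and PMIDs out of article wikitext with regular expressions, then keeping articles that cite at least one DOI from the literature-search result set. A minimal sketch of both steps in Python follows; the sample wikitext and the EuroPMC result set are illustrative assumptions, and the DOI pattern is Crossref's commonly recommended regex, not necessarily the exact expression used in the WikiCitationHistoRy repository.

```python
import re

# Hypothetical snippet of Wikipedia citation-template wikitext (illustrative only).
wikitext = """
{{cite journal |title=Example |doi=10.1001/jama.2020.1585 |pmid=32031570}}
{{cite journal |title=Other |doi=10.1056/NEJMoa2001017}}
{{cite web |url=https://www.who.int/ |title=WHO}}
"""

# Crossref's recommended pattern covers the vast majority of modern DOIs.
DOI_RE = re.compile(r'10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')
# PMIDs appear as a numeric template parameter; they are at most 8 digits.
PMID_RE = re.compile(r'pmid\s*=\s*(\d{1,8})', re.IGNORECASE)

dois = DOI_RE.findall(wikitext)
pmids = PMID_RE.findall(wikitext)

# Keep the article only if it cites at least one DOI returned by the
# literature search (assumed result set), mirroring the filtering step above.
europmc_dois = {"10.1056/NEJMoa2001017"}
article_kept = bool(set(dois) & europmc_dois)
```

In the actual pipeline the same extraction would run over each of the ~3,000 project pages and the full dump-derived citation list, producing the per-article identifier tables described above.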
