Published March 15, 2021 | Version 0.1
Dataset | Open Access

A meta analysis of Wikipedia's coronavirus sources during the COVID-19 pandemic

  • 1. Weizmann Institute of Science, Rehovot, Israel
  • 2. The Cohn Institute for the History and Philosophy of Science and Ideas, Tel Aviv University, Tel Aviv, Israel
  • 3. Weizmann Institute of Science

Description

At the height of the coronavirus pandemic, on the last day of March 2020, Wikipedia in all languages broke its record for the most traffic in a single day. Since the outbreak of the COVID-19 pandemic at the start of January 2020, tens if not hundreds of millions of people have come to Wikipedia to read knowledge, information and data about the virus, and in some cases to contribute to an ever-growing pool of articles. Our study focuses on the scientific backbone behind the content people across the world read: which sources informed Wikipedia's coronavirus content, and how scientific research in this field was represented on Wikipedia. Using citations as a readout, we attempt to map how COVID-19-related research was used in Wikipedia and analyse what happened to it before and during the pandemic. Understanding how scientific and medical information was integrated into Wikipedia, and which sources informed the COVID-19 content, is key to understanding the digital knowledge echosphere during the pandemic.

To delimit the corpus of Wikipedia articles containing Digital Object Identifiers (DOIs), we applied two strategies. First, we scraped every Wikipedia page from the COVID-19 Wikipedia project (about 3,000 pages) and filtered them to keep only pages containing DOI citations. Second, we searched EuroPMC for Covid-19, SARS-CoV2 and SARS-nCoV19 (30,000 scientific papers, reviews and preprints), selected the scientific papers from 2019 onwards, and compared them with the citations extracted from the English Wikipedia dump of May 2020 (2,000,000 DOIs). Together, these strategies yielded 231 Wikipedia articles that either contain at least one citation from the EuroPMC search or belong to the Wikipedia COVID-19 project and contain DOIs.

Next, from this corpus of 231 Wikipedia articles we extracted DOIs, PMIDs, ISBNs, websites and URLs using a set of regular expressions. We then computed several statistics for each Wikipedia article and retrieved Altmetric, CrossRef and EuroPMC information for each DOI. Finally, our method produced tables of annotated citations and of the information extracted from each Wikipedia article, such as books, websites and newspapers.
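The corpus filter and identifier extraction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation (the actual R and bash/python scripts are in the WikiCitationHistoRy repository): it assumes each page is already available as raw wikitext keyed by its title, it uses simplified hypothetical regular expressions (the DOI pattern is modeled on the expression Crossref suggests for modern DOIs), and the function names `extract_identifiers`, `build_corpus` and `citation_stats` are illustrative. Metadata retrieval from Altmetric, CrossRef and EuroPMC is not shown.

```python
from __future__ import annotations

import re

# Hypothetical, simplified patterns for illustration only; the project's own
# regular expressions (in the WikiCitationHistoRy repository) may differ.
PATTERNS = {
    # Modern DOIs: "10." + 4-9 digit registrant code + "/" + suffix.
    "doi": re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+"),
    # PMIDs as written in citation templates, e.g. "|pmid=123456".
    "pmid": re.compile(r"\|\s*pmid\s*=\s*(\d+)", re.IGNORECASE),
    # ISBN-10/13 in citation templates, allowing hyphens and spaces.
    "isbn": re.compile(r"\|\s*isbn\s*=\s*([0-9Xx][0-9Xx\- ]{8,16}[0-9Xx])", re.IGNORECASE),
    # Bare or bracketed links; stop at whitespace and wikitext delimiters.
    "url": re.compile(r"https?://[^\s|\]}<>]+"),
}


def normalize_doi(doi: str) -> str:
    """DOIs are case-insensitive, so compare them lower-cased and trimmed."""
    return doi.strip().rstrip(".,").lower()


def extract_identifiers(wikitext: str) -> dict[str, list[str]]:
    """Return all matches of each pattern, in order of appearance."""
    # findall returns the single capture group when a pattern has one,
    # otherwise the whole match.
    return {kind: pattern.findall(wikitext) for kind, pattern in PATTERNS.items()}


def build_corpus(
    pages: dict[str, str],
    search_dois: set[str],
    project_titles: set[str],
) -> dict[str, dict[str, list[str]]]:
    """Keep a page if it cites a DOI from the literature search (second
    strategy), or if it is a COVID-19 project page citing any DOI (first)."""
    search = {normalize_doi(d) for d in search_dois}
    corpus = {}
    for title, text in pages.items():
        ids = extract_identifiers(text)
        dois = {normalize_doi(d) for d in ids["doi"]}
        if dois & search or (title in project_titles and dois):
            corpus[title] = ids
    return corpus


def citation_stats(ids: dict[str, list[str]]) -> dict[str, int]:
    """Per-article counts of each identifier type, plus distinct DOIs."""
    stats = {kind: len(values) for kind, values in ids.items()}
    stats["distinct_doi"] = len({normalize_doi(d) for d in ids["doi"]})
    return stats
```

For example, a page whose wikitext is `{{cite journal |doi=10.1000/example.1 |pmid=123456}}` yields one DOI and one PMID. DOIs are compared lower-cased because DOI matching is case-insensitive; a DOI written as a doi.org link is counted both as a DOI and as a URL, which a fuller pipeline might want to deduplicate.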

This archive contains the input files and the information extracted on Wikipedia's COVID-19 sources.

See the WikiCitationHistoRy GitHub repository for the R code and the other Bash and Python utility scripts related to this project.

Notes

Analysis of Wikipedia sources during the first wave of the COVID-19 pandemic (up to May 2020)

Files (1.1 GB)

annotated_Altmetric_full_COVID_Corpus.txt

md5:5a28ea5c1059bd758864632fbea3be98  688.5 kB
md5:44d576da3f99a0da9d96090d45e2a3ca  7.2 MB
md5:c30cbd88c7e5d2da62a586415189cdfa  422.8 MB
md5:5167aab665a1be93bfdaa7663153d4f6  850.3 kB
md5:a5b06296a4d0ef035d77b238fc0225ee  10.1 MB
md5:ceddd5f6c20da27b2e79093c4b7e319f  734.3 kB
md5:d16b0380268aa5a555d86e06835fcfc7  4.7 MB
md5:6567532d7a93ecab93da4249e570e321  232.1 MB
md5:1c62a716757f2f1ae19f65b7c85da88d  398.9 MB
md5:672a8bf03958d0a292c8524792ef0b24  664.5 kB
md5:1e0ac98d58d7a0c9afdacccadc617540  34.1 kB
md5:5847fa2f72a0cb0cf0064e270d7bfa8c  2.1 MB
md5:efa72d30a28cfb80414a0c33d82d1b7e  104.0 kB
md5:a2a008d986816c413ac66d0bc6ef45ce  20.8 kB
md5:5136d6322f225b14f98a89dba3ecfc7a  347.8 kB
md5:c98782743f7e0b2e3149207830afe6e8  439.4 kB
md5:4fcad422e6e692c80985e2dbe98d7b57  134.0 kB
md5:d0293ca01842a4c897250bec29bc3c95  10.2 kB
md5:1693dcd1fefb2152331bbc2cbc5b740d  1.4 MB
md5:e971a71e8197725768565e07cfbfeae1  5.9 kB
md5:4ecc6ad74713a72287392c9da5e6aa48  4.9 kB
md5:cd7d5257fab0f420f55f82f6de67d8dd  6.1 kB
md5:c6ec0af472a79928246e04b50886b089  1.0 MB
md5:bf07a9d04162d4edfdb4bc4a93a0d880  1.1 MB
md5:c829fc0c67863245f40da5bb52617758  11.3 MB
md5:659456dfae5b0cfc301c40da775ee669  13.3 MB
md5:d276f1a3b0ce2498c1b879f95379fe0d  6.0 kB
md5:c05de65832c9273548bd94c7aec49dce  1.6 kB
md5:28d77d86408efe63ff1e187008c46369  903 Bytes
md5:939993b0c64ad23d57095a8a2eb26952  5.5 kB
md5:cc31b0d89ac91005fb9787722bd25a85  10.9 kB
md5:a980845c1ef20a669fa129fcc943bc49  1.6 kB
md5:a980845c1ef20a669fa129fcc943bc49  1.6 kB
md5:c71af7a9cb20d5f2618f546b7939cb58  1.4 kB
md5:9751af20e841960aa32739d08df1e4d0  1.5 kB
md5:344cd621f4da3b026d03d01185e04ee0  1.5 kB
md5:c71af7a9cb20d5f2618f546b7939cb58  1.4 kB
md5:9a4e8906fdde8604eb0582ce75832687  383.1 kB
md5:2f21438440defabbe501303e29438ee8  16.8 kB
md5:5ab3aa1e1e9b5ede27d97bad9380bc4b  1.1 MB
md5:8e61aadea3b7ff1115054d1dec960cb1  537.0 kB
md5:76af6ed936bb06319194166679902280  393.8 kB
md5:c9e8ac14b73fe3bc72faee45cc8c2ff5  13.9 kB
md5:1be772c29085c3b68a83474c6f43b973  20.5 kB
md5:5845396fd81164749e0eaffdcf9d60ed  11.9 kB
md5:6f9b3fbd86bba69470c54246bae9c319  10.7 MB
md5:263e99f53fbc5f39d51d11dea4ffb3fa  665.9 kB
md5:23cb0868110e1a0141c47c436bf3eac2  5.8 kB
md5:61a9755ea0caf5b4094fd12f92d53689  6.5 kB
md5:d69ae699a912121a470705af4b1900f2  11.0 kB
md5:d8f55f8e22097daeef6311f182f9f96b  17.9 kB
md5:dac8b579ed10043f552d69bdb71d0467  9.6 kB
md5:2d9132be324f87520e1351832d5d6f1e  15.8 kB
md5:6f6563a0d18181eb40666053e7c93bb4  15.8 kB
md5:42eaf8e1ecd288ea480c15e826b6cb78  5.7 kB
md5:65e4f7bce7a8efb00af5312a1d81b2d7  5.8 kB
md5:f24c027a9805a5113ee72647164e509d  16.3 kB
md5:5e57232cc3e0db93001011df9a35b106  1.8 MB
md5:42f767fb58d16cda6799a00158740fff  2.2 MB
md5:5adfbd896ee956a98be633cd5c262a83  814.3 kB
md5:f710d05d6ce2b1f50d2d71d1d215da98  1.2 MB
md5:7e62ea43ec52b2ff6d52598945520a12  448.8 kB
md5:524091975df6df9d555eee155f56b54d  454.4 kB
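Each file in the archive is listed with an MD5 checksum. One way to confirm that a download is complete is to recompute the checksum locally and compare it with the listed value; the Python sketch below does this with the standard library's hashlib. The function names `md5sum` and `verify` are hypothetical, and which checksum belongs to which file has to be matched by file name on the record page.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5sum(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks so that even the
    several-hundred-MB files are never loaded into memory at once."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, listed: str) -> bool:
    """Compare a local file with a checksum as listed, e.g. "md5:5a28...".

    The "md5:" prefix is optional and the comparison is case-insensitive.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

Usage: `verify(path_to_download, "md5:<listed checksum>")` returns True when the local file matches the listed checksum. Reading in fixed-size chunks keeps memory use constant regardless of file size.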