Dataset Open Access
Natallia Kokash;
Giovanni Colavizza
This is a dataset of 40,664,485 citations extracted from the English Wikipedia dump of February 2023 (https://dumps.wikimedia.org/enwiki/20230220/).
Version 1: en_citations.zip is the dataset of extracted citations.
Version 2: en_final.zip is the same dataset, with the citations classified and augmented with identifiers.
The fields are as follows:
The source code to extract citations can be found here: https://github.com/albatros13/wikicite.
The code is a fork of the earlier project on Wikipedia citation extraction: https://github.com/Harshdeep1996/cite-classifications-wiki.
Name | MD5 | Size |
---|---|---|
en_citations.zip | 255252cb297b444c400df7214859be38 | 7.3 GB |
en_final.zip | 6a926e81416bbb4b4201c27ef4dc936f | 7.6 GB |
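
Since both archives are several gigabytes, it may be worth checking a download against the MD5 checksums listed for the files before unpacking it. The following is a minimal sketch in Python using only the standard library; the helper names `md5_of_file` and `verify_download` are illustrative and not part of the dataset or its source code, and the sketch assumes the archive keeps its original file name on disk.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the two archives of this dataset.
EXPECTED_MD5 = {
    "en_citations.zip": "255252cb297b444c400df7214859be38",
    "en_final.zip": "6a926e81416bbb4b4201c27ef4dc936f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Compare a downloaded archive with the listed checksum, looked up by file name.

    Assumption: the file on disk still has its original name.
    """
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of_file(path) == expected[name]
```

Calling `verify_download("en_final.zip")` returns `True` when the digest matches the listed value and `False` otherwise; a mismatch usually points to an incomplete or corrupted download.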