Published May 22, 2023 | Version v1
Dataset
Open
A Comprehensive Dataset of Citations with Identifiers from English Wikipedia (2023)
Description
This is a dataset of 40,664,485 citations extracted from the February 2023 dump of English Wikipedia (https://dumps.wikimedia.org/enwiki/20230220/). The dataset is based purely on information from Wikipedia; labelled and annotated datasets will be added in follow-up versions.
The source code to extract citations can be found here: https://github.com/albatros13/wikicite.
The code is a fork of an earlier project on Wikipedia citation extraction: https://github.com/Harshdeep1996/cite-classifications-wiki.
Files
| Name | Size |
|---|---|
| en_citations.zip (md5:255252cb297b444c400df7214859be38) | 7.3 GB |
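
Because the archive is large (7.3 GB), it can be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch, assuming the file was saved locally as `en_citations.zip` in the working directory; the helper names `file_md5` and `verify_download` are illustrative and not part of the dataset or its source code.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed for en_citations.zip on this record.
EXPECTED_MD5 = "255252cb297b444c400df7214859be38"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so the whole archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("en_citations.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.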