There is a newer version of the record available.

Published May 22, 2023 | Version v1
Dataset Open

A Comprehensive Dataset of Citations with Identifiers from English Wikipedia (2023)

  • 1. University of Amsterdam

Description

This dataset contains 40,664,485 citations extracted from the February 2023 dump of English Wikipedia (https://dumps.wikimedia.org/enwiki/20230220/). It is based purely on information from Wikipedia; labelled and annotated datasets will be added in follow-up versions.

The source code to extract citations can be found here: https://github.com/albatros13/wikicite.

The code is a fork of an earlier Wikipedia citation extraction project: https://github.com/Harshdeep1996/cite-classifications-wiki.

 

Files

en_citations.zip (7.3 GB)
md5:255252cb297b444c400df7214859be38