Published May 10, 2024 | Version v1.1
Dataset | Open access

Data Citation Corpus Data File

Description

Data file for the first release of the Data Citation Corpus, produced by DataCite and Make Data Count as part of an ongoing grant project funded by the Wellcome Trust. Read more about the project.

The data file includes 10,006,058 data citation records in JSON and CSV formats. The JSON file is the version of record.

Version 1.0 of the corpus data file was released on January 30, 2024. Release v1.1 is an optimized version of v1.0, designed to make the original citation records more usable. No citations were added to or removed from the dataset in v1.1.

For convenience, the data file is split into batches of approximately 1 million records each. Each component file name includes the publication date and batch number, for example: 2024-05-10-data-citation-corpus-01-v1.1.json.
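The naming scheme above makes it possible to pick out and order the batch files programmatically. The following Python sketch assumes that every component file follows the single example name given here (date, "data-citation-corpus", batch number, version, extension) and that the CSV batches use the same pattern with a .csv extension. The regular expression and the helper names parse_batch_name and list_batches are illustrative, not part of the release.

```python
import re
from pathlib import Path

# Modeled on the one example name given for this release:
#   2024-05-10-data-citation-corpus-01-v1.1.json
# Assumption: CSV batches follow the same pattern with a .csv extension.
BATCH_NAME = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})-data-citation-corpus-"
    r"(?P<batch>\d+)-v(?P<version>[\d.]+)\.(?P<ext>json|csv)$"
)


def parse_batch_name(name: str) -> dict:
    """Split a component file name into date, batch number, version and format."""
    match = BATCH_NAME.match(name)
    if match is None:
        raise ValueError(f"not a corpus batch file name: {name!r}")
    parts = match.groupdict()
    parts["batch"] = int(parts["batch"])  # "01" -> 1
    return parts


def list_batches(directory: str, ext: str = "json") -> list:
    """Return the batch files of one format in a directory, ordered by batch number."""
    files = [
        p for p in Path(directory).iterdir()
        if BATCH_NAME.match(p.name) and p.suffix == f".{ext}"
    ]
    return sorted(files, key=lambda p: parse_batch_name(p.name)["batch"])
```

For example, parse_batch_name("2024-05-10-data-citation-corpus-01-v1.1.json") returns {"date": "2024-05-10", "batch": 1, "version": "1.1", "ext": "json"}. Sorting on the integer batch number, rather than on the file name, keeps the order correct even if batch numbers were ever not zero-padded.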

The data citations in the file originate from DataCite Event Data and from a project by the Chan Zuckerberg Initiative (CZI) to identify mentions of datasets in the full text of articles.
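Because every record carries a required source field naming where the citation was harvested, the balance between the two origins can be checked directly. A minimal Python sketch, assuming the records have already been parsed into dictionaries; the exact source strings used for DataCite Event Data and CZI records are not given on this page, so the code reports whatever values it finds instead of hard-coding them. The function names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable, Mapping


def count_by_source(records: Iterable[Mapping]) -> Counter:
    """Tally citation records by the value of their required `source` field."""
    return Counter(record["source"] for record in records)


def source_shares(records: Iterable[Mapping]) -> dict:
    """Fraction of records per source value; empty dict for no records."""
    counts = count_by_source(records)
    total = sum(counts.values())
    if not total:
        return {}
    return {source: n / total for source, n in counts.items()}
```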

Each data citation record consists of:

  • A pair of identifiers: an identifier for the dataset (a DOI or an accession number) and the DOI of the publication object (journal article or preprint) in which the dataset is cited

  • Metadata for the cited dataset and for the citing publication object
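The identifier pair can be pulled out of a record with a small helper that also reports whether the dataset side is a DOI or an accession number. This Python sketch is an assumption-laden illustration: it prefers the optional doi and accessionNumber fields when present and otherwise classifies the required subjId by shape, using a loose DOI pattern that accepts bare DOIs and doi.org URLs. The helper name citation_pair and the pattern are hypothetical, not part of the corpus documentation.

```python
import re
from collections.abc import Mapping

# Loose DOI shape: "10." + registrant code + "/" + suffix, optionally written
# as a doi.org URL. Assumption: subjId may use either form.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE
)


def citation_pair(record: Mapping) -> tuple:
    """Return (kind, dataset_id, publication_doi) for one citation record.

    kind is "doi" or "accession". The optional doi and accessionNumber
    fields are used when present; otherwise subjId is classified by shape.
    """
    if record.get("doi"):
        kind, dataset_id = "doi", record["doi"]
    elif record.get("accessionNumber"):
        kind, dataset_id = "accession", record["accessionNumber"]
    else:
        dataset_id = record["subjId"]
        kind = "doi" if DOI_PATTERN.match(dataset_id) else "accession"
    return kind, dataset_id, record["objId"]
```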

The data file includes the following fields:

Field            Description                                                    Required?
---------------  -------------------------------------------------------------  ---------
id               Internal identifier for the citation                           Yes
created          Date of the item's incorporation into the corpus               Yes
updated          Date of the item's most recent update in the corpus            Yes
repository       Repository where the cited data is stored                      No
publisher        Publisher of the article citing the data                       No
journal          Journal of the article citing the data                         No
title            Title of the cited data                                        No
objId            DOI of the article in which the data is cited                  Yes
subjId           DOI or accession number of the cited data                      Yes
publishedDate    Date the citing article was published                          No
accessionNumber  Accession number of the cited data                             No
doi              DOI of the cited data                                          No
relationTypeId   Relation type in metadata between citation object and subject  No
source           Source from which the citation was harvested                   Yes
subjects         Subject information for the cited data                         No
affiliations     Affiliation information for the creator of the cited data      No
funders          Funding information for the cited data                         No
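The required fields listed above lend themselves to a quick completeness check after loading a batch. The Python sketch below assumes that each JSON batch file is a single JSON array of record objects (this page does not state the internal layout) and treats a field as missing when it is absent, null, or an empty string. The function names are hypothetical.

```python
import json

# Fields marked "Yes" under Required? in the field list above.
REQUIRED_FIELDS = ("id", "created", "updated", "objId", "subjId", "source")


def load_batch(path: str) -> list:
    """Load one JSON batch file into a list of record dictionaries.

    Assumption: the file is a single JSON array of record objects.
    The whole batch (roughly 1 million records) is held in memory.
    """
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    if not isinstance(records, list):
        raise ValueError(f"{path}: expected a JSON array of records")
    return records


def missing_required(record: dict) -> list:
    """Return the required fields that are absent, null or empty in a record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]


def report_incomplete(records: list) -> dict:
    """Map each incomplete record's id (or list position) to its missing fields."""
    report = {}
    for position, record in enumerate(records):
        missing = missing_required(record)
        if missing:
            report[record.get("id") or f"#{position}"] = missing
    return report
```

Loading a whole batch with json.load is the simplest approach; if memory is tight, a streaming JSON parser would be the natural substitute, at the cost of a third-party dependency.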

 

Additional documentation about the citations and metadata in the file is available on the Make Data Count website.

Feedback on the data file can be submitted via GitHub. For general questions, email info@makedatacount.org.

Files

2024-05-10-data-citation-corpus-v1.1.zip (1.8 GB)
md5:e2e191e9573a9ee729413df872edd96b
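Given the size of the archive, it is worth confirming the download against the published MD5 checksum before unpacking it. A minimal Python sketch using the standard hashlib module; it reads the file in 1 MiB chunks so the 1.8 GB archive never has to fit in memory. The function names and the default local path are illustrative assumptions.

```python
import hashlib

# Checksum published for the archive on this page.
EXPECTED_MD5 = "e2e191e9573a9ee729413df872edd96b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str = "2024-05-10-data-citation-corpus-v1.1.zip") -> bool:
    """True if the downloaded archive matches the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```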

Additional details

Related works

Is new version of
Dataset: 10.5281/zenodo.11196859 (DOI)

Funding

Make Data Count: A Central Corpus for All Data Citations (226453/Z/22/Z), Wellcome Trust