Published June 6, 2025 | Version 9.0.0
Dataset Open

OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information

  • 1. University of Bologna
  • 2. KU Leuven
  • 3. Alma Mater Studiorum Università di Bologna
  • 4. Research Centre for Open Scholarly Metadata

Description

This version, released on 2025-06-06, extends the previous one with metadata for the citing and cited bibliographic resources added in the April 2025 version of Crossref and in the December 2024 dump of JaLC (Japan Link Center).

This dataset contains all the bibliographic metadata and its provenance information (in JSON-LD format) included in OpenCitations Meta. The data and the provenance are organized through a complex structure of folders and subfolders, allowing you to quickly find any entity from its URI. The first level consists of the following folders, provided compressed and separately:

The inner folders are named after the supplier prefix of the entities they contain. This prefix identifies the index an entity belongs to (e.g., OpenCitations Meta corresponds to 06*0).

After that, the folders have numeric names, which refer to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

At the same level, additional folders hold the provenance and follow the same naming criteria: for instance, the 1000 folder contains the provenance of the entities from 1 to 1000. Within it, the provenance is stored in a subfolder called prov, also as zipped JSON-LD.

For example, the data for a given entity is located at /br/06250/10000/1000/1000.zip, and its provenance at /br/06250/10000/1000/prov/1000.zip.
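
The path convention described above can be sketched in a few lines of Python. This is a minimal illustration, not the official OpenCitations tooling: the function names `bucket` and `entity_paths` are hypothetical, the identifier shape `br/06250500` (entity type, supplier prefix, sequential number) and the reading of 06*0 as "06, zero or more non-zero digits, 0" are assumptions, and the folder sizes of 10000 and 1000 are taken from the examples in the text.

```python
import re

# Assumed identifier shape: "<type>/<prefix><number>", e.g. "br/06250500".
# The supplier prefix is read here as "06", optional non-zero digits, then "0".
_ENTITY_RE = re.compile(
    r"^(?P<kind>[a-z]+)/(?P<prefix>06[1-9]*0)(?P<number>[1-9][0-9]*)$"
)


def bucket(number, size):
    """Return the upper bound of the block of `size` entities holding `number`."""
    return ((number - 1) // size + 1) * size


def entity_paths(entity_id, outer=10000, inner=1000):
    """Return (data_path, provenance_path) for an entity identifier.

    The layout mirrors the example paths given in the description;
    the folder sizes are assumptions taken from that example.
    """
    match = _ENTITY_RE.match(entity_id)
    if match is None:
        raise ValueError(f"unrecognised entity identifier: {entity_id!r}")
    number = int(match["number"])
    inner_name = bucket(number, inner)
    base = f"/{match['kind']}/{match['prefix']}/{bucket(number, outer)}/{inner_name}"
    return f"{base}/{inner_name}.zip", f"{base}/prov/{inner_name}.zip"
```

Under these assumptions, `entity_paths("br/06250500")` yields the two paths quoted in the example above, since entity 500 falls in the 1 to 1000 block of the 10000 folder.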

This version of the dataset contains:

  • 124,526,660 bibliographic entities
  • 376,295,095 authors, 2,765,927 editors, and 103,928,927 publishers (counted by their roles, without disambiguating individual entities) 
  • 1,019,563 publication venues

The archives, compressed with 7-Zip, total 46.5 GB and expand to 66 GB when decompressed. The JSON-LD files inside the archives are themselves compressed as zip files. To manage the data more efficiently, it is recommended to process these inner files in their compressed form, without extracting them.
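
As a hedged illustration of reading the inner files without extracting them, the sketch below streams JSON-LD documents straight out of a zip file using Python's standard `zipfile` and `json` modules. The function name `iter_jsonld` is hypothetical, and the sketch assumes that the outer 7-Zip archive has already been unpacked and that each member of an inner zip is a single JSON document.

```python
import json
import zipfile


def iter_jsonld(zip_source):
    """Yield (member_name, parsed_json) for each file inside a zip archive.

    `zip_source` may be a path or a binary file-like object. Members are
    read directly from the archive; nothing is written to disk.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, if any
            with archive.open(info) as handle:
                # json.load accepts binary streams and decodes UTF-8 itself
                yield info.filename, json.load(handle)
```

A call such as `for name, doc in iter_jsonld("1000.zip"): ...` would then walk the parsed documents one at a time; the file name here is only an example.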

Additional information about OpenCitations Meta is available on the official webpage.

Files

Files (46.5 GB)

Checksum and size of each archive:

  • md5:1637262ad24615f967c97b2efe7b7da2 (14.4 GB)
  • md5:dc5177ae52f0a8bd10d9d3b544fb9e69 (13.5 GB)
  • md5:2bc496e396820d40541709ccb8924aa6 (5.8 GB)
  • md5:8e37cd42d13de7aafb561e51760e0e66 (10.9 GB)
  • md5:51616dd96dc886f3c72eef8b5b3f86f7 (1.8 GB)
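
The MD5 checksums listed above can be used to verify a download. The following is a minimal sketch with Python's standard `hashlib`; the function names `md5_of_file` and `matches_listed_md5` are hypothetical, and the archive path passed in would be whatever name the file has locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte archives
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed checksum such as 'md5:1637...'."""
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again before decompression.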

Additional details

Related works

  • Is compiled by: Software, 10.5281/zenodo.15244932 (DOI)
  • Is described by: Journal article, 10.1162/qss_a_00292 (DOI)
  • Is new version of: Dataset, 10.6084/m9.figshare.21747536.v8 (DOI)

Dates

  • Created: 2022-12-25 (first release)
  • Updated: 2025-06-06