
Published February 7, 2019 | Version 1.0
Dataset Open

Types, open citations, closed citations, publishers, and participation reports of Crossref entities

  • 1. Digital Humanities Advanced Research Centre, Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
  • 2. Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom


This publication contains several datasets that have been used in the paper "Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal" submitted to the 17th International Conference on Scientometrics and Bibliometrics (ISSI 2019), available at

Additional information about the analyses described in the paper, including the code and the data we have used to compute all the figures, is available as a Jupyter notebook at

The datasets contain the following information.

it is a zipped (~5 GB unzipped) CSV file containing the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, dated October 2018. All the entity types retrieved from Crossref were aligned to one of the following five categories: journal, book, proceedings, dataset, other. The open CC0 citation data we used came from the CSV dump of the most recent release of COCI, dated 12 November 2018.

The number of closed citations was calculated by subtracting the number of open citations to each entity available within COCI from the value “is-referenced-by-count” available in the Crossref metadata for that particular cited entity. That value reports all the DOI-to-DOI citation links that point to the cited entity from within the whole Crossref database (including those present in the Crossref ‘closed’ dataset).

The columns of the CSV file are the following ones:

  • doi: the DOI of the publication in Crossref;
  • type: the type of the publication as indicated in Crossref;
  • cited_by: the number of open citations received by the publication according to COCI;
  • non_open: the number of closed citations received by the publication according to Crossref + COCI.
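The subtraction described above, and a simple way to read the resulting columns, can be sketched as follows. This is only an illustrative sketch: the helper names, the sample DOIs and counts, the presence of a header row, and the comma delimiter are all assumptions, not details confirmed by the dataset.

```python
import csv
import io


def closed_citations(is_referenced_by_count: int, coci_open: int) -> int:
    """Closed citations = Crossref 'is-referenced-by-count' minus COCI open citations."""
    return is_referenced_by_count - coci_open


def open_share(rows) -> float:
    """Fraction of all citations (open + closed) that are open, over CSV rows."""
    open_total = 0
    closed_total = 0
    for row in rows:
        open_total += int(row["cited_by"])
        closed_total += int(row["non_open"])
    total = open_total + closed_total
    return open_total / total if total else 0.0


# Made-up sample rows using the column names listed above (assumed header row).
SAMPLE = """doi,type,cited_by,non_open
10.1000/example.1,journal-article,6,4
10.1000/example.2,book-chapter,0,10
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
share = open_share(rows)
print(f"open share of citations: {share:.2f}")
```

Because the unzipped file is about 5 GB, streaming the rows from `csv.DictReader` directly into `open_share` instead of building a list is likely the more practical choice on the real data.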

croci_types.csv: it is a CSV file that contains the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, as collected in the previous CSV file, aligned into five classes depending on the entity types retrieved from Crossref: journal (Crossref types: journal-article, journal-issue, journal-volume, journal), book (Crossref types: book, book-chapter, book-section, monograph, book track, book-part, book-set, reference-book, dissertation, book series, edited book), proceedings (Crossref types: proceedings-article, proceedings, proceedings-series), dataset (Crossref types: dataset), other (Crossref types: other, report, peer review, reference-entry, component, report-series, standard, posted-content, standard-series).

The columns of the CSV file are the following ones:

  • type: the type of publication, one of "journal", "book", "proceedings", "dataset", "other";
  • label: the label assigned to the type for visualisation purposes;
  • coci_open_cit: the number of open citations received by the publication type according to COCI;
  • crossref_close_cit: the number of closed citations received by the publication type according to Crossref + COCI.
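The alignment of Crossref types to the five classes can be sketched as a lookup table. The type lists are copied from the description of croci_types.csv; the normalisation step (lower-casing and replacing spaces with hyphens) and the fallback of unknown types to "other" are assumptions made for illustration, not behaviour confirmed by the authors' code.

```python
# Crossref types per class, as listed in the description of croci_types.csv.
CATEGORY_TYPES = {
    "journal": ["journal-article", "journal-issue", "journal-volume", "journal"],
    "book": [
        "book", "book-chapter", "book-section", "monograph", "book track",
        "book-part", "book-set", "reference-book", "dissertation",
        "book series", "edited book",
    ],
    "proceedings": ["proceedings-article", "proceedings", "proceedings-series"],
    "dataset": ["dataset"],
    "other": [
        "other", "report", "peer review", "reference-entry", "component",
        "report-series", "standard", "posted-content", "standard-series",
    ],
}


def normalise(crossref_type: str) -> str:
    """Assumed normalisation: treat 'book track' and 'book-track' as the same type."""
    return crossref_type.strip().lower().replace(" ", "-")


# Reverse lookup: normalised Crossref type -> class name.
TYPE_TO_CATEGORY = {
    normalise(t): category
    for category, types in CATEGORY_TYPES.items()
    for t in types
}


def category_of(crossref_type: str) -> str:
    """Return the class for a Crossref type; unknown types fall back to 'other' (assumption)."""
    return TYPE_TO_CATEGORY.get(normalise(crossref_type), "other")


print(category_of("book-chapter"))
print(category_of("peer review"))
```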

publishers_cits.csv: it is a CSV file that contains the top twenty publishers that received the greatest number of open citations. The columns of the CSV file are the following ones:

  • publisher: the name of the publisher;
  • doi_prefix: the list of DOI prefixes used by the publisher;
  • coci_open_cit: the number of open citations received by the publications of the publisher according to COCI;
  • crossref_close_cit: the number of closed citations received by the publications of the publishers according to Crossref + COCI;
  • total_cit: the total number of citations received by the publications of the publisher (= coci_open_cit + crossref_close_cit).
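As a quick sanity check on this file, the stated relation total_cit = coci_open_cit + crossref_close_cit can be verified row by row. The sketch below assumes the rows have already been read into dictionaries keyed by the column names above; the function name, publisher names, and counts are made up for illustration.

```python
def mismatched_totals(rows):
    """Return the publishers whose total_cit differs from coci_open_cit + crossref_close_cit."""
    mismatched = []
    for row in rows:
        expected = int(row["coci_open_cit"]) + int(row["crossref_close_cit"])
        if int(row["total_cit"]) != expected:
            mismatched.append(row["publisher"])
    return mismatched


# Made-up rows: the second one is deliberately inconsistent.
sample_rows = [
    {"publisher": "Example Press", "coci_open_cit": "70",
     "crossref_close_cit": "30", "total_cit": "100"},
    {"publisher": "Sample House", "coci_open_cit": "5",
     "crossref_close_cit": "5", "total_cit": "11"},
]

bad = mismatched_totals(sample_rows)
print(bad)
```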

20publishers_cr.csv: it is a CSV file that contains the numbers of the contributions to open citations made by the twenty publishers introduced in the previous CSV file as of 24 January 2018, according to the data available through the Crossref API. The counts listed in this file refer to the number of publications for which each publisher has submitted metadata to Crossref that include the publication’s reference list. The categories 'closed', 'limited' and 'open' refer to publications for which the reference lists are not visible to anyone outside the Crossref Cited-by membership, are visible only to Cited-by members and to Crossref Metadata Plus members, or are visible to all, respectively. In addition, the file also records the total number of publications for which the publisher has submitted metadata to Crossref, whether or not those metadata include the reference lists of those publications.

The columns of the CSV file are the following ones:

  • publisher: the name of the publisher;
  • open: the number of publications in Crossref with an 'open' visibility for their reference lists;
  • limited: the number of publications in Crossref with a 'limited' visibility for their reference lists;
  • closed: the number of publications in Crossref with a 'closed' visibility for their reference lists;
  • overall_deposited: the overall number of publications for which the publisher has submitted metadata to Crossref.
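One way to read these columns together is sketched below. It assumes that 'open', 'limited' and 'closed' jointly cover every publication deposited with a reference list, so that the remainder of overall_deposited is the number deposited without one; that reading is an inference from the description, and the function name and sample values are hypothetical.

```python
def reference_visibility(row):
    """Summarise one publisher row: publications with and without reference lists, and open share."""
    open_n = int(row["open"])
    limited_n = int(row["limited"])
    closed_n = int(row["closed"])
    overall = int(row["overall_deposited"])

    # Assumption: the three visibility classes cover all deposits that include references.
    with_refs = open_n + limited_n + closed_n
    return {
        "with_references": with_refs,
        "without_references": overall - with_refs,
        "open_share": open_n / with_refs if with_refs else 0.0,
    }


# Made-up row using the column names listed above.
sample = {
    "publisher": "Example Press",
    "open": "60",
    "limited": "30",
    "closed": "10",
    "overall_deposited": "150",
}

summary = reference_visibility(sample)
print(summary)
```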



Files (448.5 MB)


Additional details


  • Ivan Heibi, Silvio Peroni, David Shotton (2019). Crowdsourcing open citations with CROCI - An analysis of the current status of open citations, and a proposal. arXiv:1902.02534.