Dataset Open Access

Types, open citations, closed citations, publishers, and participation reports of Crossref entities

Heibi, Ivan; Peroni, Silvio; Shotton, David

This publication contains several datasets that have been used in the paper "Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal" submitted to the 17th International Conference on Scientometrics and Bibliometrics (ISSI 2019), available at https://opencitations.wordpress.com/2019/02/07/crowdsourcing-open-citations-with-croci/.

Additional information about the analyses described in the paper, including the code and the data we have used to compute all the figures, is available as a Jupyter notebook at https://github.com/sosgang/pushing-open-citations-issi2019/blob/master/script/croci_nb.ipynb. The datasets contain the following information.

non_open.zip: a zipped CSV file (~5 GB unzipped) containing the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, dated October 2018. All the entity types retrieved from Crossref were aligned to one of the following five categories: journal, book, proceedings, dataset, other. The open CC0 citation data we used came from the CSV dump of the most recent release of COCI, dated 12 November 2018. The number of closed citations for each cited entity was calculated by subtracting the number of open citations to that entity available within COCI from the value “is-referenced-by-count” available in the Crossref metadata for that entity, which reports all the DOI-to-DOI citation links that point to the cited entity from within the whole Crossref database (including those present in the Crossref ‘closed’ dataset).

The columns of the CSV file are the following ones:

  • doi: the DOI of the publication in Crossref;
  • type: the type of the publication as indicated in Crossref;
  • cited_by: the number of open citations received by the publication according to COCI;
  • non_open: the number of closed citations received by the publication according to Crossref + COCI.
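
The subtraction described above, and a per-type aggregation over the four columns, can be sketched as follows. This is a minimal illustration rather than the code used for the paper (that lives in the notebook linked above). It assumes the CSV has a header row with exactly the column names listed; the sample rows and DOIs are made up, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def closed_citations(is_referenced_by_count, open_citations):
    """Closed citations = Crossref's total count minus the open citations in COCI."""
    return is_referenced_by_count - open_citations


def totals_by_type(rows):
    """Sum open (cited_by) and closed (non_open) citations per entity type.

    `rows` is any iterable of dicts keyed by the column names listed above,
    so a csv.DictReader can be streamed through it row by row without
    loading the whole (~5 GB) file into memory.
    """
    totals = defaultdict(lambda: {"open": 0, "closed": 0})
    for row in rows:
        totals[row["type"]]["open"] += int(row["cited_by"])
        totals[row["type"]]["closed"] += int(row["non_open"])
    return dict(totals)


# Made-up sample rows standing in for the real file (a header row is assumed).
sample = io.StringIO(
    "doi,type,cited_by,non_open\n"
    "10.1000/a,journal,3,2\n"
    "10.1000/b,journal,1,5\n"
    "10.1000/c,book,0,4\n"
)
totals = totals_by_type(csv.DictReader(sample))
print(totals)
```

For the real data, the same function could be fed a `csv.DictReader` opened on the unzipped file instead of the in-memory sample.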

croci_types.csv: it is a CSV file that contains the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, as collected in the previous CSV file, aligned in five classes depending on the entity types retrieved from Crossref: journal (Crossref types: journal-article, journal-issue, journal-volume, journal), book (Crossref types: book, book-chapter, book-section, monograph, book track, book-part, book-set, reference-book, dissertation, book series, edited book), proceedings (Crossref types: proceedings-article, proceedings, proceedings-series), dataset (Crossref types: dataset), other (Crossref types: other, report, peer review, reference-entry, component, report-series, standard, posted-content, standard-series).

The columns of the CSV file are the following ones:

  • type: the type of publication, one of "journal", "book", "proceedings", "dataset", "other";
  • label: the label assigned to the type for visualisation purposes;
  • coci_open_cit: the number of open citations received by the publication type according to COCI;
  • crossref_close_cit: the number of closed citations received by the publication type according to Crossref + COCI.
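
The alignment of Crossref types to the five classes can be expressed as a simple lookup table. The sketch below copies the type strings exactly as they are listed in the description of croci_types.csv; the actual Crossref dump may spell some of them differently (for example with hyphens instead of spaces), so the strings should be checked against the data. The names `TYPES_BY_CATEGORY` and `category_for` are hypothetical.

```python
# Category -> Crossref types, copied from the description of croci_types.csv.
TYPES_BY_CATEGORY = {
    "journal": ["journal-article", "journal-issue", "journal-volume", "journal"],
    "book": [
        "book", "book-chapter", "book-section", "monograph", "book track",
        "book-part", "book-set", "reference-book", "dissertation",
        "book series", "edited book",
    ],
    "proceedings": ["proceedings-article", "proceedings", "proceedings-series"],
    "dataset": ["dataset"],
    "other": [
        "other", "report", "peer review", "reference-entry", "component",
        "report-series", "standard", "posted-content", "standard-series",
    ],
}

# Inverted view: Crossref type -> category.
CATEGORY_BY_CROSSREF_TYPE = {
    crossref_type: category
    for category, crossref_types in TYPES_BY_CATEGORY.items()
    for crossref_type in crossref_types
}


def category_for(crossref_type):
    """Return the class for a Crossref type; raises KeyError if it is unlisted."""
    return CATEGORY_BY_CROSSREF_TYPE[crossref_type]


print(category_for("book-chapter"))
print(category_for("posted-content"))
```

Raising `KeyError` for unlisted types is a choice made for this sketch, so that an unexpected type is noticed rather than silently counted under some class.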

publishers_cits.csv: it is a CSV file that contains the top twenty publishers that received the greatest number of open citations. The columns of the CSV file are the following ones:

  • publisher: the name of the publisher;
  • doi_prefix: the list of DOI prefixes used by the publisher;
  • coci_open_cit: the number of open citations received by the publications of the publisher according to COCI;
  • crossref_close_cit: the number of closed citations received by the publications of the publisher according to Crossref + COCI;
  • total_cit: the total number of citations received by the publications of the publisher (= coci_open_cit + crossref_close_cit).
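
Since total_cit is defined as coci_open_cit + crossref_close_cit, a reader may want to confirm that relation after loading the file. The following is a minimal sketch under the assumption that the CSV has a header row with the column names listed above; the sample rows are invented (including the single-prefix doi_prefix values, since the serialisation of the prefix list is not described here), and `inconsistent_rows` is a hypothetical helper.

```python
import csv
import io


def inconsistent_rows(rows):
    """Return the publishers whose total_cit is not open + closed citations."""
    mismatches = []
    for row in rows:
        expected = int(row["coci_open_cit"]) + int(row["crossref_close_cit"])
        if int(row["total_cit"]) != expected:
            mismatches.append(row["publisher"])
    return mismatches


# Invented rows; the second one is deliberately inconsistent (50 + 50 != 90).
sample = io.StringIO(
    "publisher,doi_prefix,coci_open_cit,crossref_close_cit,total_cit\n"
    "Publisher A,10.1000,120,80,200\n"
    "Publisher B,10.2000,50,50,90\n"
)
mismatches = inconsistent_rows(csv.DictReader(sample))
print(mismatches)
```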

20publishers_cr.csv: it is a CSV file that contains the numbers of the contributions to open citations made by the twenty publishers introduced in the previous CSV file as of 24 January 2018, according to the data available through the Crossref API. The counts listed in this file refer to the number of publications for which each publisher has submitted metadata to Crossref that include the publication’s reference list. The categories 'closed', 'limited' and 'open' refer to publications for which the reference lists are not visible to anyone outside the Crossref Cited-by membership, are visible only to Cited-by members and to Crossref Metadata Plus members, or are visible to all, respectively. In addition, the file also records the total number of publications for which the publisher has submitted metadata to Crossref, whether or not those metadata include the reference lists of those publications.

The columns of the CSV file are the following ones:

  • publisher: the name of the publisher;
  • open: the number of publications in Crossref with an 'open' visibility for their reference lists;
  • limited: the number of publications in Crossref with a 'limited' visibility for their reference lists;
  • closed: the number of publications in Crossref with a 'closed' visibility for their reference lists;
  • overall_deposited: the overall number of publications for which the publisher has submitted metadata to Crossref.
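
One way to read these four counts together is as shares: how many of a publisher's deposited publications carry a reference list at all, and how those reference lists split across the three visibility categories. The sketch below is only an illustration of that arithmetic; the function name and the input numbers are hypothetical, and returning 0.0 for empty denominators is a choice made for this example.

```python
def reference_visibility_shares(open_count, limited_count, closed_count,
                                overall_deposited):
    """Compute reference-list coverage and the open/limited/closed split."""
    with_references = open_count + limited_count + closed_count

    def share(part, whole):
        # Avoid division by zero for publishers with no deposits (assumption).
        return part / whole if whole else 0.0

    return {
        # Fraction of all deposited publications that include a reference list.
        "with_references": share(with_references, overall_deposited),
        # Split of those reference lists by visibility category.
        "open": share(open_count, with_references),
        "limited": share(limited_count, with_references),
        "closed": share(closed_count, with_references),
    }


# Invented counts for a single publisher.
shares = reference_visibility_shares(50, 30, 20, 200)
print(shares)
```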

Files (448.5 MB)

Name                  Size       Checksum
20publishers_cr.csv   1.2 kB     md5:f9f568ad77829184c73db1094171f0a6
croci_types.csv       192 Bytes  md5:9f45cd2edb8f634d67aabb0e7f08c076
non_open.zip          447.8 MB   md5:cc7684ddefa33954c8146b476e596ccf
publishers_cits.csv   640.2 kB   md5:7a835fc16068889db59f3a5f6f5b56d9

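
The MD5 checksums listed for the files can be used to check a download for corruption. The sketch below uses only Python's standard `hashlib`; the checksum values are copied from the file list, while the helper names and the local paths passed to them are hypothetical.

```python
import hashlib

# Checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "20publishers_cr.csv": "f9f568ad77829184c73db1094171f0a6",
    "croci_types.csv": "9f45cd2edb8f634d67aabb0e7f08c076",
    "non_open.zip": "cc7684ddefa33954c8146b476e596ccf",
    "publishers_cits.csv": "7a835fc16068889db59f3a5f6f5b56d9",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, name):
    """True if the file at `path` has the checksum listed for `name`."""
    return md5_of(path) == EXPECTED_MD5[name]
```

For example, `matches_published_checksum("downloads/non_open.zip", "non_open.zip")` would return True for an intact copy, where `downloads/` stands for wherever the file was saved. Reading in chunks keeps memory use flat even for the 447.8 MB archive.
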
  • Ivan Heibi, Silvio Peroni, David Shotton (2019). Crowdsourcing open citations with CROCI - An analysis of the current status of open citations, and a proposal. arXiv:1902.02534.