There is a newer version of the record available.

Published October 6, 2023 | Version 1.3
Dataset Open

BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops

  • 1. Univ. of the Peloponnese & ATHENA RC
  • 2. ATHENA RC
  • 3. Univ. of the Peloponnese

Description

In Computer Science, conference and workshop papers are important contributions and carry more weight in research assessment than they do in other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), so their citations are not reported in widely used citation datasets such as OpenCitations and Crossref, which limits citation analysis. The Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, but its discontinuation has left a gap in the available data.

BIP! NDR aims to fill this gap and support research assessment in Computer Science. It relies on a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus and extracts citation information directly from their full text through text analysis. The current version of the dataset contains ~2.7M citations made by approximately 164K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.

File Structure:

The dataset is formatted as a JSON Lines (JSONL) file (one JSON object per line) to facilitate file splitting and streaming.

Each JSON object has three main fields:

  • “_id”: a unique identifier,

  • “citing_paper”: the “dblp_id” of the citing paper,

  • “cited_papers”: array containing the objects that correspond to each reference found in the text of the “citing_paper”; each object may contain the following fields:

    • “dblp_id”: the “dblp_id” of the cited paper. Optional, but required when no “doi” is present.

    • “doi”: the DOI of the cited paper. Optional, but required when no “dblp_id” is present.

    • “bibliographic_reference”: the raw citation string as it appears in the citing paper.
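
The structure above can be read one line at a time. The following is a minimal Python sketch, not part of the dataset's own tooling: the function name `iter_citation_pairs`, the sample record, and its identifiers are made up for illustration, and preferring “doi” over “dblp_id” when both are present is an assumption of this sketch.

```python
import io
import json
from typing import Iterable, Iterator, Tuple


def iter_citation_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (citing dblp_id, identifier type, cited identifier) per reference."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)  # one JSON object per line
        citing = record["citing_paper"]
        for ref in record.get("cited_papers", []):
            # Each reference should carry a "doi", a "dblp_id", or both.
            if "doi" in ref:
                yield citing, "doi", ref["doi"]
            elif "dblp_id" in ref:
                yield citing, "dblp_id", ref["dblp_id"]


# Made-up record that follows the documented field names (not real data).
SAMPLE = json.dumps({
    "_id": "example-1",
    "citing_paper": "conf/example/A23",
    "cited_papers": [
        {"doi": "10.0000/example", "bibliographic_reference": "[1] ..."},
        {"dblp_id": "conf/example/B20", "bibliographic_reference": "[2] ..."},
    ],
}) + "\n"

pairs = list(iter_citation_pairs(io.StringIO(SAMPLE)))
```

Because the function accepts any iterable of lines, an open file handle for the downloaded JSONL file can be passed in place of the in-memory sample, so the full dataset is streamed rather than loaded at once.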

Changes from previous version:

  • Added more papers from DBLP.

Files

Files (252.4 MB)

md5:87268ffc99cc6b1e18bcc4d802f2be6c (252.4 MB)
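
The MD5 checksum listed above can be used to check a downloaded copy for corruption. The following is a minimal Python sketch; the helper name `md5_of_file` and the chunk size are choices of this sketch, and the local path of the downloaded file is left to the reader because the file name is not given here.

```python
import hashlib

# Checksum as listed for the dataset file in this record.
EXPECTED_MD5 = "87268ffc99cc6b1e18bcc4d802f2be6c"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A download is intact when `md5_of_file(path) == EXPECTED_MD5`, where `path` points to the local copy. Reading in chunks keeps memory use small for a file of this size.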

Additional details

Funding

SciLake – Democratising and making sense out of heterogeneous scholarly content 101058573
European Commission
GraspOS – GraspOS: next Generation Research Assessment to Promote Open Science 101095129
European Commission