There is a newer version of the record available.

Published February 12, 2024 | Version 1.4
Dataset Open

BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops

  • 1. Univ. of the Peloponnese & ATHENA RC
  • 2. ATHENA RC
  • 3. Univ. of the Peloponnese


In Computer Science, conference and workshop papers are important contributions and carry more weight in research assessment than they do in other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), so their citations are not reported in widely used citation datasets like OpenCitations and Crossref, which limits citation analysis. The Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, but its discontinuation has left a gap in the available data.

BIP! NDR aims to alleviate this issue and to enhance research assessment processes in Computer Science. To this end, it uses a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus and extracts citation information directly from their full text through text analysis. The current version of the dataset contains approximately 2.9M citations made by approximately 171K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.

File Structure:

The dataset is formatted as a JSON Lines (JSONL) file (one JSON object per line) to facilitate file splitting and streaming.
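
Because each line is an independent JSON object, the file can be processed one record at a time without loading it fully into memory. The following is a minimal sketch of such a reader, not part of the dataset's own tooling; the helper name `iter_jsonl` and the in-memory sample are assumptions made for illustration.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


# Hypothetical two-record sample standing in for the real file.
sample = io.StringIO('{"_id": "a"}\n{"_id": "b"}\n')
record_count = sum(1 for _ in iter_jsonl(sample))
print(record_count)
```

For the actual dataset, the same function could be given a file object opened in text mode with UTF-8 encoding in place of the in-memory sample.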

Each JSON object has three main fields:

  • “_id”: a unique identifier,

  • “citing_paper”: the “dblp_id” of the citing paper,

  • “cited_papers”: array containing the objects that correspond to each reference found in the text of the “citing_paper”; each object may contain the following fields:

    • “dblp_id”: the “dblp_id” of the cited paper. Optional, but required if a “doi” is not present.

    • “doi”: the DOI of the cited paper. Optional, but required if a “dblp_id” is not present.

    • “bibliographic_reference”: the raw citation string as it appears in the citing paper.
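
The structure described above can be turned into a flat list of citation pairs, as in the sketch below. The sample record is hypothetical and built only from the field descriptions (all identifier values are made up). The function name `citation_pairs` and the choice to prefer the DOI when both identifiers are present are assumptions for illustration, not something the dataset prescribes.

```python
import json
from typing import List, Tuple

# Hypothetical record following the described fields; all values are made up.
sample_line = json.dumps({
    "_id": "example-id",
    "citing_paper": "conf/example/Citing24",
    "cited_papers": [
        {"doi": "10.1234/example",
         "bibliographic_reference": "A. Author. Example title. 2020."},
        {"dblp_id": "conf/example/Cited19",
         "bibliographic_reference": "B. Author. Another title. 2019."},
    ],
})


def citation_pairs(record: dict) -> List[Tuple[str, str]]:
    """Return (citing dblp_id, cited identifier) pairs for one record.

    Assumption: when a reference carries both identifiers, the DOI is used.
    """
    pairs = []
    for ref in record.get("cited_papers", []):
        cited = ref.get("doi") or ref.get("dblp_id")
        if cited is None:
            # The description says at least one of the two must be present.
            raise ValueError("reference has neither 'doi' nor 'dblp_id'")
        pairs.append((record["citing_paper"], cited))
    return pairs


record = json.loads(sample_line)
pairs = citation_pairs(record)
print(pairs)
```

Raising an error on a reference with neither identifier makes violations of the "at least one of the two" rule visible early; a more lenient consumer could skip such references instead.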

Changes from previous version:

  • Added more papers from DBLP.


Files (272.4 MB)


Additional details


SciLake – Democratising and making sense out of heterogeneous scholarly content 101058573
European Commission
GraspOS – GraspOS: next Generation Research Assessment to Promote Open Science 101095129
European Commission