Published July 14, 2025 | Version 1.0
Dataset Open

COMET arXiv preprint matching results

  • 1. California Digital Library
  • 2. Leiden University

Description

Overview

This dataset contains 738,474 matched records linking arXiv preprints to their published counterparts. It is part of the COMET (Collaborative Metadata) initiative and was produced with the matching strategy developed during COMET's pilot phase.

Data Structure

Each record contains the following fields:

  • input_doi: The DOI of the arXiv preprint (format: 10.48550/arxiv.XXXX.XXXXX)
  • matched_doi: The DOI of the published work in Crossref that corresponds to the preprint
  • confidence: A confidence score (0-1) indicating the reliability of the match
  • matched_doi_type: The type of the matched publication (journal-article, proceedings-article, book-chapter, or report)
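
As an illustration of working with these four fields, the following Python sketch parses records and keeps only matches at or above a chosen confidence threshold. It assumes the CSV file has a header row whose column names equal the field names listed above, which this page does not confirm. The Match class, the read_matches helper, and the sample rows are hypothetical and invented for illustration; the rows are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Prefix taken from the documented input_doi format.
ARXIV_DOI_PREFIX = "10.48550/arxiv."


@dataclass(frozen=True)
class Match:
    """One preprint-to-publication match (hypothetical helper type)."""
    input_doi: str
    matched_doi: str
    confidence: float
    matched_doi_type: str

    @property
    def arxiv_id(self) -> str:
        # Recover the arXiv identifier by stripping the DOI prefix, if present.
        if self.input_doi.startswith(ARXIV_DOI_PREFIX):
            return self.input_doi[len(ARXIV_DOI_PREFIX):]
        return self.input_doi


def read_matches(handle, min_confidence=0.0):
    """Yield Match records whose confidence is >= min_confidence.

    Assumes a header row with the four documented field names.
    """
    for row in csv.DictReader(handle):
        match = Match(
            # DOIs are case-insensitive, so normalise to lower case.
            input_doi=row["input_doi"].strip().lower(),
            matched_doi=row["matched_doi"].strip().lower(),
            confidence=float(row["confidence"]),
            matched_doi_type=row["matched_doi_type"].strip(),
        )
        if match.confidence >= min_confidence:
            yield match


# Invented sample rows for illustration only; not real dataset records.
SAMPLE_CSV = """input_doi,matched_doi,confidence,matched_doi_type
10.48550/arxiv.0000.00001,10.1234/example.a,0.97,journal-article
10.48550/arxiv.0000.00002,10.1234/example.b,0.61,proceedings-article
"""

high_confidence = list(read_matches(io.StringIO(SAMPLE_CSV), min_confidence=0.9))
```

To run this against the real file, the io.StringIO handle would be replaced by open("20250615_arxiv_preprint_matching_results.csv", newline="", encoding="utf-8"). The encoding is also an assumption, and the right threshold depends on how much precision the downstream use requires.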

File Formats

The dataset is available in two formats:

  • CSV: 20250615_arxiv_preprint_matching_results.csv
  • JSON: 20250615_arxiv_preprint_matching_results.json
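
The internal layout of the JSON file is not described on this page. The sketch below is therefore a guess at a tolerant reader: it accepts either a single JSON document (such as an array of record objects) or one JSON object per line. The function name parse_json_records and the sample record are hypothetical and not part of the dataset.

```python
import json


def parse_json_records(text):
    """Return a list of records from JSON text.

    Assumption: the text is either one JSON document (typically an array
    of objects) or newline-delimited JSON with one object per line.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document; fall back to one object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A lone object is wrapped so callers always receive a list.
    return data if isinstance(data, list) else [data]


# Invented record for illustration only; not taken from the dataset.
sample = {
    "input_doi": "10.48550/arxiv.0000.00001",
    "matched_doi": "10.1234/example.a",
    "confidence": 0.97,
    "matched_doi_type": "journal-article",
}

from_array = parse_json_records(json.dumps([sample, sample]))
from_lines = parse_json_records(json.dumps(sample) + "\n" + json.dumps(sample) + "\n")
```

This reads the whole file into memory, which should be workable at the listed file sizes but is not a streaming approach.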

Files

Files (178.6 MB)

Size      Checksum
53.4 MB   md5:dc3163f195ef92d7920a78d82704ac33
125.2 MB  md5:f7e0243ea08a49238e9e181fb8cb93fe
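
A downloaded file can be checked against its listed MD5 checksum. The following is a minimal Python sketch using only the standard library. The helper names md5_of_file and matches_listed_checksum are hypothetical. The listing above does not state which checksum belongs to which file, so the caller has to supply the pairing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_listed_checksum(path, "md5:dc3163f195ef92d7920a78d82704ac33") returns True only if the file at path hashes to that listed value. A mismatch usually indicates an incomplete or corrupted download.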

Additional details

Related works

References
Other: 10.71707/yj21-5d60 (DOI)

Dates

Created
2025-06-15