Publication dates for ArXiv publication versions

Druskat, Stephan

doi:10.5281/zenodo.11184477

Published May 13, 2024 | Version 1.1

Dataset Open

Druskat, Stephan (Data collector)^{1, 2}

1. Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)
2. Humboldt-Universität zu Berlin

Lookup tables in plain JSON, mapping ArXiv publication version identifiers to their respective publication dates.

The JSON files are archived in arxiv-publication-dates-by-identifier-prefix.tar.gz.
The archive contains files named after the date prefix of the ArXiv publication version identifiers they contain.
For example, the file 1908.json contains the data for identifiers 1908.12345v1, 1908.12345v2, 1908.23456v1, etc.
Publication dates are given in the format YYYY-MM-DD.
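
The lookup described above can be sketched in Python as follows. This is a minimal sketch, not part of the dataset: the function name `lookup_publication_date` and the `data_dir` parameter are hypothetical, each JSON file is assumed to hold one flat object mapping identifiers to date strings, and only identifiers of the form shown above (four-digit prefix, dot, number, version suffix) are handled.

```python
import json
import re
from pathlib import Path

# Assumed identifier shape, based on the examples above:
# four-digit date prefix, a dot, a number, then "v" and the version number.
_VERSIONED_ID = re.compile(r"^(\d{4})\.\d+v\d+$")


def lookup_publication_date(data_dir, identifier):
    """Return the YYYY-MM-DD date for a version identifier, or None if absent.

    data_dir is a hypothetical directory holding the extracted JSON files.
    """
    match = _VERSIONED_ID.match(identifier)
    if match is None:
        raise ValueError(f"unsupported identifier format: {identifier!r}")
    # The file is named after the identifier's date prefix, e.g. 1908.json.
    table_path = Path(data_dir) / f"{match.group(1)}.json"
    with table_path.open(encoding="utf-8") as handle:
        table = json.load(handle)  # assumed: flat {identifier: date} object
    return table.get(identifier)
```

A call such as `lookup_publication_date("extracted-dates", "1908.12345v1")` would read only `extracted-dates/1908.json`, so a single lookup does not need to load the whole dataset. The directory name here is a placeholder.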

Reproducibility

The Snakemake workflow that produced this dataset is archived in arxiv-publication-dates-workflow.tar.gz.

Changes in version 1.1

For version 1.1, the dataset was extended manually to include a single missing date for arXiv:0906.3421v3: 2010-02-02. As of 2024-05-13, the date for the respective version had not been provided in the arXivRaw OAI-PMH data (http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:0906.3421&metadataPrefix=arXivRaw).
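
To check whether the missing date has since appeared upstream, the record URL quoted above can be rebuilt for any identifier. The sketch below only constructs the URL and does not fetch or parse the response; the function name `arxiv_raw_record_url` is hypothetical, while the base URL and query parameters are taken from the URL above.

```python
from urllib.parse import urlencode

# Base URL as it appears in the record URL quoted above.
OAI_BASE_URL = "http://export.arxiv.org/oai2"


def arxiv_raw_record_url(arxiv_id):
    """Build the arXivRaw GetRecord URL for an identifier without version suffix."""
    params = {
        "verb": "GetRecord",
        "identifier": f"oai:arXiv.org:{arxiv_id}",
        "metadataPrefix": "arXivRaw",
    }
    # Keep ":" unescaped so the result matches the URL form quoted above.
    return f"{OAI_BASE_URL}?{urlencode(params, safe=':')}"
```

Calling `arxiv_raw_record_url("0906.3421")` reproduces the URL given for the record in question.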

Running the workflow

To reproduce the dataset on a Linux machine, you need a version of the conda package manager installed on your system.

Run the following:

# Extract the archived workflow
tar -xf arxiv-publication-dates-workflow.tar.gz
# Create conda environment from lock file
conda env create -n arxiv-metadata --file conda-environment.lock.yaml
# Activate the environment
conda activate arxiv-metadata
# Optionally, dry-run the workflow
snakemake -n
# Produce the output files
snakemake --keep-storage-local-copies --software-deployment-method conda -c <NUMBER OF CORES TO USE>

Then, add a new key 0906.3421v3 with the value 2010-02-02 to the file 0906.json (included in the tar.gz output).
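
The manual fix for 0906.3421v3 could be scripted roughly as below. This is a sketch under assumptions: the helper name `add_missing_date` is hypothetical, the file is assumed to contain a single flat JSON object, and the output formatting (whitespace, key order) may differ from what the workflow writes.

```python
import json
from pathlib import Path


def add_missing_date(json_path, identifier, date):
    """Add identifier -> date to a lookup file, refusing to overwrite a different date."""
    path = Path(json_path)
    table = json.loads(path.read_text(encoding="utf-8"))  # assumed: flat object
    if identifier in table and table[identifier] != date:
        raise ValueError(
            f"{identifier} already has date {table[identifier]}, not {date}"
        )
    table[identifier] = date
    # Formatting of the rewritten file is not guaranteed to match the original.
    path.write_text(json.dumps(table), encoding="utf-8")
```

For the change described above, the call would be `add_missing_date("0906.json", "0906.3421v3", "2010-02-02")`, with the path adjusted to wherever the output archive was extracted.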

Workflow

To adapt/change the workflow, clone it from https://github.com/sdruskat/arxiv-publication-metadata.
The workflow version used to produce this dataset is available at https://doi.org/10.5281/zenodo.11091617.

Files

Files (31.3 MB)

Name	MD5	Size
arxiv-publication-dates-by-identifier-prefix.tar.gz	3658d5c1693e462e27a34a118aa04264	16.1 MB
arxiv-publication-dates-workflow.tar.gz	3fd6cb06340aa6979bdbcdf50f7f387b	15.2 MB
README.md	d19f460386d5db4fcf11bfa490de366c	2.2 kB
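
After downloading, the MD5 checksums listed for the files can be compared against the local copies. The following is a minimal sketch: the names `md5_of_file` and `verify_download` are hypothetical, and the checksum values are copied from the file listing above.

```python
import hashlib

# MD5 checksums as given in the file listing of this record.
EXPECTED_MD5 = {
    "arxiv-publication-dates-by-identifier-prefix.tar.gz": "3658d5c1693e462e27a34a118aa04264",
    "arxiv-publication-dates-workflow.tar.gz": "3fd6cb06340aa6979bdbcdf50f7f387b",
    "README.md": "d19f460386d5db4fcf11bfa490de366c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Return True if the file at path matches the listed checksum for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

Reading in chunks keeps memory use low for the two archives; `verify_download` raises `KeyError` for a name that is not in the listing.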

Additional details

Is compiled by: Software: 10.5281/zenodo.11091617 (DOI)
Is derived from: Dataset: 10.5281/zenodo.11065282 (DOI)

Druskat, S. (2024). ArXiv metadata extraction workflow (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.11091617

