Published May 31, 2022 | Version v2
Dataset Open

Dataset for: Quantifying the rise and fall of scientific fields

  • 1. CRI, University of Paris, Paris
  • 2. Georgia Institute of Technology
  • 3. Nokia Bell Labs

Description

This dataset supports the paper:


Singh CK, Barme E, Ward R, Tupikina L, Santolini M (2022)
Quantifying the rise and fall of scientific fields.
PLOS ONE 17(6): e0270131. https://doi.org/10.1371/journal.pone.0270131

It provides the processed metadata and relational mappings derived from the arXiv preprint repository used to quantify the temporal dynamics of 175 scientific fields across Physics, Mathematics, Computer Science, Quantitative Biology, and Quantitative Finance.

Files Information

arXiv_data_with_Rescaled_times.csv

CSV file containing article-level features used for analyzing field evolution, including rescaled time variables derived from Gumbel distribution fits.

Each row represents an article and includes the following columns:

  • id: arXiv ID

  • categories: arXiv field tags (e.g., 'hep-th')

  • doi: DOI if the article has been published

  • created: Date of first submission to arXiv

  • authors: List of last names of authors

  • authors_orcid: ORCID iDs of the authors, where available

  • NumCitationsArxiv: Number of arXiv articles citing this article

  • NumReferencesArxiv: Number of arXiv articles this article cites

  • year: Year of submission

  • Rescaled Times: Rescaled time values based on Gumbel parameters per field

  • Min RT: Minimum rescaled time across the article’s fields
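As a minimal sketch of how this file might be read, the snippet below counts articles per submission year using only the Python standard library. The inline sample rows are made up for illustration, and the sketch assumes the `year` column parses as a number; the exact cell encoding of list-valued columns such as `authors` or `Rescaled Times` is not documented here, so they are left untouched.

```python
import csv
import io
from collections import Counter


def articles_per_year(csv_file):
    """Count rows per value of the `year` column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # int(float(...)) tolerates years stored as "2007" or "2007.0".
        counts[int(float(row["year"]))] += 1
    return dict(counts)


# Made-up sample rows using a subset of the documented columns.
sample = io.StringIO(
    "id,categories,year,NumCitationsArxiv\n"
    "0704.0001,hep-ph,2007,12\n"
    "0704.0002,math.CO,2007,3\n"
    "0801.0001,hep-th,2008,0\n"
)
counts = articles_per_year(sample)
print(counts)
```

For the real file, the same function could be called on `open("arXiv_data_with_Rescaled_times.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.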

article_metadata.tsv

Tab-separated file with supplementary metadata parsed from arXiv and journal records:

  • id: arXiv ID (e.g., "0704.0001")

  • journal.ref: Journal reference string (if published)

  • doi: DOI of published version (if available)

  • num.versions: Number of arXiv versions submitted

  • num.pages: Estimated number of pages (from PDF parsing; may be NA)

  • num.figures: Estimated number of figures (from PDF parsing; may be NA)
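The columns above can be read with a tab delimiter, mapping `NA` cells to `None`. This is a sketch under assumptions: the sample rows are invented, the helper names are hypothetical, and the real file is assumed to carry a header row with exactly the column names listed above.

```python
import csv
import io


def parse_optional_int(value):
    """Return an int, or None for empty or 'NA' cells."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return int(float(value))


def read_article_metadata(tsv_file):
    """Index the numeric metadata columns by arXiv ID."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    records = {}
    for row in reader:
        records[row["id"]] = {
            "num_versions": parse_optional_int(row["num.versions"]),
            "num_pages": parse_optional_int(row["num.pages"]),
            "num_figures": parse_optional_int(row["num.figures"]),
        }
    return records


# Invented sample rows; the second one has missing page and figure counts.
sample_tsv = io.StringIO(
    "id\tjournal.ref\tdoi\tnum.versions\tnum.pages\tnum.figures\n"
    "0704.0001\tNA\tNA\t2\t37\t15\n"
    "0704.0002\tNA\tNA\t1\tNA\tNA\n"
)
metadata = read_article_metadata(sample_tsv)
print(metadata["0704.0002"])
```

Keeping missing values as `None` rather than 0 avoids biasing averages of page or figure counts.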

orcid_ids_to_articles.json

JSON list of triples associating author ORCID iDs with arXiv article IDs, allowing disambiguation of authors.

Each entry links a disambiguated author (via ORCID) to an arXiv preprint, in the form:

{
  "certainty": 1,
  "predicate": "is_author_of",
  "subject": {
    "@id": "http://orcid.org/0000-0001-5000-1018",
    "type": "ORCID_iD",
    "value": "0000-0001-5000-1018"
  },
  "object": {
    "@id": "http://arxiv.org/art/1902.00500",
    "type": "arXiv_article",
    "value": "1902.00500"
  }
}

These links are used to reconstruct author trajectories across fields.
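One possible way to turn the triples into a per-author article list is sketched below. The function name and the `min_certainty` threshold are hypothetical; the meaning of values of `certainty` other than 1 is not documented here, so the filter is only an assumption. The second triple in the sample is invented for illustration.

```python
import json
from collections import defaultdict


def articles_by_orcid(triples, min_certainty=1.0):
    """Group arXiv article IDs by author ORCID iD."""
    mapping = defaultdict(set)
    for triple in triples:
        if triple.get("predicate") != "is_author_of":
            continue
        # Assumption: lower certainty means a less reliable link.
        if triple.get("certainty", 0) < min_certainty:
            continue
        mapping[triple["subject"]["value"]].add(triple["object"]["value"])
    return dict(mapping)


# First entry follows the documented form; the second article ID is made up.
sample_json = """
[
  {"certainty": 1, "predicate": "is_author_of",
   "subject": {"type": "ORCID_iD", "value": "0000-0001-5000-1018"},
   "object": {"type": "arXiv_article", "value": "1902.00500"}},
  {"certainty": 1, "predicate": "is_author_of",
   "subject": {"type": "ORCID_iD", "value": "0000-0001-5000-1018"},
   "object": {"type": "arXiv_article", "value": "1902.00501"}}
]
"""
triples = json.loads(sample_json)
by_orcid = articles_by_orcid(triples)
print(by_orcid)
```

Joining the resulting article IDs against the `id`, `categories`, and `created` columns of the CSV file would then give, per author, a dated sequence of fields.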

internal-citations.json

JSON dictionary representing the arXiv internal citation network.

  • Keys: arXiv IDs of citing articles

  • Values: Lists of arXiv IDs cited by the article

Example:

{
  "hep-lat/0403005": ["nucl-th/9911018", "hep-ph/9808398", ...],
  "hep-lat/0403014": ["hep-lat/0310012", "hep-lat/9702016", ...]
}

This file enables construction of the citation network used to compute the Disruptive Index and other bibliometric indicators.

Citation

Please cite the original article if using this dataset:

Singh CK, Barme E, Ward R, Tupikina L, Santolini M (2022)
Quantifying the rise and fall of scientific fields.
PLOS ONE 17(6): e0270131. https://doi.org/10.1371/journal.pone.0270131

Also cite the dataset itself:

Chakresh Kumar Singh, Emma Barme, Robert Ward, Liubov Tupikina, & Marc Santolini. (2022). 
Dataset for: Quantifying the rise and fall of scientific fields (Version v2) [Data set].
Zenodo. https://doi.org/10.5281/zenodo.16738242

License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to use, share, and adapt the materials with proper attribution.

Files

Files (614.4 MB)

  • md5:d9edfdbe74057df3aca5e0470f4eacda (76.3 MB)

  • md5:c2b5379d71663093b13f7decb0ccb5a1 (210.3 MB)

  • md5:6f45c030bba808dc6db962f3fdb8673f (171.8 MB)

  • md5:5fe17ff091857134cb78d52c6cf256b4 (155.9 MB)
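A downloaded file can be checked against the MD5 checksums listed above. The helper below is a minimal sketch using the standard `hashlib` module; the function name is hypothetical, and the demonstration hashes a small temporary file rather than one of the dataset files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file, removed afterwards.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name
demo_digest = md5_of_file(tmp_path)
os.remove(tmp_path)
print(demo_digest)
```

For a dataset file, the returned string would be compared with the hexadecimal part of the corresponding `md5:` entry. Reading in chunks keeps memory use low for files of several hundred megabytes.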

Additional details

Related works

Is supplement to
Preprint: 10.48550/arXiv.2107.03749 (DOI)
Publication: 10.1371/journal.pone.0270131 (DOI)