Dataset for: Quantifying the rise and fall of scientific fields
- 1. CRI, University of Paris, Paris
- 2. Georgia Institute of Technology
- 3. Nokia Bell Labs
Description
This dataset supports the paper:
Singh CK, Barme E, Ward R, Tupikina L, Santolini M (2022)
Quantifying the rise and fall of scientific fields.
PLOS ONE 17(6): e0270131. https://doi.org/10.1371/journal.pone.0270131
It provides the processed metadata and relational mappings, derived from the arXiv preprint repository, that were used to quantify the temporal dynamics of 175 scientific fields across Physics, Mathematics, Computer Science, Quantitative Biology, and Quantitative Finance.
File Information
arXiv_data_with_Rescaled_times.csv
CSV file containing article-level features used for analyzing field evolution, including rescaled time variables derived from Gumbel distribution fits.
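The rescaling itself is not spelled out here. The sketch below makes a hypothetical assumption: each field f has a fitted Gumbel location `mu_f` and scale `beta_f` (in years), and an article's rescaled time in f is the standardized variable `z = (t - mu_f) / beta_f`. The paper may define it differently. For an article tagged with several fields, the sketch also takes the minimum over those fields. The class name `GumbelFit` and the parameter values in the demo are illustrative and do not come from the dataset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GumbelFit:
    """Hypothetical per-field Gumbel parameters, both in years."""
    mu: float    # location (mode of the fitted curve)
    beta: float  # scale (> 0)


def gumbel_pdf(t: float, fit: GumbelFit) -> float:
    """Gumbel (maximum) density at time t, e.g. a fitted shape of a field's yearly output."""
    z = (t - fit.mu) / fit.beta
    return math.exp(-(z + math.exp(-z))) / fit.beta


def rescaled_time(t: float, fit: GumbelFit) -> float:
    """Assumed rescaling: years since the field's peak, in units of its scale."""
    return (t - fit.mu) / fit.beta


def min_rescaled_time(t: float, fields: list[str], fits: dict[str, GumbelFit]) -> float:
    """Smallest rescaled time over the article's fields that have a fit."""
    values = [rescaled_time(t, fits[f]) for f in fields if f in fits]
    if not values:
        raise ValueError("none of the article's fields has a Gumbel fit")
    return min(values)


if __name__ == "__main__":
    # Illustrative parameters only; not taken from the dataset.
    fits = {"hep-th": GumbelFit(mu=2000.0, beta=5.0), "cs.LG": GumbelFit(mu=2020.0, beta=4.0)}
    print(min_rescaled_time(2010, ["hep-th", "cs.LG"], fits))  # -2.5
```

Under this assumption, a negative value means the article appeared before the peak of the fitted curve for that field. The minimum therefore picks out the field in which the article is "earliest" relative to that field's life cycle.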
Each row represents an article and includes the following columns:
- id: arXiv ID
- categories: arXiv field tags (e.g., 'hep-th')
- doi: DOI, if the article has been published
- created: Date of first submission to arXiv
- authors: List of the authors' last names
- authors_orcid: ORCID iDs of the authors, where available
- NumCitationsArxiv: Number of arXiv articles citing this article
- NumReferencesArxiv: Number of arXiv articles cited by this article
- year: Year of submission
- Rescaled Times: Rescaled time values, computed from the Gumbel parameters of each field
- Min RT: Minimum rescaled time across the article's fields
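Below is a minimal loading sketch that uses only the standard library. The encoding of the `categories` cell is not documented here. The hypothetical helper `parse_categories` therefore accepts either a Python-style list literal (e.g. `['hep-th', 'math-ph']`) or a whitespace-separated string, and this should be checked against the actual file. The example counts articles per (field, year), and an article with several fields is counted once in each of them.

```python
import ast
import csv
from collections import Counter
from collections.abc import Iterable


def parse_categories(raw: str) -> list[str]:
    """Split a 'categories' cell into arXiv field tags.

    Assumption: the cell is either a list literal such as "['hep-th', 'math-ph']"
    or a whitespace-separated string such as "hep-th math-ph".
    """
    raw = raw.strip()
    if raw.startswith("["):
        return [str(tag) for tag in ast.literal_eval(raw)]
    return raw.split()


def articles_per_field_year(lines: Iterable[str]) -> Counter:
    """Count articles per (field, year); a multi-field article counts once per field."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        year = int(float(row["year"]))  # tolerate years written as "2007.0"
        for field in set(parse_categories(row["categories"])):
            counts[(field, year)] += 1
    return counts
```

To run this on the real file, open `arXiv_data_with_Rescaled_times.csv` with `newline=""` and pass the file handle to `articles_per_field_year`. If you use pandas instead, pass `dtype={"id": str}` to `pandas.read_csv`. This stops IDs such as `0704.0001` from being parsed as floats.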
article_metadata.tsv
Tab-separated file with supplementary metadata parsed from arXiv and journal records:
- id: arXiv ID (e.g., "0704.0001")
- journal.ref: Journal reference string (if published)
- doi: DOI of the published version (if available)
- num.versions: Number of arXiv versions submitted
- num.pages: Estimated number of pages (from PDF parsing; may be NA)
- num.figures: Estimated number of figures (from PDF parsing; may be NA)
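Below is a minimal sketch for reading `article_metadata.tsv` with the standard library. It assumes that missing values are written as the literal string `NA`, as the column descriptions suggest, and that the first row holds the column names listed above. The function names are hypothetical. Rows are keyed by arXiv ID, which is kept as a string, so they can be joined to `arXiv_data_with_Rescaled_times.csv` on `id`.

```python
import csv
from collections.abc import Iterable
from typing import Optional

MISSING = {"", "NA"}  # assumed spellings of a missing value


def _str_or_none(value: Optional[str]) -> Optional[str]:
    """Return the cell text, or None when the cell is missing."""
    if value is None or value.strip() in MISSING:
        return None
    return value.strip()


def _int_or_none(value: Optional[str]) -> Optional[int]:
    """Return the cell as an int, or None when the cell is missing."""
    text = _str_or_none(value)
    return None if text is None else int(float(text))  # accepts "12" or "12.0"


def load_article_metadata(lines: Iterable[str]) -> dict[str, dict]:
    """Index article_metadata.tsv rows by arXiv ID, which is kept as a string."""
    records: dict[str, dict] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        records[row["id"]] = {
            "journal_ref": _str_or_none(row.get("journal.ref")),
            "doi": _str_or_none(row.get("doi")),
            "num_versions": _int_or_none(row.get("num.versions")),
            "num_pages": _int_or_none(row.get("num.pages")),
            "num_figures": _int_or_none(row.get("num.figures")),
        }
    return records
```

`pandas.read_csv(path, sep="\t", dtype={"id": str})` is an alternative. By default, pandas already treats the string `NA` as missing.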
orcid_ids_to_articles.json
JSON list of triples associating author ORCID iDs with arXiv article IDs, which allows authors to be disambiguated.
Each entry links a disambiguated author (via ORCID) to an arXiv preprint, in the form:
{
  "certainty": 1,
  "predicate": "is_author_of",
  "subject": {
    "@id": "http://orcid.org/0000-0001-5000-1018",
    "type": "ORCID_iD",
    "value": "0000-0001-5000-1018"
  },
  "object": {
    "@id": "http://arxiv.org/art/1902.00500",
    "type": "arXiv_article",
    "value": "1902.00500"
  }
}
These links are used to reconstruct author trajectories across fields.
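Below is a minimal sketch of that reconstruction. It assumes the file is a single JSON array of triples shaped like the entry above. `articles_by_orcid` groups arXiv IDs by ORCID iD. `field_trajectory` then orders one author's articles by year, using a lookup of (year, fields) per article that you would build from `arXiv_data_with_Rescaled_times.csv`. Both names are hypothetical, and this sketch ignores the `certainty` field.

```python
import json
from collections import defaultdict


def load_triples(path: str) -> list[dict]:
    """Read orcid_ids_to_articles.json, assumed to be a single JSON array."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def articles_by_orcid(triples: list[dict]) -> dict[str, set[str]]:
    """Group arXiv IDs by ORCID iD, using only 'is_author_of' triples."""
    mapping: dict[str, set[str]] = defaultdict(set)
    for triple in triples:
        if triple.get("predicate") != "is_author_of":
            continue
        mapping[triple["subject"]["value"]].add(triple["object"]["value"])
    return dict(mapping)


def field_trajectory(
    article_ids: set[str], article_info: dict[str, tuple[int, list[str]]]
) -> list[tuple[int, str, list[str]]]:
    """Chronological (year, arXiv ID, fields) list for one author.

    article_info maps arXiv ID -> (year, fields); articles missing from it are skipped.
    """
    rows = [(article_info[a][0], a, article_info[a][1]) for a in article_ids if a in article_info]
    return sorted(rows)
```

The code keys authors on the bare `value` rather than the `@id` URL, so the result matches the ORCID iDs stored in the `authors_orcid` column, assuming that column uses the same bare form.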
internal-citations.json
JSON dictionary representing the arXiv internal citation network.
- Keys: arXiv IDs of citing articles
- Values: Lists of arXiv IDs cited by each article
Example:
{
  "hep-lat/0403005": ["nucl-th/9911018", "hep-ph/9808398", ...],
  "hep-lat/0403014": ["hep-lat/0310012", "hep-lat/9702016", ...]
}
This file enables construction of the citation network used to compute the Disruptive Index and other bibliometric indicators.
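This document does not define the Disruptive Index. The sketch below assumes the widely used CD-style definition `D = (n_i - n_j) / (n_i + n_j + n_k)`. Here n_i counts papers that cite the focal article but none of its references, n_j counts papers that cite both, and n_k counts papers that cite at least one reference but not the focal article. The paper's exact variant (for example, any time window or minimum-citation filter) may differ, and the function names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Optional


def load_citations(path: str) -> dict[str, list[str]]:
    """Read internal-citations.json: citing arXiv ID -> list of cited arXiv IDs."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def invert(cites: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each cited arXiv ID to the set of arXiv IDs that cite it."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for citing, refs in cites.items():
        for ref in refs:
            cited_by[ref].add(citing)
    return dict(cited_by)


def disruption_index(
    focal: str, cites: dict[str, list[str]], cited_by: dict[str, set[str]]
) -> Optional[float]:
    """CD-style disruption index of `focal`, or None if no paper qualifies.

    n_i: cite focal but none of its references
    n_j: cite focal and at least one of its references
    n_k: cite at least one reference but not focal
    """
    refs = set(cites.get(focal, []))
    citers_of_focal = cited_by.get(focal, set())
    citers_of_refs: set[str] = set()
    for ref in refs:
        citers_of_refs |= cited_by.get(ref, set())
    citers_of_refs.discard(focal)  # focal cites its own references; exclude it
    n_i = len(citers_of_focal - citers_of_refs)
    n_j = len(citers_of_focal & citers_of_refs)
    n_k = len(citers_of_refs - citers_of_focal)
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else None
```

To use this, build `cited_by = invert(cites)` once and then call `disruption_index` for each article. D ranges from -1 (every citer also cites the focal article's references) to 1 (no citer does). The file contains only arXiv-to-arXiv links, so an index computed from it covers only the arXiv-internal part of each article's citation neighborhood.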
Citation
If you use this dataset, please cite the original article:
Singh CK, Barme E, Ward R, Tupikina L, Santolini M (2022)
Quantifying the rise and fall of scientific fields.
PLOS ONE 17(6): e0270131. https://doi.org/10.1371/journal.pone.0270131
Also cite the dataset itself:
Chakresh Kumar Singh, Emma Barme, Robert Ward, Liubov Tupikina, & Marc Santolini. (2022).
Dataset for: Quantifying the rise and fall of scientific fields (Version v2) [Data set].
Zenodo. https://doi.org/10.5281/zenodo.16738242
License
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to use, share, and adapt the materials with proper attribution.
Additional details
Related works
- Is supplement to:
  - Preprint: 10.48550/arXiv.2107.03749 (DOI)
  - Publication: 10.1371/journal.pone.0270131 (DOI)