Gold Standard Benchmark Dataset for Bibliometric Reference Matching Evaluation
Authors/Creators
- 1. K-Synth Academic Spin-off
- 2. Università degli Studi di Napoli Federico II
Description
This dataset provides the gold standard benchmark used in the paper:
Aria, M., D'Aniello, L., Spano, M. (2026). "A Multi-Phase Reference Matching Algorithm for Bibliometric Analysis: Design, Implementation, and Evaluation." Scientometrics (submitted).
The dataset consists of 1,064 journal articles retrieved from Web of Science using the query TI=("bibliometrics" OR "science mapping"), restricted to English-language journal articles with complete metadata. For each article, the following fields are included: author list (full names), article title, journal name (full title and ISO 4 abbreviation), publication year, volume, issue, start page, end page, and DOI.
Each article was converted into a Scopus-format reference string, with journal names randomly assigned as either full title or ISO 4 abbreviation (approximately 50% each), introducing realistic heterogeneity in journal name representation. The resulting reference strings, each associated with a unique article identifier, constitute the gold standard against which the 17 perturbation scenarios described in the paper are evaluated.
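The conversion step described above can be illustrated with a minimal sketch. The field order and punctuation of the reference string, the dictionary keys, and the function name `to_reference_string` are assumptions made for illustration; the exact layout used to build the gold standard is defined in the paper, not here. The sample record reuses a reference cited on this page, and its abbreviation is given as an example of an ISO 4 form.

```python
import random


def to_reference_string(article, rng):
    """Build one reference string from a metadata record (assumed layout)."""
    # Pick the full journal title or its ISO 4 abbreviation, about 50% each.
    if rng.random() < 0.5:
        journal = article["journal_full"]
    else:
        journal = article["journal_iso4"]
    authors = ", ".join(article["authors"])
    pages = f"pp. {article['start_page']}-{article['end_page']}"
    return (
        f"{authors}, {article['title']} ({article['year']}) {journal}, "
        f"{article['volume']} ({article['issue']}), {pages}"
    )


# Hypothetical record; the keys are illustrative, not the CSV column names.
example = {
    "authors": ["Aria, M.", "Cuccurullo, C."],
    "title": "bibliometrix: An R-tool for comprehensive science mapping analysis",
    "journal_full": "Journal of Informetrics",
    "journal_iso4": "J. Informetr.",
    "year": 2017,
    "volume": 11,
    "issue": 4,
    "start_page": 959,
    "end_page": 975,
}

rng = random.Random(2026)  # a fixed seed keeps the random choice reproducible
print(to_reference_string(example, rng))
```

Passing a seeded `random.Random` instance, rather than using the module-level generator, keeps the full-title versus abbreviation assignment reproducible across runs.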
The dataset supports the reproducibility of the synthetic benchmark experiments presented in the paper and can be reused for evaluating other reference matching or record linkage algorithms in bibliometric contexts.
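As one possible way to reuse the gold standard, the sketch below scores a matcher's output against the true article identifiers using pairwise precision, recall, and F1. This choice of metric is an assumption and is not necessarily the evaluation protocol of the paper; the function name `pairwise_scores` and the toy labels are hypothetical.

```python
from itertools import combinations


def pairwise_scores(gold_ids, predicted_ids):
    """Pairwise precision, recall and F1 of predicted groups against gold IDs."""
    if len(gold_ids) != len(predicted_ids):
        raise ValueError("gold_ids and predicted_ids must have the same length")
    same_gold = same_pred = same_both = 0
    # Compare every pair of reference strings once (quadratic in input size).
    for i, j in combinations(range(len(gold_ids)), 2):
        in_gold = gold_ids[i] == gold_ids[j]
        in_pred = predicted_ids[i] == predicted_ids[j]
        same_gold += int(in_gold)
        same_pred += int(in_pred)
        same_both += int(in_gold and in_pred)
    precision = same_both / same_pred if same_pred else 1.0
    recall = same_both / same_gold if same_gold else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: five reference strings, three true articles.
gold = ["a1", "a1", "a2", "a2", "a3"]
predicted = [0, 0, 1, 2, 2]  # cluster labels produced by some matcher
print(pairwise_scores(gold, predicted))
```

The pairwise loop is quadratic, which is acceptable for a sketch; for larger inputs, counting pairs per group would be more efficient.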
Files
| Name | Size | Checksum |
|---|---|---|
| gold_standard.csv | 515.9 kB | md5:4da78ee43eeaaa3e4a34f00e6727f1f6 |
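After downloading, the MD5 checksum listed for gold_standard.csv can be used to confirm that the file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `md5_of_file` and the local file path are assumptions.

```python
import hashlib

# Checksum published on this page for gold_standard.csv.
EXPECTED_MD5 = "4da78ee43eeaaa3e4a34f00e6727f1f6"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until the file is exhausted.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the file saved in the working directory, `md5_of_file("gold_standard.csv") == EXPECTED_MD5` should evaluate to `True` for an uncorrupted download.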
Additional details
Dates
- Available: 2026-04-08 (Dataset available for replication analyses)
Software
- Repository URL: https://github.com/massimoaria/bibliometrix
- Programming language: R
- Development Status: Active
References
- Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959-975.
- Aria, M., & Cuccurullo, C. (2026). Science Mapping Analysis – A primer with Biblioshiny. McGraw-Hill. ISBN: 978-88-386-2297-7.