Published April 8, 2026 | Version 1.0
Dataset Open

Gold Standard Benchmark Dataset for Bibliometric Reference Matching Evaluation

  • 1. K-Synth Academic Spin-off
  • 2. Università degli Studi di Napoli Federico II

Description

This dataset provides the gold standard benchmark used in the paper:

Aria, M., D'Aniello, L., Spano, M. (2026). "A Multi-Phase Reference Matching Algorithm for Bibliometric Analysis: Design, Implementation, and Evaluation." Scientometrics (submitted).

The dataset consists of 1,064 journal articles retrieved from Web of Science using the query TI=("bibliometrics" OR "science mapping"), restricted to English-language journal articles with complete metadata. For each article, the following fields are included: author list (full names), article title, journal name (full title and ISO 4 abbreviation), publication year, volume, issue, start page, end page, and DOI.

Each article was converted into a Scopus-format reference string, with journal names randomly assigned as either full title or ISO 4 abbreviation (approximately 50% each), introducing realistic heterogeneity in journal name representation. The resulting reference strings, each associated with a unique article identifier, constitute the gold standard against which the 17 perturbation scenarios described in the paper are evaluated.
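The conversion step described above can be illustrated with a minimal sketch. It is not the authors' code: the `Article` field names are hypothetical (they mirror the listed fields, not the actual column names in gold_standard.csv), the string layout only approximates a Scopus-style reference, and the author names are assumed to be already in "Surname, I." form.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    # Hypothetical field names; the real CSV columns may differ.
    authors: List[str]      # assumed pre-formatted as "Surname, I."
    title: str
    journal_full: str       # full journal title
    journal_iso4: str       # ISO 4 abbreviation
    year: int
    volume: str
    issue: str
    start_page: str
    end_page: str


def to_reference_string(article: Article, rng: random.Random) -> str:
    """Build a Scopus-like reference string with a randomly chosen journal form."""
    # Each form is picked with probability 0.5, so about half of the
    # generated references carry the full title and half the abbreviation.
    journal = article.journal_full if rng.random() < 0.5 else article.journal_iso4
    authors = ", ".join(article.authors)
    return (
        f"{authors}, {article.title} ({article.year}) {journal}, "
        f"{article.volume} ({article.issue}), "
        f"pp. {article.start_page}-{article.end_page}"
    )


# Illustrative record (values taken from the bibliometrix reference;
# the abbreviation is given for illustration only).
example = Article(
    authors=["Aria, M.", "Cuccurullo, C."],
    title="bibliometrix: An R-tool for comprehensive science mapping analysis",
    journal_full="Journal of Informetrics",
    journal_iso4="J. Informetr.",
    year=2017,
    volume="11",
    issue="4",
    start_page="959",
    end_page="975",
)

rng = random.Random(42)  # fixed seed so the random assignment is repeatable
print(to_reference_string(example, rng))
```

Passing an explicit `random.Random` instance keeps the full/abbreviated assignment reproducible across runs, which matters when the generated strings serve as a fixed benchmark.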

The dataset supports the reproducibility of the synthetic benchmark experiments presented in the paper and can be reused for evaluating other reference matching or record linkage algorithms in bibliometric contexts.
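As a rough illustration of such reuse, the sketch below scores a matcher's output against gold-standard identifiers. The metric definitions and all names (`evaluate_matching`, the `r*`/`a*` identifiers) are assumptions made for this example; the paper may define its evaluation measures differently.

```python
from typing import Dict, Optional


def evaluate_matching(
    gold: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Score predicted reference-to-article assignments against the gold standard.

    gold:      reference id -> true article id
    predicted: reference id -> article id chosen by the matcher, or None if unmatched
    """
    # A true positive is a reference assigned to its correct article.
    true_pos = sum(
        1 for ref_id, art_id in gold.items() if predicted.get(ref_id) == art_id
    )
    # Every non-None prediction for a gold reference counts as an attempted match.
    attempted = sum(1 for ref_id in gold if predicted.get(ref_id) is not None)

    precision = true_pos / attempted if attempted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example with hypothetical identifiers: two correct matches,
# one wrong match, one reference left unmatched.
gold = {"r1": "a1", "r2": "a2", "r3": "a3", "r4": "a4"}
predicted = {"r1": "a1", "r2": "a2", "r3": "a9", "r4": None}
scores = evaluate_matching(gold, predicted)
print(scores)
```

In this toy case precision is 2/3 (two of three attempted matches are correct) and recall is 2/4 (two of four gold references are recovered).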

Files

Name: gold_standard.csv
Size: 515.9 kB
Checksum: md5:4da78ee43eeaaa3e4a34f00e6727f1f6

Additional details

Dates

Available
2026-04-08
Dataset available for replication analyses

Software

Repository URL
https://github.com/massimoaria/bibliometrix
Programming language
R
Development Status
Active

References

  • Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959-975.
  • Aria, M., & Cuccurullo, C. (2026). Science Mapping Analysis – A primer with Biblioshiny. McGraw-Hill. ISBN: 978-88-386-2297-7.