Published June 1, 2022 | Version 1.0
Dataset | Open Access

PreprintMatch: a tool for preprint publication detection applied to analyze global inequities in scientific publishing

  • 1. Department of Computer Science and Engineering, UC San Diego, La Jolla, CA, United States
  • 2. Department of Neuroscience, UC San Diego, La Jolla, CA, United States

Description

Dataset underlying the paper "PreprintMatch: a tool for preprint publication detection applied to analyze global inequities in scientific publishing." The file preprint-paper-matches.csv lists all matches found by our algorithm between bioRxiv/medRxiv preprints and PubMed publications, and preprint_affiliations.csv lists all affiliations extracted from bioRxiv/medRxiv preprints. All preprint data came from the Rxivist data dump (https://zenodo.org/record/4738007), and the scripts used to download PubMed data are available in our GitHub repository, https://github.com/PeterEckmann1/preprint-match.
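The column layout of the two CSV files is not described on this page, so a reasonable first step after downloading is to inspect each file's header and row count. The sketch below does that with Python's standard csv module. It assumes only that each file has a header row and is UTF-8 encoded (an assumption, not something stated here); the helper name summarize_csv is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file.

    Hypothetical helper. No column names are assumed: the header is read
    from the file itself. UTF-8 encoding is an assumption.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        # Count parsed records rather than raw lines, so quoted fields
        # that contain newlines are still counted as a single row.
        row_count = sum(1 for _ in reader)
    return fieldnames, row_count
```

For example, summarize_csv("preprint-paper-matches.csv") returns the header of the matches file together with its number of rows. Counting parsed records instead of lines keeps the count correct if any field (such as a title) contains an embedded line break.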

The full database dump, with all data used in the study, is available on Google Drive at https://drive.google.com/file/d/1ZoafhYUP-DO4Hd_4A_v7mbQLjN3JPzJv/view?usp=sharing. The PostgreSQL database can be restored using the pg_restore command.
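As a minimal sketch of the restore step, the Python code below assembles a pg_restore command and runs it. It assumes that the PostgreSQL client tools are installed, that the downloaded dump is in a format pg_restore accepts (custom, directory, or tar), and that an empty target database already exists (for example, one created with createdb). The file name preprint_match.dump, the database name preprint_match, and the helper names are hypothetical.

```python
import shutil
import subprocess


def build_pg_restore_command(dump_path, dbname, host="localhost", port=5432,
                             user=None, jobs=1):
    """Assemble a pg_restore argument list (hypothetical helper)."""
    cmd = [
        "pg_restore",
        "--no-owner",          # skip ownership commands for roles that may not exist locally
        "--dbname", dbname,    # existing target database to restore into
        "--host", host,
        "--port", str(port),
    ]
    if user is not None:
        cmd += ["--username", user]
    if jobs > 1:
        # Parallel restore; pg_restore supports it only for custom and directory formats.
        cmd += ["--jobs", str(jobs)]
    cmd.append(dump_path)
    return cmd


def restore(dump_path, dbname, **kwargs):
    """Run pg_restore and raise CalledProcessError if it fails."""
    if shutil.which("pg_restore") is None:
        raise RuntimeError("pg_restore not found on PATH; install the PostgreSQL client tools")
    subprocess.run(build_pg_restore_command(dump_path, dbname, **kwargs), check=True)
```

A call such as restore("preprint_match.dump", "preprint_match", jobs=4) restores the dump into a local database. Passing --no-owner is a design choice that avoids errors when the role names from the original server do not exist on the machine doing the restore.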

Files

preprint-paper-matches.csv

Files (35.9 MB)

  • 5.5 MB, md5:7bd5eb8c3b1125ee0ff368de09145e1d
  • 30.4 MB, md5:11c2183841c17207d8b98a2201ce9376
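Each file in this record is published with an MD5 checksum, so a downloaded copy can be checked against it. The sketch below uses Python's hashlib and reads the file in chunks, so even large files are never loaded into memory all at once. The helper names md5_of_file and verify_md5 are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 with an expected value.

    Accepts either a bare hex digest or the "md5:<hex>" form used in the
    file listing; case is ignored.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

The pairing between file names and checksums is not preserved in the listing above, so a downloaded file is intact if verify_md5 returns True for one of the two checksums.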