Published September 1, 2022 | Version 1.0 | Software | Open
A large dataset of software mentions in the biomedical literature (the code)
Authors/Creators
- Chan Zuckerberg Initiative
Description
This record contains the code accompanying our new dataset of software mentions in biomedical papers (dataset, preprint). Plain-text software mentions are extracted with a trained SciBERT model from several sources: the NIH PubMed Central collection and papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context, and metadata and, for a number of mentions, the disambiguated software entities and links.
Files
| Name | Size |
|---|---|
| md5:49924d45816be0c497c6e71c50671775 | 1.2 MB |
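The MD5 checksum listed above can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_record` are illustrative, not part of the released code, and the path to the downloaded file is an assumption because the file name is not shown in this record.

```python
import hashlib

# Checksum as listed in the Files table of this record.
EXPECTED_MD5 = "49924d45816be0c497c6e71c50671775"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.strip().lower()
```

Calling `matches_record` with the local path of the downloaded archive returns `True` only if the file's digest equals the checksum listed in this record; a `False` result suggests an incomplete or altered download.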
Additional details
References
- Istrate et al. (2022), A large dataset of software mentions in the biomedical literature, arXiv:2209.00693