Published September 1, 2022 | Version 1.0
Software Open

A large dataset of software mentions in the biomedical literature (the code)

Description

This record contains the code accompanying our new dataset of software mentions in biomedical papers (dataset, preprint). Plain-text software mentions are extracted with a trained SciBERT model from several sources: the NIH PubMed Central collection and papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context, and metadata for the mentions and, for a number of them, the disambiguated software entities and links.

Files

md5:49924d45816be0c497c6e71c50671775 (1.2 MB)
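The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_record` and the file name in the usage note are illustrative assumptions, not part of the released code.

```python
import hashlib

# Checksum as listed on this record.
EXPECTED_MD5 = "49924d45816be0c497c6e71c50671775"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("code.zip")` would return `True` for an intact download, where `code.zip` is a placeholder for whatever name the downloaded archive has locally.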

Additional details

References

  • Istrate et al. (2022). A large dataset of software mentions in the biomedical literature. arXiv:2209.00693