Essential frontiers: open data & software citations, an automated ML approach
Description
Science is progressive, and every discovery, set of data, and publication builds on previous work. Today, it's impossible to put every new development in the context of what's gone before. Comprehensive open citations can enable both the attribution of scientific progress and the evaluation of research and its impact. For citations to live up to their promise as a vehicle for the discovery, dissemination, and evaluation of all scholarly knowledge, the open citation frontier needs to expand beyond traditional bibliographic metadata into other essential scientific resources such as research data and software. We describe a new open corpus of dataset and software mentions in biomedical papers, created by applying machine learning to full-text biomedical literature. We share the process of extracting mentions and transforming them into citations, as well as the opportunities and challenges that come with disambiguating and linking the mentions in an open dataset of this size.
Files
Name | Size |
---|---|
Open Citations Workshop 2022-LinIstrate.pdf (md5:56f49b0b025e22a57f6529b8a9a7d01a) | 3.6 MB |