Published October 5, 2022 | Version v1
Presentation Open

Essential frontiers: open data & software citations, an automated ML approach

  • 1. Chan Zuckerberg Initiative
  • 2. Indeed

Description

Science is progressive: every discovery, dataset, and publication builds on previous work. Today, it's impossible to put every new development in the context of what's gone before. Comprehensive open citations can enable both the attribution of scientific progress and the evaluation of research and its impact. For citations to live up to their promise as a vehicle for the discovery, dissemination, and evaluation of all scholarly knowledge, the open citation frontier needs to expand beyond traditional bibliographic metadata to other essential scientific resources such as research data and software. We describe a new open corpus of dataset and software mentions in biomedical papers, created by applying machine learning to full-text biomedical literature. We share the process of extracting mentions and transforming them into citations, as well as the opportunities and challenges that come with disambiguating and linking the mentions in an open dataset of this size.

Files

Open Citations Workshop 2022-LinIstrate.pdf (3.6 MB)
md5:56f49b0b025e22a57f6529b8a9a7d01a