Published June 7, 2019 | Version v1
Dataset Open

Comparing the Use of Research Resource Identifiers and Natural Language Processing for Citation of Databases, Software and Other Digital Artifacts

Description

The Research Resource Identifier (RRID) was introduced in biomedicine in 2014 to identify more precisely the reagents and tools used in published biomedical research and to track the use of tools across the breadth of the biomedical literature. The current RRID specification covers key biological and digital resources. Authors are instructed to include an RRID after the first mention of any resource used. RRIDs are designed to be easy to find using a full-text search engine.

The published data sets were used in our comparative study, which compared the output of our RRID curation workflow with the outputs of automated text mining systems that have been used to identify mentions of resources in the text of publications. All files are in tab-separated (TSV) format.

Scibot.tsv: Records of the RRID curation workflow using SciBot.

Each record shows that a resource RRID was identified in paper PMID, with curator tags (Tag1, Tag2, both optional).

    PMID: PubMed ID

    RRID: Research Resource Identifier

    Tag1: Curator tags (optional)

    Tag2: Additional curator tags (optional)

rdwsorted.tsv: Records of the output from RDW, a text mining tool.

RDW identifies mentions of research resources in papers. Each record shows that a resource RRID was identified in paper PMID.

    PMID: PubMed ID

    RRID: Research Resource Identifier

rridbyrdw05282019.tsv: Records of the output of RRID-by-RDW, a component of RDW.

RRID-by-RDW is a component in RDW that identifies mentions of research resources in papers by matching patterns of RRID specifications. Each record shows that a resource RRID was identified in paper PMID.

    PMID: PubMed ID

    RRID: Research Resource Identifier

    Context: Snippet where the RRID was found
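The idea of finding RRIDs by pattern and keeping a context snippet can be illustrated with a minimal Python sketch. This is not the RRID-by-RDW implementation, which is not described here; the regular expression, the `find_rrids` helper, the `window` parameter and the sample sentence are all assumptions made for illustration.

```python
import re

# Assumed pattern: the literal prefix "RRID:", an optional space, a letter-only
# authority prefix, an underscore, then a local identifier. This is an
# illustrative guess, not the pattern used by RRID-by-RDW.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9:_-]+)")


def find_rrids(text, window=30):
    """Return (rrid, context) pairs for every RRID-like mention in text.

    `window` is a hypothetical parameter: the number of characters of
    surrounding text kept on each side of the match as the context snippet.
    """
    hits = []
    for match in RRID_PATTERN.finditer(text):
        # Drop separator characters that may trail the identifier in prose.
        identifier = match.group(1).rstrip(":-_")
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        hits.append(("RRID:" + identifier, text[start:end]))
    return hits


# Made-up sentence for demonstration only; it is not taken from the data set.
sample = (
    "Images were analyzed with ImageJ (RRID:SCR_003070) "
    "and stained with an anti-GFP antibody (RRID:AB_221569)."
)
for rrid, context in find_rrids(sample):
    print(rrid, "|", context)
```

Each returned pair corresponds loosely to the RRID and Context columns above; a PMID would come from whichever paper the text was extracted from.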

resource_metadata20190418.tsv: Metadata of RRIDs

This file contains metadata of resources and their RRIDs. See file header for column definitions.

RRIDCUR-definitions.tsv: Definitions of curator tags used in Scibot.tsv.

    tag: Tag name

    definition: Definition of the tag
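As a rough illustration of how the files can be compared, the sketch below loads two tables and compares them as sets of (PMID, RRID) pairs. It assumes each file starts with a header row naming the PMID and RRID columns, which should be checked against the actual files; `load_pairs` and `compare_pairs` are hypothetical helper names, and the in-memory rows are invented placeholders, not records from the data set.

```python
import csv
import io


def load_pairs(handle):
    """Read a tab-separated table and return its set of (PMID, RRID) pairs.

    Assumes a header row containing columns named PMID and RRID.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["PMID"].strip(), row["RRID"].strip()) for row in reader}


def compare_pairs(curated, mined):
    """Split two sets of pairs into shared and source-specific subsets."""
    return {
        "both": curated & mined,
        "curated_only": curated - mined,
        "mined_only": mined - curated,
    }


# Invented placeholder rows standing in for Scibot.tsv and rdwsorted.tsv.
curated_tsv = (
    "PMID\tRRID\tTag1\tTag2\n"
    "10000001\tRRID:SCR_003070\t\t\n"
    "10000002\tRRID:AB_221569\t\t\n"
)
mined_tsv = (
    "PMID\tRRID\n"
    "10000001\tRRID:SCR_003070\n"
    "10000003\tRRID:SCR_002798\n"
)

result = compare_pairs(
    load_pairs(io.StringIO(curated_tsv)),
    load_pairs(io.StringIO(mined_tsv)),
)
for name, pairs in result.items():
    print(name, len(pairs))
```

With the real files, the `io.StringIO` objects would be replaced by handles such as `open("Scibot.tsv", newline="")`. Using sets discards duplicate rows, which suits an overlap count but not a count of individual mentions.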

Notes

NIH NIDDK U24DK097771, NIH NIDA U24DA039832 and

Files

Files (43.2 MB)

md5:7121fe8aac59537d518734d97212a867 (22.8 MB)
md5:c4e45b8f1208416e1b20972ae6470a4a (18.2 MB)
md5:0b3e8f876df50c0c09ece34a4b6d1bdf (1.3 MB)
md5:b8820d2f30d43957ab18f64b4cc11241 (1.5 kB)
md5:61c9923133c79b209a65defd0863ae3b (802.8 kB)
