Published December 1, 2021 | Version 1
Poster Open

Detecting Informal Data Use in Literature

  • 1. University of Michigan, USA

Description

The Inter-university Consortium for Political and Social Research (ICPSR) is developing a computational approach to detect informal data use and construct reliable data impact metrics. Formal data citations that use unique identifiers are readily discoverable. Informal references to data are harder to detect because they are worded in many ways and tend to occur in article footnotes, tables, figures, or other places that are not indexed for search. Identifying both kinds of data reference is an essential step toward characterizing the impact of research data (i.e., who reuses research data and for what purposes). We use text features, including the presence of indicator terms, the section of the article, and the frequency of acronyms, to predict which portions of an article are likely to indicate data use. We then use a natural language processing (NLP) pipeline to extract candidate data references from those portions. In production, our model will support the review of publications for ingest into the ICPSR Bibliography of Data-related Literature as part of a broader effort to measure the impact of research data.
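The three kinds of text features named in the description (indicator terms, article section, acronym frequency) can be illustrated with a minimal Python sketch. This is not the ICPSR model: the indicator vocabulary, the section labels, and the function name `extract_features` are all hypothetical, chosen only to show what such features might look like for one passage.

```python
import re

# Hypothetical indicator terms; the poster description does not list the real vocabulary.
INDICATOR_TERMS = ("data", "dataset", "survey", "study", "sample", "respondents")

# Hypothetical section labels where informal data references might appear.
DATA_PRONE_SECTIONS = {"methods", "data", "footnote", "table", "figure"}

# A run of two or more capital letters, e.g. "ICPSR" or "NLP".
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")
WORD_RE = re.compile(r"[A-Za-z]+")


def extract_features(passage: str, section: str) -> dict:
    """Compute simple, illustrative features for one passage of an article."""
    words = WORD_RE.findall(passage)
    lowered = [w.lower() for w in words]
    n_words = len(words) or 1  # avoid division by zero on empty passages

    acronyms = ACRONYM_RE.findall(passage)
    return {
        # How many times any indicator term occurs as a whole word.
        "indicator_count": sum(lowered.count(t) for t in INDICATOR_TERMS),
        # Whether the passage sits in a section assumed to be data-prone.
        "in_data_prone_section": section.lower() in DATA_PRONE_SECTIONS,
        # Acronyms per word in the passage.
        "acronym_freq": len(acronyms) / n_words,
        # Distinct acronyms, which could serve as candidate data references.
        "acronyms": sorted(set(acronyms)),
    }


if __name__ == "__main__":
    print(extract_features("We analyzed survey data from the ANES and the GSS.", "Footnote"))
```

Feature dictionaries of this shape could be passed to any standard classifier to rank passages for review; the choice of classifier and the downstream extraction step are not specified in the description and are left open here.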

Files

FORCE-11-poster-2021.pdf

md5:aaf439007aaccb73ff1d202725191003
956.7 kB