Poster Open Access
Sara Lafia; Elizabeth Moss; Andrea Thomer; Libby Hemphill
The Inter-university Consortium for Political and Social Research (ICPSR) is developing a computational approach to detect informal data use and construct reliable data impact metrics. Formal data citations that use unique identifiers are readily discoverable; informal references to data, however, are difficult to detect because they are phrased in many ways and tend to occur in article footnotes, tables, figures, or other places that are not indexed for search. Identifying data citations is an essential step toward characterizing the impact of research data (i.e., who reuses research data and for what purposes). We use features of the text, including the presence of indicator terms, the article section, and the frequency of acronyms, to predict which portions of an article are likely to indicate data use. We then use a natural language processing (NLP) pipeline to extract candidate data references. In production, our model will support the review of publications for ingestion into the ICPSR Bibliography of Data-related Literature as part of a broader effort to measure the impact of research data.
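As an illustration only, the three kinds of text features named in the abstract (indicator terms, article section, acronym frequency) might be computed per passage roughly as follows. This is a minimal sketch, not the ICPSR implementation: the term list, the acronym pattern, and the function name `passage_features` are all assumptions introduced here, since the poster does not specify them.

```python
import re

# Hypothetical indicator terms; the poster does not list the actual vocabulary.
INDICATOR_TERMS = ("data", "dataset", "survey", "respondents", "sample")

# Assumed acronym pattern: a run of three or more capital letters, e.g. "ICPSR".
ACRONYM_RE = re.compile(r"\b[A-Z]{3,}\b")


def passage_features(text, section):
    """Return simple, illustrative features for one passage of an article."""
    tokens = re.findall(r"[A-Za-z]+", text)
    lowered = [t.lower() for t in tokens]
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty passages
    return {
        # True if any assumed indicator term appears as a whole word
        "has_indicator_term": any(term in lowered for term in INDICATOR_TERMS),
        # Section label (e.g. "footnotes"), normalized to lowercase
        "section": section.lower(),
        # Share of tokens that look like acronyms
        "acronym_frequency": len(ACRONYM_RE.findall(text)) / n_tokens,
    }


if __name__ == "__main__":
    print(passage_features("We analyzed survey data from the ANES and GSS.", "Footnotes"))
```

Feature dictionaries of this shape could then be passed to any standard classifier to score passages for likely data use; which model the authors use is not stated in the abstract.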