Everything you wish you didn't have to know about metadata matching
Description
The scholarly community understands very well how important accurate citation links between research outputs are: they provide provenance for the claims in the articles, researchers follow them to extend their domain knowledge, and institutions often use them to estimate the quality and impact of research. Citation links are not the only important relationships between entities in the scholarly ecosystem. The community is increasingly interested in relationships between research outputs and institutions, research outputs and funders, contributors and institutions, preprints and journal articles, and so on. But where do those links actually come from? Ideally, they would be provided by the authors when submitting a scholarly article, collected by the publisher, and distributed further in a machine-readable format. Indeed, the authors are typically in the best position to provide accurate information about the relationships between the entities mentioned in their article. In practice, however, only about 30% of bibliographic references deposited with Crossref contain the DOI of the cited work, and only about 62% of deposited funding information contains the funder identifier. For the remaining bibliographic references and funding information, we try to automatically find the identifier of the referenced item and insert it into the metadata. Both publisher-asserted and Crossref-asserted links are then made available through our APIs, along with information about who asserted each link. This process of finding the referenced item based on a set of (typically messy) information about it is called matching. In this presentation, I share my experience with different flavours of metadata matching at Crossref and present our future plans.
I also answer frequently asked questions such as: “As time goes by, do we need to do less and less matching?”, “Is a simple title lookup enough to match a citation?” and “Can we be 100% sure that all citation links we see in the data are correct?”.
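To make the idea of matching concrete, here is a minimal sketch in Python. It is not Crossref's actual algorithm: the candidate records, the DOIs, the scoring rule, and the 0.8 threshold are all assumptions made up for illustration. It scores each candidate by how many of its title words appear in the messy reference string, halves the score when the publication year is absent, and returns a DOI only when the best score clears the threshold.

```python
# Toy reference matcher: an illustrative sketch, not Crossref's method.
# All records and DOIs below are hypothetical.
CANDIDATES = [
    {"doi": "10.9999/example.1", "title": "Deep learning for citation parsing", "year": 2019},
    {"doi": "10.9999/example.2", "title": "A survey of metadata matching strategies", "year": 2021},
]


def tokens(text):
    """Lowercase the text, replace punctuation with spaces, and split into words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def score(reference, candidate):
    """Fraction of title words found in the reference, halved if the year is missing."""
    ref_tokens = set(tokens(reference))
    title_tokens = tokens(candidate["title"])
    overlap = sum(1 for t in title_tokens if t in ref_tokens) / len(title_tokens)
    if str(candidate["year"]) not in ref_tokens:
        overlap *= 0.5
    return overlap


def match(reference, candidates=CANDIDATES, threshold=0.8):
    """Return the DOI of the best-scoring candidate, or None if nothing is convincing."""
    best = max(candidates, key=lambda c: score(reference, c))
    return best["doi"] if score(reference, best) >= threshold else None


if __name__ == "__main__":
    print(match("Smith J. (2021). A survey of metadata matching strategies. J. Example Stud. 4, 1-10"))
    print(match("Doe, A. Unrelated work on plant biology, 2019"))
```

The threshold is the interesting design choice: set too low, the matcher creates wrong links; set too high, it misses valid ones. That trade-off is why a plain title lookup alone is risky, and why automatically asserted links can never be guaranteed 100% correct.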
Files
Name | Size | Checksum
---|---|---
Tkaczyk-slides-oc-workshop-2022.pdf | 421.7 kB | md5:4bb2ee43a80f85e554dd3ae69c341b8d