Report Open Access
Melissa Haendel; Andrew Su; Julie McMurry; et al
“Knowledge” is the collection of insights captured by experts, providing an explanatory framework for evaluating new observations. A “knowledge base” makes that knowledge maximally impactful by rendering it findable and computable. Maintaining databases that house scientific knowledge is far more cost-effective than rederiving that knowledge experimentally. Moreover, knowledge bases provide efficiencies beyond basic science; they are essential to pharmaceutical R&D and to advances in open drug discovery as well. Nevertheless, not all databases can be maintained at the same level of support, or for the same duration. Therefore, the evaluation and review of biomedical data repositories should be mindful of the quality, accessibility, and value of the database resources over time and across the translational divide.
Why traditional metrics fall short. Traditional citation counts and publication impact factors are known to be inadequate measures of a resource’s success or value. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased the citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There are no citation best practices for widely used biomedical database resources (e.g., should a paper be cited? A URL? Is mention of the name and access date sufficient?). Even secondary tracking of a resource’s identifiers is difficult, as most researchers do not use such identifiers in their manuscripts. Efforts such as Identifiers.org and N2T.net are working together to improve the consistency and citability of records within data resources that lack DOIs.
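To make the compact-identifier approach concrete, here is a minimal sketch of how a record without a DOI can still be cited unambiguously: Identifiers.org resolves URLs of the form `https://identifiers.org/<prefix>:<accession>`. The helper function below is illustrative only, not part of any Identifiers.org client library.

```python
# Minimal sketch: turning a compact identifier (CURIE) such as
# "pubmed:26978244" into a resolvable citation URL. The
# https://identifiers.org/<prefix>:<accession> pattern is the
# documented resolution scheme; the helper itself is hypothetical.

def curie_to_url(curie: str, resolver: str = "https://identifiers.org") -> str:
    """Build a resolver URL for a compact identifier (prefix:accession)."""
    prefix, _, accession = curie.partition(":")
    if not prefix or not accession:
        raise ValueError(f"not a valid CURIE: {curie!r}")
    return f"{resolver}/{prefix}:{accession}"

print(curie_to_url("pubmed:26978244"))
# https://identifiers.org/pubmed:26978244
```

A manuscript that cites `https://identifiers.org/pubmed:26978244` remains resolvable and machine-trackable even though the underlying record carries no DOI, which is exactly the gap these resolver efforts address.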
Other measures of impact (e.g., letters of support, patents, etc.) have also proved insufficient to rigorously assess impact or value. It is clear that evaluating a data or knowledge resource is non-trivial, as evidenced by the large number of evaluation and impact working groups and rubrics. We acknowledge that a one-size-fits-all solution is unrealistic. This RFI is an opportunity to stop “looking for our keys under the streetlight because that is where the light happens to be.” Here, we focus exclusively on data access and reuse issues, as we feel these are most important to us as data integrators; moreover, these factors may be the least likely to be covered extensively in other responses to this RFI.
We arrange our response according to the commonly cited FAIR principles -- Findable, Accessible, Interoperable, and Reusable [PMID:26978244] -- and add three more: Traceable, Licensed, and Connected. These additions are of course closely related to the original principles; however, we call them out because they remain largely overlooked and underappreciated, even within FAIR. It is worth noting that the FAIR principles apply not only to a resource as a whole, but also to its key components; this “fractal FAIRness” means that even the licenses, identifiers, vocabularies, and APIs themselves must be Findable, Accessible, Interoperable, and Reusable.
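The notion of “fractal FAIRness” can be sketched as a metadata audit applied uniformly to every component of a resource, not just the resource itself. The field names below (`identifier`, `license`, `format`) are a hypothetical minimal schema chosen for illustration, not a published standard.

```python
# Illustrative sketch of "fractal FAIRness": the same minimal metadata
# audit is applied to a resource and to each of its components (license,
# identifiers, vocabulary, API). The required fields are assumptions made
# for this example, not a standard schema.

REQUIRED_FIELDS = ("identifier", "license", "format")

def fair_gaps(record: dict) -> list:
    """Return the required metadata fields missing or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

resource = {
    "api": {"identifier": "https://example.org/api/v1",
            "license": None, "format": "OpenAPI 3.0"},
    "vocabulary": {"identifier": "https://example.org/onto.owl",
                   "license": "CC-BY-4.0", "format": "OWL"},
}

for component, record in resource.items():
    print(component, "missing:", fair_gaps(record))
```

Under this toy audit, the API component would be flagged for its missing license -- precisely the kind of component-level gap that a whole-resource review tends to overlook.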