Report Open Access
Barsky, Eugene; Brosz, John; Leahey, Amber
Research data must be discoverable to be reused. Data discovery represents the descriptive and technical processing of data and metadata, as well as the tools and infrastructure aimed at improving access to and reuse of research data on the web. A Canadian data discovery service would make it easier to find and reuse research data held in institutional and disciplinary repositories. We would like to see a service that provides a coherent, single point of access to authoritative, searchable, browsable, and machine-actionable descriptions (metadata) for datasets and implements clear means for accessing them, thus increasing the likelihood of discovery and reuse of research data in Canada.

In this paper, we highlight current opportunities and issues related to developing such a service in Canada. Based on a review of international and national research data repositories and data discovery services, we offer a set of guiding principles, best practices, and recommendations for data discovery:

Common metadata: the descriptive information that accompanies research data should meet minimum standards to enable discovery and support data reuse. This requires a commitment to a core set of metadata components across domains. Metadata tools should accommodate multiple, overlapping metadata namespaces, i.e., descriptive terms assigned, managed, and grouped into collections of classes and attributes. We also recommend building separate, flexible metadata harvesters for indexing specialized repositories, so that domain-specific metadata and granularity can be retained in their original format.

Persistent identification: the use of global identifiers for researchers and research data. We recommend exploring a national ORCID agreement so that universities and government agencies in Canada can integrate researcher identifiers into institutional and other research management and publishing software. We also recommend registering DOIs corresponding to datasets in participating repositories with DataCite Canada. These DOIs will greatly enhance dataset discoverability via DataCite’s metadata partners (e.g., ORCID and VIVO).

Open access and programmatic interfaces: the use of an application programming interface (API), which allows one piece of software to use the functionality or data of another through a set of routines, protocols, and tools. Metadata and data should be programmatically accessible for reuse and development purposes through the provision of APIs among participating repositories and data discovery platforms.

Common licensing: policies and licenses should govern access to data and metadata and, whenever possible, should be minimally restrictive. We recommend the use of Creative Commons licenses for research data, as they effectively communicate the copyright holders’ intentions and clarify usage permissions. Licensing can apply to both data and metadata, although we strongly recommend that metadata be provided as openly as possible, with minimal to no restrictions on reuse, in order to facilitate discovery.

Collaboration: a joint commitment to shared recognition and cooperation among actors, organizations, data producers, and researchers, sometimes described as “coexistence in the scholarly ecosystem.” We emphasize that collaboration will drive improvements for data discovery in Canada. A well-coordinated national project will ensure that all attempts to improve discovery of and access to data are informed and facilitated by stakeholder expectations, participation, and collaboration. Keeping stakeholders engaged and providing clear communication channels are key to the success of a national data discovery service.

This paper is presented with a common goal: to make research data as widely discoverable and accessible as possible, thus enhancing opportunities for data reproducibility and reuse. Enhancing data discovery is one approach to facilitating greater interoperability and discovery of scholarly outputs. Building national infrastructure to support research data discovery will greatly enhance opportunities for further integration across the scholarly ecosystem, including support for metadata, global identifiers, and open APIs.
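
As an illustration of the common metadata recommendation, the following is a minimal sketch of how a harvester might check a record against a shared core while leaving domain-specific fields untouched. The core field list, the namespace prefixes, the function names, and the sample record (including its DOI) are all assumptions made for illustration; the paper does not prescribe them.

```python
# Hypothetical core set; the paper does not prescribe specific fields.
CORE_FIELDS = ("title", "creator", "identifier", "license")


def core_view(record):
    """Map namespaced keys (e.g. 'dc:title') onto core field names.

    Domain-specific fields are left untouched in the original record.
    """
    core = {}
    for key, value in record.items():
        # Split 'namespace:term'; keys without a prefix are used as-is.
        term = key.split(":", 1)[-1]
        if term in CORE_FIELDS and term not in core:
            core[term] = value
    return core


def missing_core(record):
    """Return the core fields a record lacks, in CORE_FIELDS order."""
    present = core_view(record)
    return [field for field in CORE_FIELDS if field not in present]


# Made-up sample record mixing general and domain-specific namespaces.
record = {
    "dc:title": "Lake sediment cores",
    "dc:creator": "Example, A.",
    "datacite:identifier": "10.1234/example",    # made-up DOI
    "geo:boundingBox": "45.0 -75.0 46.0 -74.0",  # domain-specific, retained
}

print(missing_core(record))
```

Reading the core fields through a separate view, rather than rewriting the record, is one way to keep domain-specific metadata in its original form while still exposing a common set of terms for discovery.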