Data quality assurance at research data repositories - results from a survey
Data quality assurance is a central aspect of data curation, as it ensures that data are valid, reliable and therefore reusable. Although it is a prerequisite for data reuse, information on data quality assurance measures is currently sparse, both at the level of repositories and of individual datasets.
To learn more about data quality assurance at research data repositories, we conducted a survey among staff responsible for data curation at repositories listed in re3data, an international registry of research data repositories. Of the 1897 repositories that were contacted, 332 completed the questionnaire.
The survey covered several aspects of data quality assurance, including data collection criteria, support for data depositors, formal assessment and review of data, as well as data rejection rates and the involvement of repository users in post-publication data review. The survey distinguished between formal assessment of data and data review. Formal assessment refers to technical, administrative and access-related aspects of data, whereas data review refers to the process by which experts, either from the hosting institution or from other institutions, evaluate the scientific quality of datasets.
To ensure a homogeneous collection, most repositories check whether data fit the scope of the repository in general (71.4 %, 237); others require data to pass formal assessment before deposit (31.9 %, 106) or to correspond to a peer-reviewed publication (27.4 %, 91).
Most repositories offer direct, individualized support to data depositors (73.5 %, 244), and 62.7 % (208) provide data deposit guidelines.
We found that 62.3 % (207) of responding repositories apply formal criteria, and 51.5 % (171) conduct data review either for all (31.6 %, 105) or some (19.9 %, 66) datasets.
A third of the repositories (33.1 %, 110) would consider rejecting data as a consequence of insufficient quality; these repositories report a median rejection rate of 3 % in the last two years.
Currently, repositories rarely involve repository users in the post-publication peer review of data, for example in the form of public comments (6.6 %, 22) or data ratings (1.5 %, 5).
Repositories adopt different strategies for communicating indicators of data quality to repository users; most commonly, references to corresponding publications are added (69.9 %, 232). Data quality information is included in metadata at 26.5 % (88) of the surveyed repositories.
The survey is part of re3data COREF (Community Driven Open Reference for Research Data Repositories), a project funded by the German Research Foundation (DFG) that aims to transform re3data into a central service for the open science community. Results of the survey will be shared with the repository community and will inform the development of a framework for research data quality assurance, which will be implemented in a future version of the re3data Metadata Schema.
- Is supplemented by: Dataset, DOI 10.5281/zenodo.6457848