Proposal Open Access
Philipp Cimiano; John McCrae; Najko Jahn; Christian Pietsch; Jochen Schirrwagen; Johanna Vompras; Cord Wiljes
Reproducibility is a cornerstone of the scientific process. While reproducing an experiment can be extremely difficult, the ability to reproduce the (computational) analysis of the data that supported a given conclusion (e.g. the validation of a hypothesis) should be a minimum requirement for every piece of published research. We call this type of reproducibility analytical reproducibility.
Reproducing the analytic results of a piece of research requires, as a minimum, that: i) the primary or secondary data is available, ii) the data is syntactically well-formed and ready to use, iii) the data is appropriately documented, iv) the analysis procedures (e.g. scripts) that were used to process or analyze the data are available, and v) these analytic procedures can be run on the data to reproduce the actual result published in a paper. Analytical reproducibility is often hampered because at least one of these requirements is not met.
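The five requirements above lend themselves to an automated check. The following is only a minimal sketch, not part of the proposed infrastructure: it assumes the data is a single CSV file, the documentation is a plain-text README, and the analysis is a Python script that takes the data path as its only argument and prints the published result. The function name `check_reproducibility` and all file names are hypothetical.

```python
import csv
import subprocess
import sys
from pathlib import Path


def check_reproducibility(data: Path, readme: Path, script: Path, expected: str) -> dict:
    """Check requirements i) to v) for one data set and one analysis script.

    Assumptions (illustrative only): CSV data, a plain-text README, and a
    Python script that prints the published result to standard output.
    """
    checks = {}

    # i) the data is available
    checks["data_available"] = data.is_file()

    # ii) the data is syntactically well-formed: it parses as CSV, has at
    #     least one data row, and every row has as many fields as the header
    well_formed = False
    if checks["data_available"]:
        try:
            with data.open(newline="") as fh:
                rows = list(csv.reader(fh))
            well_formed = len(rows) > 1 and all(len(r) == len(rows[0]) for r in rows)
        except (csv.Error, UnicodeDecodeError):
            well_formed = False
    checks["data_well_formed"] = well_formed

    # iii) the data is documented (here: a non-empty README exists)
    checks["data_documented"] = readme.is_file() and readme.read_text().strip() != ""

    # iv) the analysis procedure is available
    checks["analysis_available"] = script.is_file()

    # v) running the procedure on the data reproduces the published result
    reproduced = False
    if well_formed and checks["analysis_available"]:
        run = subprocess.run(
            [sys.executable, str(script), str(data)],
            capture_output=True,
            text=True,
            timeout=60,
        )
        reproduced = run.returncode == 0 and run.stdout.strip() == expected
    checks["result_reproduced"] = reproduced

    return checks
```

A call such as `check_reproducibility(Path("data.csv"), Path("README.txt"), Path("analysis.py"), "2.0")` returns one boolean per requirement, so a failing requirement can be reported individually rather than as a single pass/fail verdict.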
The goal of this project is to extend the infrastructure available at Bielefeld University for the management of data and publications with a framework that supports researchers in meeting the above-mentioned requirements and thus in making their work analytically reproducible. Departing from current practice, where data and software are published at the end of a research project, if at all, we intend to move the hosting of data to the very beginning of the scientific process. Borrowing ideas from computer science, in particular from continuous integration, we intend to implement a continuous quality control framework that encourages researchers from early on to publish their data and analytic procedures so that these can easily be re-used and verified. In this project, we thus understand data quality as readiness to be re-used and validated.
Towards this goal, we will work with a selected group of researchers at Bielefeld University who have committed themselves to defining a use case, providing requirements, implementing pilots, working continuously with the infrastructure, and providing regular feedback. The researchers come from disciplines as varied as psychology, sports sciences, biology, chemistry, cognitive linguistics, computational linguistics, robotics, and economics. By involving a varied set of disciplines, we aim to identify common requirements for an infrastructure that supports data quality as a continuous process and supports the sharing and external validation of research results. Besides extending our infrastructure, the project can be expected to have an impact well beyond Bielefeld University. By sharing our experiences and the requirements we identify, we hope to inform other universities and policy makers about the trade-off between effort and return on investment, and about which policies to adopt to support greater transparency in research.