Published February 14, 2022 | Version 1.0
Thesis Open

Infrastructures and Practices for Reproducible Research in Geography, Geosciences, and GIScience

  • 1. Institute for Geoinformatics, University of Münster, Münster, Germany

Contributors

  • 1. Institute for Geoinformatics, University of Münster, Münster, Germany
  • 2. University of Washington, Seattle, WA, USA

Description

PhD thesis compilation of articles, including an introduction and synopsis, and the defense presentation (one document with slides only and one with speaker notes).

Abstract

Reproducibility of computational research, i.e., research based on code and data, poses enormous challenges to all branches of science. In this dissertation, technologies and practices are developed to increase reproducibility and to connect it better with the process of scholarly communication with a particular focus on geography, geosciences, and GIScience. Based on containerisation, this body of work creates a platform that connects existing academic infrastructures with a newly established executable research compendium (ERC). It is shown how the ERC can improve transparency, understandability, reproducibility, and reusability of research outcomes, e.g., for peer review, by capturing all parts of a workflow for computational research. The core part of the ERC platform is software that can automatically capture the computing environment, requiring authors only to create computational notebooks, which are digital documents that combine text and analysis code. The work further investigates how containerisation can be applied independently of ERCs to package complex workflows using the example of remote sensing, to support data science in general, and to facilitate diverse use cases within the R language community. Based on these technical foundations, the work concludes that functioning practical solutions exist for making reproducibility possible through infrastructure and making reproducibility easy through user experience. Several downstream applications built on top of ERCs provide novel ways to discover and inspect the next generation of publications.

To understand why reproducible research has not been widely adopted and to contribute to the propagation of reproducible research practices, the dissertation continues to investigate the state of reproducibility in GIScience and develops and demonstrates workflows that can better integrate the execution of computational analyses into peer review procedures.

We make recommendations for how to (re)introduce reproducible research into peer reviewing and how to make practices to achieve the highest possible reproducibility normative, rewarding, and, ultimately, required in science. These recommendations rest upon over 100 GIScience papers which were assessed as irreproducible, the experiences from over 30 successful reproductions of workflows across diverse scientific fields, and the lessons learned from implementing the ERC.

Besides continuing the development of the contributed concepts and infrastructure, the dissertation points out broader topics of future work, such as surveying practices for code execution during peer review of manuscripts, or reproduction and replication studies of the fundamental works in the considered scientific disciplines. The technical and social barriers to higher reproducibility are strongly intertwined with other transformations in academia, and, therefore, improving reproducibility meets similar challenges around culture change and sustainability. However, we clearly show that reproducible research is achievable today using the newly developed infrastructures and practices. The transferability of cross-disciplinary lessons facilitates the establishment of reproducible research practices and, more than other transformations, the movement towards greater reproducibility can draw from accessible and convincing arguments both for individual researchers and for their communities.

Notes

See cover pages for each article for the license of the respective work. Introduction and synopsis are published under a CC-BY 4.0 license.

Files

PhD Daniel Nüst - WWU Münster - 2022.pdf

Additional details

References

  • Knoth, C., & Nüst, D. (2017). Reproducibility and Practical Adoption of GEOBIA with Open-Source Software in Docker Containers. Remote Sensing, 9(3), 290. https://doi.org/10.3390/rs9030290
  • Konkol, M., Nüst, D., & Goulier, L. (2020). Publishing computational research - a review of infrastructures for reproducible and transparent scholarly communication. Research Integrity and Peer Review, 5(1), 10. https://doi.org/10.1186/s41073-020-00095-y
  • Nüst, D. (2021). A web service for executable research compendia enables reproducible publications and transparent reviews in geospatial sciences. Zenodo. https://doi.org/10.5281/zenodo.5108218
  • Nüst, D., Eddelbuettel, D., Bennett, D., Cannoodt, R., Clark, D., Daróczi, G., Edmondson, M., Fay, C., Hughes, E., Kjeldgaard, L., Lopp, S., Marwick, B., Nolis, H., Nolis, J., Ooi, H., Ram, K., Ross, N., Shepherd, L., Sólymos, P., Swetnam, T. L., Turaga, N., Petegem, C. V., Williams, J., Willis, C., & Xiao, N. (2020). The Rockerverse: Packages and Applications for Containerisation with R. The R Journal, 12(1). https://doi.org/10.32614/RJ-2020-007
  • Nüst, D., & Hinz, M. (2019). containerit: Generating Dockerfiles for reproducible research with R. Journal of Open Source Software, 4(40), 1603. https://doi.org/10.21105/joss.01603
  • Nüst, D., Konkol, M., Pebesma, E., Kray, C., Schutzeichel, M., Przibytzin, H., & Lorenz, J. (2017). Opening the Publication Process with Executable Research Compendia. D-Lib Magazine, 23(1/2). https://doi.org/10.1045/january2017-nuest
  • Nüst, D., & Pebesma, E. (2021). Practical reproducibility in geography and geosciences. Annals of the American Association of Geographers, 111(5), 1300–1310. https://doi.org/10.1080/24694452.2020.1806028
  • Nüst, D., Sochat, V., Marwick, B., Eglen, S. J., Head, T., Hirst, T., & Evans, B. D. (2020). Ten simple rules for writing Dockerfiles for reproducible data science. PLOS Computational Biology, 16(11), 1–24. https://doi.org/10.1371/journal.pcbi.1008316
  • Niers, T., & Nüst, D. (2020). Geospatial Metadata for Discovery in Scholarly Publishing. Septentrio Conference Series, 4. https://doi.org/10.7557/5.5590
  • Nüst, D., Boettiger, C., & Marwick, B. (2018). How to Read a Research Compendium. arXiv:1806.09525 [Cs]. http://arxiv.org/abs/1806.09525
  • Nüst, D., & Eglen, S. J. (2021). CODECHECK: An Open Science initiative for the independent execution of computations underlying research articles during peer review to improve reproducibility. F1000Research, 10, 253. https://doi.org/10.12688/f1000research.51738.1
  • Nüst, D., Granell, C., Hofer, B., Konkol, M., Ostermann, F. O., Sileryte, R., & Cerutti, V. (2018). Reproducible research and GIScience: An evaluation using AGILE conference papers. PeerJ, 6, e5072. https://doi.org/10.7717/peerj.5072
  • Nüst, D., Lohoff, L., Einfeldt, L., Gavish, N., Götza, M., Jaswal, S. T., Khalid, S., Meierkort, L., Mohr, M., Rendel, C., & Eek, A. van. (2019). Guerrilla Badges for Reproducible Geospatial Data Science. AGILE Short Papers. https://doi.org/10.31223/osf.io/xtsqh
  • Ostermann, F. O., Nüst, D., Granell, C., Hofer, B., & Konkol, M. (2020). Reproducible Research and GIScience: An evaluation using GIScience conference papers. EarthArXiv. https://doi.org/10.31223/X5ZK5V