Published July 6, 2017 | Version v1
Presentation Open

Automatically archiving reproducible studies with Docker

  • 1. Institute for Geoinformatics, University of Münster

Contributors

  • 1. Institute for Geoinformatics, University of Münster

Description

Reproducibility of computations is crucial in an era where data is born digital and analysed algorithmically. Most studies, however, publish only the results, often with figures as the key interpreted outputs. But where do these figures come from? Scholarly articles must not only describe the work but also be accompanied by data and software. R offers excellent tools for creating reproducible works, e.g. Sweave and RMarkdown. Several approaches have been developed to capture the workspace environment in R, working around CRAN’s deliberate choice not to provide explicit versioning of packages and their dependencies. They preserve a collection of packages locally (packrat, pkgsnap, switchr/GRANBase) or remotely (MRAN timemachine/checkpoint), or install specific versions from CRAN or source (requireGitHub, devtools). Installers for old versions of R are archived on CRAN. A user can re-create a specific environment manually, but this is a cumbersome task.
We introduce a new way to preserve a runtime environment that includes both the packages and R itself by adding an abstraction layer in the form of a container, which can execute a script or run an interactive session. The package containeRit automatically creates such containers based on Docker. Docker is a solution for packaging an application and its dependencies, but it has also proven useful in the context of reproducible research (Boettiger 2015). The package creates a container manifest, the Dockerfile, which is usually written by hand, from sessionInfo(), R scripts, or RMarkdown documents. The Dockerfiles use the Rocker community images as base images. Docker can build an executable image from a Dockerfile, and that image can be run anywhere a Docker runtime is present.
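
To make the idea of deriving a Dockerfile from sessionInfo() concrete, the following is a minimal sketch in base R. It is not the containeRit implementation or its API: the function name dockerfile_from_session is hypothetical, and the sketch assumes that a Rocker image tagged rocker/r-ver:<R version> exists for the session's R version and that the install2.r helper is available in that image. Package version pinning and system dependencies are deliberately left out.

```r
# Hypothetical helper, NOT part of containeRit: render a minimal Dockerfile
# (as a character vector of lines) from a sessionInfo()-like object.
dockerfile_from_session <- function(info = sessionInfo()) {
  # R version of the captured session, e.g. "3.4.0"
  r_version <- paste(info$R.version$major, info$R.version$minor, sep = ".")

  # Attached add-on packages only; base packages ship with R itself.
  # Packages that are loaded but not attached are ignored in this sketch.
  pkgs <- sort(unique(as.character(names(info$otherPkgs))))

  # Assumption: a version-tagged Rocker base image exists for this R version
  lines <- paste0("FROM rocker/r-ver:", r_version)

  if (length(pkgs) > 0) {
    # Assumption: install2.r is available in the base image; it installs
    # the current CRAN versions, not the versions recorded in the session
    lines <- c(lines, paste("RUN install2.r", paste(pkgs, collapse = " ")))
  }

  # Start an interactive R session by default
  c(lines, 'CMD ["R"]')
}

# Print a Dockerfile for the current session
writeLines(dockerfile_from_session())
```

The resulting lines could be written to a file named Dockerfile and built with docker build. A real tool also has to handle package versions, non-CRAN sources, and system libraries, which is where most of the difficulty lies.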

containeRit uses harbor for building images and running containers, and sysreqs for installing system dependencies of R packages. Before the planned CRAN release, we want to share our work, discuss open challenges such as handling linked libraries (see the discussion on geospatial libraries in Rocker), and welcome community feedback.

https://user2017.sched.com/event/785b77e931775df6849f108615605c01

https://github.com/o2r-project/containerit/

Notes

This work is supported by the project Opening Reproducible Research (Offene Reproduzierbare Forschung) funded by the German Research Foundation (DFG) under project numbers PE 1632/10-1, KR 3930/3-1 and TR 864/6-1.

Files

useR!2017-nuest-containerit-presentation.pdf (3.3 MB)
md5:7f00fc48991af1b473a76e0636719d2e

Additional details

Related works

References

  • Nüst, Daniel, Markus Konkol, Edzer Pebesma, Christian Kray, Marc Schutzeichel, Holger Przibytzin, and Jörg Lorenz. 2017. "Opening the Publication Process with Executable Research Compendia." D-Lib Magazine 23 (January). doi:10.1045/january2017-nuest.
  • Boettiger, Carl. 2015. "An Introduction to Docker for Reproducible Research, with Examples from the R Environment." ACM SIGOPS Operating Systems Review 49 (January): 71–79. doi:10.1145/2723872.2723882.