An automated workflow for parallel processing of large multiview SPIM recordings


	2 Processing workflow

The Fiji SPIM processing pipeline uses the Hierarchical Data Format (HDF5) as the data container for the TIFF or CZI files originally generated by custom-made (Pitrone et al., 2013) or commercial SPIM microscopes (Fig. 1A and B). Following format conversion, multiview registration aligns the different acquisition angles (views) within each time point (Fig. 1C), and subsequent time-lapse registration stabilizes the recording over time (Preibisch et al., 2010) (Fig. 1D). Fusion combines the registered views of one time point into a single volume by averaging or multiview deconvolution (Preibisch et al., 2010, 2014) (Fig. 1E and F). The result is a set of HDF5 files containing registered and fused multiview SPIM data that can be examined locally or remotely using the BigDataViewer (Pietzsch et al., 2015).

All steps are implemented as plugins (Preibisch et al., 2010, 2014; Pietzsch et al., 2015; Preibisch, unpublished (https://github.com/fiji/SPIM_Registration)) in the open-source platform Fiji (Schindelin et al., 2012). We use these plugins by executing them from the command line as Fiji BeanShell scripts (Supplementary Fig. 1). To overcome the legacy dependency of Fiji on the GUI, we encapsulate it in a virtual framebuffer (xvfb) that simulates a monitor in the headless cluster environment (Supplementary Fig. 1).
To map and dispatch the workflow logic to a single workstation or to an HPC cluster, we use the automated workflow engine snakemake (Köster and Rahmann, 2012). The workflow is defined in a Snakefile containing the name and the input and output file names of each processing step, together with the Python code calling the BeanShell scripts (Supplementary Fig. 1). Upon invocation, the snakemake rule engine resolves the dependencies between individual processing steps based on the input files required and the output files produced during the workflow. It also builds the concrete command for each step by filling the template command defined in the Snakefile with the matching input and output file names. Most importantly, if single tasks on individual files are found to be independent, they are invoked in parallel (Supplementary Fig. 2). Each instance of snakemake for one dataset is independent, and thus the workflow can be applied simultaneously to multiple datasets.
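The dependency resolution described above can be illustrated with a minimal Python sketch. It is not snakemake's implementation: the rule names, the file names and the helper functions (`build_rules`, `dependency_graph`, `parallel_batches`) are hypothetical and chosen only to mirror the conversion, multiview registration and time-lapse registration steps. Each rule is linked to the rules that produce its input files, and tasks whose prerequisites are all complete are grouped into batches that could run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_rules(timepoints):
    """Return hypothetical rules as name -> (input files, output files)."""
    rules = {}
    for t in timepoints:
        # Format conversion and multiview registration act on one time point.
        rules[f"convert_{t}"] = ([f"raw_{t}.czi"], [f"tp_{t}.h5"])
        rules[f"register_{t}"] = ([f"tp_{t}.h5"], [f"tp_{t}.registered"])
    # Time-lapse registration needs every registered time point.
    rules["timelapse"] = (
        [f"tp_{t}.registered" for t in timepoints],
        ["timelapse.done"],
    )
    return rules


def dependency_graph(rules):
    """Map each rule to the set of rules that produce its input files."""
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    return {
        name: {producer[i] for i in inputs if i in producer}
        for name, (inputs, _) in rules.items()
    }


def parallel_batches(rules):
    """Group rules into batches; rules within a batch are independent."""
    sorter = TopologicalSorter(dependency_graph(rules))
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for batch in parallel_batches(build_rules([0, 1])):
        print(batch)
```

In this toy example the per-time-point conversion tasks form one batch, the per-time-point registration tasks the next, and the time-lapse registration runs last because it depends on all registered time points.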
The user determines the required processing parameters by processing a representative time point in the GUI and enters them into a .yaml configuration file (Supplementary List 1). The workflow is executed by passing the .yaml file to snakemake on the command line (Supplementary Fig. 1). Importantly, from the user's perspective, launching the pipeline on an HPC cluster and on a local workstation appears identical and requires a single command (Supplementary List 2). If the parameters are chosen correctly and the local or HPC resources are sufficient (Supplementary Tables 1 and 2), no further action from the user is necessary.
Snakemake supports multiple back ends to perform the command dispatch: local, cluster and Distributed Resource Management Application API (DRMAA) (Köster and Rahmann, 2012). The local back end creates a new subshell and calls the required command(s). The cluster back end is a general interface to HPC batch systems based on string substitution. DRMAA specifies a system library that interfaces with all common batch systems based on a generalized task model; thus, multiple batch systems are supported through one interface.
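The idea of a cluster back end based on string substitution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not snakemake's code: the submit program `submit_job`, its options and the job fields (`threads`, `mem_mb`, `script`) are hypothetical placeholders. A user-supplied template is filled with the properties of each job to obtain the command line that would be handed to the batch system, whereas the local back end would run the job script directly in a subshell.

```python
def render_submit_command(template, job):
    """Fill a submission template with the properties of one job."""
    return template.format(**job)


def dispatch_command(backend, job, template=None):
    """Return the command line to execute for the chosen back end."""
    if backend == "local":
        # Local back end: run the job script directly in a subshell.
        return f"sh {job['script']}"
    if backend == "cluster":
        # Cluster back end: wrap the job script in a batch-system call.
        return render_submit_command(template, job)
    raise ValueError(f"unknown back end: {backend}")


if __name__ == "__main__":
    # Hypothetical template and job description, for illustration only.
    template = "submit_job --cores {threads} --mem {mem_mb} {script}"
    job = {"threads": 4, "mem_mb": 8000, "script": "register_tp0.sh"}
    print(dispatch_command("local", job))
    print(dispatch_command("cluster", job, template))
```

Because only the template changes between batch systems, the same workflow definition can target different schedulers without modification.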


	Supplementary Material

