Published July 9, 2018 | Version v1
Poster Open

Repositorg: FOSS Workflows for Transferring, Repositing, Renaming, and Standardizing Large Data Collections

  • 1. Institute for Biomedical Engineering, ETH and University of Zurich

Description

Many data modalities, and optical imaging in particular, rely on large and heterogeneous collections of relatively small data files produced by proprietary programs on restrictively managed systems. The advent of big data, however, mandates data standardization and access to the data on transparently and reproducibly managed analysis systems. Manual standardization and manual repositing fragment the imaging analysis workflow, increase the probability of data loss and corruption, and cannot guarantee full compliance with standards.

Repositing files and standardizing formats and naming via simple in-house scripts entails significant duplication of effort across groups, encourages divergence of standards, and creates unsustainable workflow dependencies. Here we present a pipeline package designed to accelerate and automate (a) data transfer between a potentially proprietary acquisition system and an analysis system, (b) basic preprocessing to enforce data format standards, and (c) file renaming to continuous namespaces with duplicate detection.
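To illustrate step (c), the following is a minimal sketch of renaming to a continuous namespace with duplicate detection. It is not Repositorg's actual interface: the function names `file_digest` and `plan_renames`, the `img_` prefix, the zero-padded numbering, and the use of SHA-256 content hashes to identify duplicates are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_renames(source_dir: Path, prefix: str = "img_", digits: int = 5, start: int = 0):
    """Plan consecutive names for unique files and list duplicates separately.

    Hypothetical helper: nothing is moved or deleted, so the plan can be
    reviewed before it is applied.
    Returns (renames, duplicates), where renames is a list of
    (source_path, new_name) and duplicates is a list of
    (duplicate_path, first_seen_path).
    """
    seen = {}          # content digest -> first file seen with that content
    renames = []
    duplicates = []
    index = start
    # Sort for a deterministic, reproducible numbering.
    for path in sorted(p for p in source_dir.iterdir() if p.is_file()):
        digest = file_digest(path)
        if digest in seen:
            duplicates.append((path, seen[digest]))
            continue  # duplicates do not consume a number
        seen[digest] = path
        new_name = f"{prefix}{index:0{digits}d}{path.suffix.lower()}"
        renames.append((path, new_name))
        index += 1
    return renames, duplicates
```

Because duplicates never consume an index, the resulting names stay gap-free, and separating planning from execution keeps the operation auditable before any file on the acquisition system is touched.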

Files

poster.pdf (475.3 kB)
md5:b12f392bd33007ccbb42ed0021bb283f
