Published June 15, 2020 | Version v1
Poster Open

The DataLad Handbook: A user-focused and workflow-based addition to standard software documentation

Description

A poster presented at the Organization for Human Brain Mapping (OHBM) conference 2020.

The submission abstract was:
 

Big data has arrived in neuroscience: There is growing awareness of the role of sample size and replicable results (Button et al., 2013; Turner et al., 2018), a rise of platforms, tools, and standards that aim to facilitate data sharing and management (Wiener et al., 2016), and unprecedented sample sizes (e.g., UKBiobank; Bzdok & Yeo, 2017). Together with increasingly complex data analyses, this development demands not only methodological knowledge but also skills in data and software management to put open, reproducible, and scalable neuroimaging research into practice. Researchers, planners (PIs or funders), and trainers in neuroscience alike are challenged by these demands (Fothergill et al., 2019; Grisham et al., 2016) and rely on the documentation of their tools. Researchers need accessible educational content to understand and use a tool, planners need high-level, non-technical information to make informed yet efficient decisions on whether a tool fulfils their needs, and trainers need reliable, open teaching material. Yet many neuroscientists lack a computer science background and receive no formal training in data management. To be broadly accessible, documentation needs to cultivate confidence in users regardless of their background. However, software documentation can be suboptimal for novices: It may be incomplete, narrowly focused on individual commands, or assume existing knowledge that novices lack (Segal, 2007; Pawlik et al., 2015), and can thereby discourage potential users or inhibit the adoption of valuable tools.

DataLad (Halchenko, Hanke et al., 2019) is an open-source data management multi-tool that facilitates digital workflows of data and science. It is used as an infrastructure component or utility in a growing number of services and tools (e.g., OpenNeuro, CBRAIN platform, HeuDiConv, brainlife.io). Although DataLad is accessible (free, open, with support for all major OSs) and comprehensive, its adoption among users is impeded by a lack of workflow-based and user-focused educational materials.

Methods

A user-driven alternative to scientific software documentation written by software developers, “Documentation Crowdsourcing”, has been successfully employed by the NumPy project (Oliphant, 2006; Pawlik et al., 2015). Extending this concept beyond documentation, we have created the DataLad handbook (handbook.datalad.org) as a free and open-source, user-driven and user-focused educational instrument and resource on (research) data management for trainers, users, and planners, independent of their background and skill level. The handbook contains three components: ‘The Basics’, a course-like, continuous narrative, integrates basic functions and concepts into larger workflows and serves as a ‘code-along’ tutorial. Recipe-like, step-by-step how-to guides (‘use cases’) give high-level overviews of specific applications. Expandable sections contain background information and serve as explanations. The code is tested automatically to ensure correct operation, and the comprehensibility of the contents is tested and improved via handbook-based teaching sessions for data management novices (various career stages, from MSc students up to PI level). Slides and automatic code demos can be extracted semi-automatically as readily available, tested resources for teachers.
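
The abstract does not say how the handbook's code is tested. As a rough illustration of the general idea only, the sketch below uses Python's standard `doctest` module as a stand-in: examples embedded in tutorial text are executed and their output is compared with what the text shows. The snippet content and the names `TUTORIAL_SNIPPET` and `check_snippet` are hypothetical and not taken from the handbook.

```python
import doctest

# Hypothetical tutorial text: prose with an embedded interactive example.
TUTORIAL_SNIPPET = """
Joining path components, as a tutorial might demonstrate:

>>> import posixpath
>>> posixpath.join("dataset", "inputs", "raw")
'dataset/inputs/raw'
"""


def check_snippet(text, name="snippet"):
    """Run the examples embedded in `text`; return (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    # Collect every '>>>' example in the text into a single test object.
    test = parser.get_doctest(text, globs={}, name=name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Execute each example and compare its actual output with the documented one.
    return runner.run(test)


result = check_snippet(TUTORIAL_SNIPPET)
print(f"{result.attempted} examples run, {result.failed} failed")
```

If a documented output drifts from what the code actually produces, the failure count becomes non-zero, which is the kind of signal an automated check of teaching material relies on.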

Results

The handbook serves several needs: 1) learning-oriented and lesson-like tutorials allow novices to get started, 2) goal-oriented how-to-guides show planners how to solve specific problems, 3) optional explanations provide background and context beyond what is necessary to use a concept or command, and 4) a continuously tested codebase allows trainers to get reliable teaching materials with reproducible examples.

Conclusions

We present an accessible, free, and open-source educational instrument to explain and demonstrate state-of-the-art data management procedures with DataLad. The framework behind the project is generic and could be reused for similar training materials.


The sources can be found at https://github.com/datalad-handbook/book/blob/main/artwork

Files

OHBM_2020.pdf (867.7 kB)
md5:0905f9e204b9921421bdcaa4cc83643a