Published May 15, 2019 | Version 3
Dataset | Open access

CWL run of Somatic Variant Calling Workflow (CWLProv 0.5.0 Research Object)

  • 1. Department of Computing and Information Systems, The University of Melbourne, Australia
  • 2. School of Computer Science, The University of Manchester, UK

Description

The somatic variant calling workflow included in this case study was designed by Blue Collar Bioinformatics (bcbio), a community-driven initiative to develop best-practice pipelines for variant calling, RNA-seq and small RNA analysis. According to the bcbio documentation, the goal of the project is to facilitate automated analysis of high-throughput data by making the resources quantifiable, analyzable, scalable, accessible and reproducible.

All the underlying tools are containerized, which simplifies installing and running the software the workflow depends on. The somatic variant calling workflow, defined in CWL, is available on GitHub together with a well-defined test dataset.

This dataset folder is a CWLProv Research Object that captures the Common Workflow Language (CWL) execution provenance of the workflow run. See https://w3id.org/cwl/prov/0.5.0 for the format, or use the cwlprov tool (https://pypi.org/project/cwlprov/) to explore it.
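A CWLProv research object is a BagIt bag, so a quick integrity check before exploring it is to recompute the payload checksums listed in the bag's manifest. The sketch below assumes a root-level payload manifest named `manifest-sha1.txt` with one `<sha1> <relative path>` entry per line; `sha1_of` and `verify_bag_payload` are hypothetical helper names, not part of cwltool or cwlprov.

```shell
#!/usr/bin/env bash
# Sketch only: verify the payload checksums of a BagIt bag such as a
# CWLProv research object directory. Assumes (not confirmed here) a
# root-level manifest-sha1.txt with "<sha1> <relative path>" lines.
# sha1_of and verify_bag_payload are hypothetical helper names.

# Print the SHA-1 of a file, using sha1sum (GNU) or shasum (macOS).
sha1_of() {
  if command -v sha1sum >/dev/null 2>&1; then
    sha1sum "$1" | cut -d' ' -f1
  else
    shasum -a 1 "$1" | cut -d' ' -f1
  fi
}

# Return 0 if every listed payload file matches its checksum, 1 otherwise.
verify_bag_payload() {
  local bag_dir="$1" expected path actual status=0
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while read -r expected path || [ -n "$expected" ]; do
    [ -n "$expected" ] || continue   # skip blank lines
    actual=$(sha1_of "$bag_dir/$path")
    if [ "$actual" != "$expected" ]; then
      echo "checksum mismatch: $path" >&2
      status=1
    fi
  done < "$bag_dir/manifest-sha1.txt"
  return "$status"
}
```

For example, `verify_bag_payload somaticwf_0.5.0_mac && echo "payload OK"` reports each mismatching file on standard error.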

Steps to reproduce

To rebuild the research object, use Python 3 on macOS. The original was built on:

  • Processor: 2.8 GHz Intel Core i7
  • Memory: 16 GB
  • OS: macOS High Sierra, Version 10.13.3
  • Storage: 250 GB
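Before running the workflow, it can help to confirm that the prerequisites are on the `PATH`. The check below is a sketch: the tool list (`python3`, `pip3`, `git`, `docker`) is an assumption drawn from the steps in this record, with Docker included because cwltool runs containerized tools with Docker by default; `check_tool` and `preflight` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: report whether each assumed prerequisite command is available.
# check_tool and preflight are hypothetical helper names.

# Return 0 if the named command is on the PATH, 1 otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check every prerequisite; return 1 if any is missing.
preflight() {
  local status=0 tool
  for tool in python3 pip3 git docker; do
    check_tool "$tool" || status=1
  done
  # The Docker daemon must also be reachable, not just installed.
  if command -v docker >/dev/null 2>&1 && ! docker info >/dev/null 2>&1; then
    echo "docker is installed but the daemon is not reachable" >&2
    status=1
  fi
  return "$status"
}
```

Running `preflight` before `pip3 install` lists anything that still needs to be installed.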

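To run the workflow (Docker should be installed and running, since cwltool uses it by default to run containerized tools):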
 

pip3 install cwltool==1.0.20180912090223
git clone https://github.com/FarahZKhan/bcbio_test_cwlprov
cd bcbio_test_cwlprov/somatic/somatic-workflow/
cwltool --provenance somaticwf_0.5.0_mac main-somatic.cwl main-somatic-samples.json
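Once `cwltool` finishes, the directory passed to `--provenance` (here `somaticwf_0.5.0_mac`) should hold the research object. A minimal sanity check is sketched below, assuming the layout required by BagIt (`bagit.txt` and a `data/` payload directory) plus a CWLProv `metadata/` directory; the exact layout should be confirmed against the CWLProv 0.5.0 documentation, and `check_research_object` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that a directory looks like a CWLProv research object.
# bagit.txt and data/ are required by BagIt; metadata/ is assumed from the
# CWLProv layout. check_research_object is a hypothetical helper name.

check_research_object() {
  local ro_dir="$1" missing=0 entry
  for entry in bagit.txt data metadata; do
    if [ ! -e "$ro_dir/$entry" ]; then
      echo "missing: $ro_dir/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_research_object somaticwf_0.5.0_mac && echo "research object present"
```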

To package the research object:
 

zip -r somaticwf_0.5.0_mac.zip somaticwf_0.5.0_mac/
sha256sum somaticwf_0.5.0_mac.zip > somaticwf_0.5.0_mac.zip.sha256
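`sha256sum` comes from GNU coreutils and may not be present on every macOS installation, which ships `shasum` instead. The sketch below is a portable variant of the checksum step that falls back to `shasum -a 256` and writes the same `<hash>  <file name>` line format; `sha256_of`, `write_checksum` and `verify_checksum` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: portable SHA-256 helpers for packaging the research object.
# sha256_of, write_checksum and verify_checksum are hypothetical names.

# Print the SHA-256 of a file with sha256sum (GNU) or shasum (macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Write "<hash>  <file name>" to <file>.sha256, as sha256sum would.
write_checksum() {
  printf '%s  %s\n' "$(sha256_of "$1")" "$(basename "$1")" > "$1.sha256"
}

# Succeed only if the file still matches the hash stored in <file>.sha256.
verify_checksum() {
  local expected actual
  expected=$(cut -d' ' -f1 < "$1.sha256")
  actual=$(sha256_of "$1")
  [ "$expected" = "$actual" ]
}

# Usage:
#   write_checksum somaticwf_0.5.0_mac.zip
#   verify_checksum somaticwf_0.5.0_mac.zip && echo "checksum OK"
```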

The cloned Git repository is a fork of https://github.com/bcbio/test_bcbio_cwl. The upstream content was obtained using:

wget -O test_bcbio_cwl.tar.gz https://github.com/bcbio/test_bcbio_cwl/archive/master.tar.gz

This content comes from an archived version linked from the bcbio-nextgen documentation: https://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html#install-bcbio-vm-with-containers
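GitHub archive tarballs unpack into a single top-level directory named after the repository and ref, so the download shown above would typically extract to `test_bcbio_cwl-master/`. The sketch below extracts an archive and prints that directory instead of hard-coding it; `extract_archive` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: extract a .tar.gz archive and print its top-level directory.
# extract_archive is a hypothetical helper name.

extract_archive() {
  local archive="$1" dest="${2:-.}" top
  # The first path component of the first entry is the top-level directory.
  # awk reads the whole listing, so tar is not cut off mid-pipe.
  top=$(tar -tzf "$archive" | awk -F/ 'NR == 1 { print $1 }')
  tar -xzf "$archive" -C "$dest"
  printf '%s\n' "$dest/$top"
}

# Usage: src_dir=$(extract_archive test_bcbio_cwl.tar.gz)
```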

Notes

Mirrored from Mendeley Data https://data.mendeley.com/datasets/97hj93mkfd/3

Files (32.6 MB)

  • somaticwf_0.5.0_mac.zip: 32.6 MB, md5:9a4785d6df5c6263d364919d3b9750d7
  • 90 bytes, md5:f57e10516cce41cba4e939153f830b70

Additional details

Funding

BioExcel – Centre of Excellence for Biomolecular Research (grant 675728), European Commission