github.com/DataBiosphere/analysis_pipeline_WDL/null-model-wdl
Description
TOPMed Analysis Pipeline — WDL Version
This is a work-in-progress project to implement some components of the University of Washington TOPMed pipeline into Workflow Description Language (WDL) in a way that closely mimics the CWL version of the UW Pipeline. In other words, this is a WDL that mimics a CWL that mimics a Python pipeline. All three pipelines use the same underlying R scripts which do most of the heavy lifting, making their results directly comparable.
Features
- This pipeline closely follows the CWL version. The main differences between the two are documented, and testing indicates they are functionally equivalent, so much so that files generated by the CWL are used as truth files for the WDL
- Because it runs in a Docker container, it has no external dependencies beyond the usual setup required for WDL and Cromwell
- Contains multiple checker workflows for validating sets of known inputs and expected outputs
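The checker workflows validate outputs against known truth files. To illustrate the general idea only, the Python sketch below treats an output as matching its truth file when the two MD5 digests are equal. This byte-for-byte check is a simplified stand-in: the real checker workflows may compare outputs differently (for example, numerically), and `md5_of` and `matches_truth` are hypothetical names.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_truth(output: Path, truth: Path) -> bool:
    """Hypothetical check: True only if the output is byte-identical to the truth file."""
    return md5_of(output) == md5_of(truth)
```

A byte-identical comparison is strict: files that embed timestamps or differ only by floating-point noise would fail even when they are scientifically equivalent, which is one reason a real checker might compare file contents instead.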
Usage
These workflows are tested both on Terra and with Cromwell running locally. Example files are provided in the test-data-and-truths directory and in gs://topmed_workflow_testing/UWGAC_WDL/.
Essentially all workflows that take chromosome-level files share the same filename requirements. For these files, the chromosome must be included in the filename in the format chr##, where ## is the name of the chromosome (1-24 or X, Y). The chromosome can appear anywhere in the filename as long as it follows this format. For instance, data_subset_chr1.gds, data_chr1_subset.gds, and chr1_data_subset.gds are all valid names, while data_chromosome1_subset.gds and data_subset_c1.gds are not.
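To check filenames before launching a run, a regular expression can encode the chr## rule. The Python sketch below is a hypothetical pre-flight check, not the code the pipeline's tasks use internally; `CHR_PATTERN` and `chromosome_from_filename` are illustrative names. Two-digit alternatives are listed first so that chr10 yields 10 rather than 1, and a negative lookahead rejects out-of-range values such as chr25 instead of reading them as chr2.

```python
import re
from typing import Optional

# Hypothetical pattern: "chr" followed by 10-19, 20-24, 1-9, X, or Y,
# and not followed by another digit.
CHR_PATTERN = re.compile(r"chr(1\d|2[0-4]|[1-9]|X|Y)(?!\d)")


def chromosome_from_filename(filename: str) -> Optional[str]:
    """Return the chromosome named in a filename, or None if no valid chr## is found."""
    match = CHR_PATTERN.search(filename)
    return match.group(1) if match else None


# The first three names print 1; the last two print None.
for name in [
    "data_subset_chr1.gds",
    "data_chr1_subset.gds",
    "chr1_data_subset.gds",
    "data_chromosome1_subset.gds",
    "data_subset_c1.gds",
]:
    print(name, "->", chromosome_from_filename(name))
```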
The original CWL pipelines had runtime-related arguments, such as ncores and cluster_type, that do not apply to WDL. If you are unsure how your settings carry over, please familiarize yourself with WDL's runtime attributes. For more information on the runtime attributes of specific tasks, see the Further reading section.
Terra users
Terra users should import these workflows via Dockstore. On the workflow's input entry page, importing the JSON file that matches your workflow will fill in test data and recommended runtime attributes for that test data. For example, load vcf-to-gds-terra.json for vcf-to-gds.wdl. If you are using your own data, be sure to adjust the runtime attributes accordingly.
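When adapting a provided JSON to your own data, it helps to change only inputs that already exist, so a misspelled input name fails loudly instead of being silently ignored. The Python sketch below assumes the `<workflow>-terra.json` naming seen in the vcf-to-gds example holds for the other workflows; `terra_json_for`, `load_inputs`, and `with_overrides` are hypothetical helpers, and input keys such as `wf.memory` are placeholders, not the workflows' real input names.

```python
import json
from pathlib import Path
from typing import Any, Dict


def terra_json_for(wdl_filename: str) -> str:
    """Map e.g. "vcf-to-gds.wdl" to "vcf-to-gds-terra.json" (assumed naming pattern)."""
    return f"{Path(wdl_filename).stem}-terra.json"


def load_inputs(path: Path) -> Dict[str, Any]:
    """Read an inputs JSON file into a dict."""
    return json.loads(path.read_text())


def with_overrides(inputs: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of inputs with selected values replaced.

    Only keys already present are accepted, so a typo raises KeyError
    instead of being added as a new input that the workflow never reads.
    """
    unknown = set(overrides) - set(inputs)
    if unknown:
        raise KeyError(f"not in the inputs JSON: {sorted(unknown)}")
    merged = dict(inputs)  # shallow copy; the original dict is left unchanged
    merged.update(overrides)
    return merged
```

For instance, `with_overrides(load_inputs(Path(terra_json_for("vcf-to-gds.wdl"))), {"wf.memory": 8})` would raise unless the file really contains a `wf.memory` key, which is the intended safeguard.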
Local users
Cromwell does not manage resources well on local executions: parameters such as memory and disks are ignored when Cromwell detects that it is not running on the cloud. As a result, these pipelines (LD pruning especially) may have their processes killed by your OS for using too much memory, or may lock up Docker entirely, even on a relatively powerful machine running downsampled test data. That said, preliminary testing of these pipelines is performed on a local machine running macOS Catalina. Local execution is not officially supported, but the only real obstacles to running smoothly on a local machine are Cromwell's resource management and the computing power some of these algorithms need. These issues can generally be avoided by lowering the concurrent job limit in your Cromwell configuration; see the Dockstore CLI documentation for how to set it there.
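For a standalone local Cromwell, the concurrent job limit lives in a HOCON configuration file. The Python sketch below writes a minimal one, assuming the default local backend is named `Local` and that your Cromwell release honors the `concurrent-job-limit` key (check the Cromwell configuration documentation for your version). `write_cromwell_config` is a hypothetical helper, and this does not cover the Dockstore CLI's own configuration.

```python
from pathlib import Path

# Conventional first line of a custom Cromwell config: inherit Cromwell's defaults.
HEADER = 'include required(classpath("application"))'


def write_cromwell_config(path: Path, job_limit: int) -> str:
    """Write a minimal Cromwell config capping concurrent jobs on the Local backend."""
    if job_limit < 1:
        raise ValueError("job_limit must be at least 1")
    text = (
        f"{HEADER}\n"
        "\n"
        # Dotted HOCON path merged into the inherited Local backend settings.
        f"backend.providers.Local.config.concurrent-job-limit = {job_limit}\n"
    )
    path.write_text(text)
    return text
```

Cromwell would then be started with something like `java -Dconfig.file=cromwell.conf -jar cromwell.jar run workflow.wdl --inputs inputs.json` (adjust the JAR name to your release). A limit of 1 runs one task at a time, trading speed for lower peak memory.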
Further reading
- Checker workflows
- Association testing (aggregate): assoc-aggregate
- KING IBDSEG: KING
- Linkage disequilibrium pruning: ld-pruning
- Null model generation: null-model
- pc-air
- pc-relate
- VCF to GDS file conversion: vcf-to-gds
Author
Ash O'Farrell (aofarrel@ucsc.edu)
Files
| Name | Size | Checksum |
|---|---|---|
| github.com-DataBiosphere-analysis_pipeline_WDL-null-model-wdl_v7.1.1.zip | 7.7 kB | md5:238682d58189d2195936952e33280902 |
Additional details
Related works
- Is identical to
  - https://dockstore.org/aliases/workflow-versions/10.5281-zenodo.16415005
  - https://dockstore.org/workflows/github.com/DataBiosphere/analysis_pipeline_WDL/null-model-wdl:v7.1.1
  - https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2FDataBiosphere%2Fanalysis_pipeline_WDL%2Fnull-model-wdl/versions/v7.1.1/PLAIN-WDL/descriptor/null-model.wdl
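The descriptor URL listed under Related works follows the GA4GH Tool Registry Service (TRS) v2 layout, `/tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path}`. The Python sketch below rebuilds that URL from its parts. The main subtlety is that Dockstore's tool ID (`#workflow/github.com/...`) contains `#` and `/`, so it must be fully percent-encoded. `trs_descriptor_url` is a hypothetical helper, not part of any Dockstore client.

```python
from urllib.parse import quote

TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"


def trs_descriptor_url(tool_id: str, version: str, descriptor_type: str, path: str) -> str:
    """Build a GA4GH TRS v2 URL for one descriptor file of a tool version."""
    # safe="" forces "#" -> %23 and "/" -> %2F inside the tool ID.
    encoded_id = quote(tool_id, safe="")
    return (
        f"{TRS_BASE}/tools/{encoded_id}/versions/{quote(version, safe='')}"
        f"/{descriptor_type}/descriptor/{quote(path)}"
    )


url = trs_descriptor_url(
    "#workflow/github.com/DataBiosphere/analysis_pipeline_WDL/null-model-wdl",
    "v7.1.1",
    "PLAIN-WDL",
    "null-model.wdl",
)
print(url)
```

Passing the result to `urllib.request.urlopen(url).read()` should return the raw null-model.wdl text, since the PLAIN-WDL type requests the plain file rather than a JSON wrapper.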