github.com/aofarrel/SRANWRP/get_all_organisms_from_biosample
Description
SRAnwrp
SRAnwrp ("Saran Wrap") envelops several SRA-related tools in the warm, polyethylene embrace of a single Ubuntu-based Docker image and some optional assorted workflows. For the sake of simplicity, releases on main follow the same versioning scheme as the Docker image.
What tasks can it perform?
The combination of e-direct and sra-tools allows it to do basically anything you can do from SRA's website. These tasks are provided as WDL workflows -- more on WDL here.
Pulling FASTQs
- Pull paired FASTQs from a list of run accessions (SRR/ERR/DRR)
- Pull paired FASTQs from a list of BioSample accessions - can be SRS or SAME notation
- Plus some bonus non-workflow pulling tasks
- Note -- as a pre-3.0.5 version of fasterq-dump is being used, pulling non-Illumina fastqs is not supported.
- Note -- it is recommended you set the disk_size variable to 20x the size of the largest .sra that you want to download.
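The 20x guideline above is simple arithmetic. Here is a minimal sketch of it, assuming `disk_size` is given in whole gigabytes (this README does not state the unit, so check the workflow's inputs) and that you already know each .sra file's size in bytes. The function name is hypothetical and not part of SRANWRP.

```python
def recommended_disk_size_gb(sra_sizes_bytes, factor=20):
    """Return factor x the largest .sra size, rounded up to whole GB.

    Assumption: disk_size is expressed in GB (10**9 bytes).
    """
    if not sra_sizes_bytes:
        raise ValueError("need at least one .sra size")
    bytes_needed = max(sra_sizes_bytes) * factor
    # Integer ceiling division avoids floating-point rounding surprises.
    whole_gb = (bytes_needed + 10**9 - 1) // 10**9
    return max(1, whole_gb)


if __name__ == "__main__":
    # Two made-up file sizes: 0.5 GB and 1.5 GB.
    print(recommended_disk_size_gb([500_000_000, 1_500_000_000]))
```

Only the largest file matters here, so adding more small runs to the list does not change the result.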
Getting Organism + TaxID from a list of BioProject/BioSample accessions
There are a lot of BioProjects on SRA, and some of them are multi-species. Use this workflow to get a list of all run accessions, and said run accessions' species and TaxIDs, from a list of BioProject accessions. If you instead have a list of BioSamples, use this workflow to get species and TaxID (as well as a list of all run accessions).
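Since the workflows take different accession types as input, it can help to sort a mixed list before choosing one. The sketch below classifies accessions by their standard prefixes (SRR/ERR/DRR for runs, SRS/ERS/DRS for SRA samples, SAMN/SAME/SAMD for BioSamples, PRJNA/PRJEB/PRJDB for BioProjects). The function and label names are hypothetical and not taken from SRANWRP.

```python
import re

# Prefix patterns for INSDC-style accessions; labels are illustrative only.
_PATTERNS = [
    ("run", re.compile(r"^[SED]RR\d+$")),
    ("sra_sample", re.compile(r"^[SED]RS\d+$")),
    ("biosample", re.compile(r"^SAM[NED][A-Z]?\d+$")),
    ("bioproject", re.compile(r"^PRJ(NA|EB|DB)\d+$")),
]


def classify_accession(accession):
    """Return a label for the accession type, or "unknown" if nothing matches."""
    cleaned = accession.strip()
    for label, pattern in _PATTERNS:
        if pattern.match(cleaned):
            return label
    return "unknown"


if __name__ == "__main__":
    # Made-up accessions, used only to show the prefixes.
    for acc in ["SRR0000001", "SAMEA0000001", "PRJNA000001", "not-an-accession"]:
        print(acc, classify_accession(acc))
```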
Getting sample accessions from run accessions (SRR/ERR/DRR)
If you have a list of run accessions, this workflow will get a list of sample accessions that they cover. Some samples have more than one run -- those samples will only appear in the output once.
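The "appear in the output once" behavior amounts to order-preserving deduplication. Below is a minimal sketch of that idea, not the workflow's actual implementation; the function name and the accessions are made up for illustration.

```python
def unique_samples(run_sample_pairs):
    """Given (run, sample) pairs, return each sample once, in first-seen order."""
    # dict.fromkeys keeps insertion order and drops repeated keys.
    return list(dict.fromkeys(sample for _run, sample in run_sample_pairs))


if __name__ == "__main__":
    # Hypothetical pairs: two runs belong to the same sample.
    pairs = [
        ("SRR0000001", "SRS0000001"),
        ("SRR0000002", "SRS0000001"),
        ("SRR0000003", "SRS0000002"),
    ]
    print(unique_samples(pairs))
```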
Other stuff?
Here are some other tasks that can help you convert between data types.
What's included in the Docker image?
Non-exhaustive list:
- The TB reference genome and a BED of its commonly masked regions
- bash-5.1.16(1)-release
- bedtools-latest
- bc-latest
- bcftools-1.16
- cpan-latest
- curl-latest
- entrez-direct-latest (aka edirect)
- gcc-latest
- git-latest
- htslib-1.16
- make-latest
- Matplotlib-latest
- numpy-latest
- pandas-latest
- pigz-latest
- python-3.12
- note: must be called with `python3` instead of `python` (and `pip3` instead of `pip`) when running non-interactively
- samtools-1.16
- mpileup, minimap2, fixmate, etc
- seqtk-latest
- sra-tools-3.0.1 (aka SRAtools, SRA tools, SRA toolkit, etc)
- align-info, fastq-dump, fasterq-dump, prefetch, sam-dump, sra-pileup, etc
- fyi: ncbi/ncbi-vdb was merged with sra-tools in sra-tools-3.0.0 and vdb-get was retired in 3.0.1
- sudo-latest
- taxoniumtools-latest
- tree-latest
- vim-latest
- wget-latest
Who builds?
Right now, the image is built and pushed manually. You'll need to include your own copy of the TB reference tarball -- it can be created with clockwork refprep, or downloaded from this Google bucket. MD5s are provided in this repo as a double-check.
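To use the provided MD5s as a double-check, you can hash your copy of the tarball and compare. Here is a minimal sketch using only Python's standard library; the function names are hypothetical, and the path and expected checksum are yours to supply from the repo.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """True if the file's MD5 equals the expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat, which matters for a large reference tarball.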
Why?
- Docker Hub's latest version of staphb/sratoolkit, as of my writing this in October 2022, runs version 2.9.2 (see command 15), which doesn't work at all anymore
- Existing Docker images tend to contain either the SRA toolkit or Entrez Direct, not both
- Building SRA Toolkit on your own, without conda, is not intuitive
- Building SRA Toolkit on your own, with conda, is also not intuitive (you usually end up with v2.10 which only sometimes works)
- No need to run `vdb-config --interactive` or any other interactive process before using anything in this image; SRA Toolkit's config file is generated while building the image
Files
| Name | Size | MD5 |
|---|---|---|
| github.com-aofarrel-SRANWRP-get_all_organisms_from_biosample_v1.1.19.zip | 9.0 kB | 7d7ce98ac8d4be473be3197f69d996f6 |
Additional details
Related works
- Is identical to
- https://dockstore.org/aliases/workflow-versions/10.5281-zenodo.16414660 (URL)
- https://dockstore.org/workflows/github.com/aofarrel/SRANWRP/get_all_organisms_from_biosample:v1.1.19 (URL)
- https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Faofarrel%2FSRANWRP%2Fget_all_organisms_from_biosample/versions/v1.1.19/PLAIN-WDL/descriptor/get_organisms_from_biosample.wdl (URL)