There is a newer version of the record available.

Published September 8, 2023 | Version 0.2
Workflow Open

Resources bundle for somatic workflow using HiFi reads

Authors/Creators

Description

The following files are included in the "hifisomatic_resources.tar.gz" resource bundle.

├── chr.bed
├── GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta
├── GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta.fai
├── human_GRCh38_no_alt_analysis_set.trf.bed
├── refFlat.hg38.txt
├── sniffles_all_non_germline.nosamples.vcf.gz
└── sniffles_all_non_germline.nosamples.vcf.gz.tbi
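After downloading, a quick sanity check is to confirm that the archive contains every file listed above. The sketch below uses Python's standard `tarfile` module. It compares base names only, because this description does not say whether the archive stores its files under a top-level directory.

```python
from __future__ import annotations

import tarfile

# File names taken from the bundle listing above.
EXPECTED_FILES = {
    "chr.bed",
    "GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta",
    "GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta.fai",
    "human_GRCh38_no_alt_analysis_set.trf.bed",
    "refFlat.hg38.txt",
    "sniffles_all_non_germline.nosamples.vcf.gz",
    "sniffles_all_non_germline.nosamples.vcf.gz.tbi",
}


def missing_from_bundle(tar_path: str) -> set[str]:
    """Return the expected file names that are absent from a .tar.gz archive.

    Only the last path component of each regular-file member is compared,
    so the check does not depend on any top-level directory in the archive.
    """
    with tarfile.open(tar_path, mode="r:gz") as tar:
        present = {m.name.rsplit("/", 1)[-1] for m in tar.getmembers() if m.isfile()}
    return EXPECTED_FILES - present


# Example usage:
# missing = missing_from_bundle("hifisomatic_resources.tar.gz")
# if missing:
#     print("Missing from bundle:", sorted(missing))
```

An empty set means every listed file is present; it does not check file integrity, which is what the MD5 checksums in the Files listing are for.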

For the Sniffles non_germline VCF, please cite the Human Pangenome Project paper: https://www.nature.com/articles/s41586-022-04601-8. Briefly, Sniffles 2.0.7 was used to generate a joint-call VCF from 118 control samples with the "--non-germline" flag (all other parameters were left at their defaults).

"human_GRCh38_no_alt_analysis_set.trf.bed" was downloaded from https://github.com/PacificBiosciences/pbsv/blob/master/annotations/human_GRCh38_no_alt_analysis_set.trf.bed. "GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta" was obtained from Wagner, J. et al. (Nat Biotechnol 2022). "chr.bed" simply lists the start and end coordinates of each chromosome; it was created manually from the chromosome lengths in the FASTA index (.fai).
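The "chr.bed" construction described above can be reproduced from the FASTA index. A .fai file (as written by samtools faidx) is tab-separated with the contig name and length in its first two columns, and BED intervals are 0-based and half-open, so a whole contig is written as name, 0, length. Which contigs the distributed "chr.bed" keeps is not stated; this sketch assumes the primary chromosomes (chr1 to chr22, chrX, chrY, chrM), so compare its output with the bundled file before relying on it.

```python
import re
from typing import Iterable, Iterator

# Assumption: keep only chr1-chr22, chrX, chrY and chrM. The description does
# not say which contigs of the no-alt analysis set chr.bed actually covers.
PRIMARY_CONTIG = re.compile(r"chr([0-9]{1,2}|X|Y|M)")


def fai_to_bed(fai_lines: Iterable[str], primary_only: bool = True) -> Iterator[str]:
    """Yield one whole-contig BED record per line of a FASTA index (.fai).

    .fai columns are tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    BED is 0-based and half-open, so a full contig spans [0, LENGTH).
    """
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if primary_only and not PRIMARY_CONTIG.fullmatch(name):
            continue  # drop unplaced, random, decoy and EBV contigs
        yield f"{name}\t0\t{length}"


# Example usage (output path is illustrative):
# fai_path = "GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta.fai"
# with open(fai_path) as fai, open("chr.bed", "w") as bed:
#     for record in fai_to_bed(fai):
#         bed.write(record + "\n")
```

Passing primary_only=False emits one interval per contig in the index, which is the other plausible reading of "each chromosome".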

In addition, the repo contains two small demo datasets for testing the workflow:

"COLO829.30X.SV_region.bam" and "COLO829BL.30X.SV_region.bam" contain the regions of the COLO829 tumor/matched-normal cell-line pair that harbor 57 of the 62 truth SVs from Valle-Inclan et al. (2022). "HCC1395.chr20.30X.bam" and "HCC1395BL.chr20.30X.bam" contain chr20 of the HCC1395 tumor/matched-normal cell-line pair.
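In both demo datasets the matched-normal sample name is the tumor name plus a "BL" suffix. The helper below is hypothetical: it is not part of the workflow, whose input format is not described here, and it only shows how the tumor/normal pairs could be recovered from the file names listed above, assuming the sample name is everything before the first ".".

```python
from __future__ import annotations


def pair_tumor_normal(bam_names: list[str]) -> dict[str, tuple[str, str]]:
    """Hypothetical helper: pair tumor BAMs with matched-normal BAMs.

    Assumes the naming convention of the demo files, i.e. the sample name is
    the text before the first "." and the normal sample is "<tumor>BL".
    Returns {tumor_sample: (tumor_bam, normal_bam)}; unpaired files are ignored.
    """
    by_sample = {name.split(".", 1)[0]: name for name in bam_names}
    pairs: dict[str, tuple[str, str]] = {}
    for sample, bam in by_sample.items():
        normal = by_sample.get(sample + "BL")
        if normal is not None:
            pairs[sample] = (bam, normal)
    return pairs


# Example usage with the demo file names:
# pair_tumor_normal([
#     "COLO829.30X.SV_region.bam", "COLO829BL.30X.SV_region.bam",
#     "HCC1395.chr20.30X.bam", "HCC1395BL.chr20.30X.bam",
# ])
```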


Files (8.9 GB)

md5                                 Size
4f0a58766c98d372f5d36bfc9367963b    3.0 GB
3250c6dd079997e18ed9ace65589123a    2.9 GB
c14f322a112b56372180886b442b09ac    1.1 GB
37de5452e6a84515d80d253d3b5f9a5c    1.0 GB
5fd593de69df1256aeffa047cfeb735b    920.1 MB
43f7a2b5123b3f8ae9135d5f4c725e3c    145.3 kB
5b16df62daf8b532466672dd6f914221    10.1 kB
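The Files listing gives only an MD5 checksum and a size for each download. A minimal verification sketch using Python's standard hashlib module follows; it hashes in 1 MiB chunks so that multi-GB files are never loaded into memory at once. Matching each downloaded file to its checksum (for example by size, or by the file name shown on the record page) is up to the user, since the names are not part of the listing above.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file against a listed checksum, accepting an optional "md5:" prefix."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


# Example usage (local file name is illustrative):
# verify("hifisomatic_resources.tar.gz", "md5:<checksum from the listing>")
```

On Linux, "md5sum <file>" from GNU coreutils prints the same hex digest and can serve as a cross-check.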