Raw BRCA1/2 variants in breast cancer patients and healthy relatives produced with GATK.
Description
Aligned sequencing data are available in the NCBI Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) under accession SRP095082. Variants were called using GATK HaplotypeCaller (version 3.6). After joint genotyping, a multi-sample VCF file was generated. Next, SNPs and indels were extracted into two separate VCF files, and a specific set of hard filters was applied to each.
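To illustrate the SNP/indel split described above, the following is a minimal Python sketch that classifies a VCF record from its REF and ALT alleles. It is a simplification for illustration only, not the logic used by GATK or by the scripts in this record; the function name `variant_type` and the "OTHER" category are assumptions of the sketch.

```python
def variant_type(ref, alts):
    """Classify a VCF record as 'SNP', 'INDEL', or 'OTHER' from its alleles.

    Simplified rule (an assumption of this sketch): a record is a SNP when
    REF and every ALT are single bases, an indel when every ALT differs in
    length from REF, and 'OTHER' for anything else (mixed sites, same-length
    multi-base substitutions, symbolic or missing alleles).
    """
    kinds = set()
    for alt in alts:
        if alt.startswith("<") or alt in ("*", "."):
            kinds.add("OTHER")   # symbolic, spanning-deletion, or missing allele
        elif len(ref) == 1 and len(alt) == 1:
            kinds.add("SNP")
        elif len(ref) != len(alt):
            kinds.add("INDEL")
        else:
            kinds.add("OTHER")   # same-length multi-base substitution
    # A record gets a single label only if all of its ALT alleles agree.
    return kinds.pop() if len(kinds) == 1 else "OTHER"
```

Under this rule a record with one single-base ALT and one insertion ALT is labelled "OTHER", so it would land in neither of the two files; how the actual pipeline handled such mixed sites is not stated in this description.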
File descriptions
Datasets
BRCA_SNVs.vcf - This file contains SNPs called with GATK, with hard filters applied. The following filtering options were applied: "QD < 2.0", "FS > 60.0", "MQ < 40.0", "MQRankSum < -12.5", "ReadPosRankSum < -8.0", "SB < -0.10", "DP < 10", "GQ < 30", and "SOR > 3.0".
BRCA_indels.vcf - This file contains indels called with GATK, with hard filters applied. The following filtering options were applied: "QD < 2.0", "FS > 200.0", "ReadPosRankSum < -20.0", "InbreedingCoeff < -0.8", and "SOR > 10.0".
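As a rough illustration of what these hard filters mean, the sketch below checks the INFO field of a single SNP record against part of the SNP threshold list. It is not the filtering code from the scripts package. The names `SNP_FILTERS`, `parse_info`, and `failed_filters` are hypothetical, and treating a missing annotation as passing is an assumption of the sketch. The "SB", "DP", and "GQ" thresholds are left out because the description does not say whether they were applied at the site or the genotype level.

```python
import operator

# Site-level SNP thresholds copied from the BRCA_SNVs.vcf description.
# A record is flagged when the comparison is true.
SNP_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
    ("SOR", operator.gt, 3.0),
]

def parse_info(info_field):
    """Turn a VCF INFO string such as 'QD=1.5;FS=3.2' into a dict of floats."""
    values = {}
    for item in info_field.split(";"):
        key, sep, raw = item.partition("=")
        if not sep:
            continue  # flag-type entry without a value
        try:
            values[key] = float(raw)
        except ValueError:
            continue  # non-numeric or multi-valued entry, ignored here
    return values

def failed_filters(info_field, filters=SNP_FILTERS):
    """Return the annotation names whose threshold the record violates.

    Annotations absent from the record are treated as passing
    (an assumption of this sketch).
    """
    info = parse_info(info_field)
    return [key for key, compare, threshold in filters
            if key in info and compare(info[key], threshold)]
```

For example, `failed_filters("QD=1.5;FS=70.0;MQ=60.0")` returns `["QD", "FS"]`, while a record whose annotations all fall within the thresholds returns an empty list.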
Scripts package (scritps.zip)
The scritps.zip file contains scripts and supporting files for genotype calling and filtering.
raw.variant.caling.sh – BAM file preprocessing, alignment refinement, and raw genotype calling with HaplotypeCaller.
genotyping_and_filtering.sh – joint genotyping, variant hard filtering and callset refinement.
LIST.txt – supporting file that lists the names of the BAM files containing aligned reads.
sample_order.txt – supporting file for sample renaming.
Reference files (hg19) used in variant calling scripts
Reference files can be downloaded from the GATK bundle website at https://software.broadinstitute.org/gatk/download/bundle.
ucsc.hg19.fasta - human genome assembly;
Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz – set of known indels to be used for local realignment;
1000G_phase1.indels.hg19.sites.vcf.gz – set of known indels to be used for local realignment;
dbsnp_138.hg19.vcf.gz – a recent dbSNP release (build 138);
1000G_phase3_v4_20130502.hg19.lifted.sites.vcf – the latest set from 1000G phase 3 (v4) for genotype refinement.
Files
Total size: 255.0 kB
MD5 checksum | Size |
---|---|
md5:a186bc3435a898cd6e169194aab6d621 | 58.1 kB |
md5:fc1c2a2b52578630166b557846c0a8ba | 193.5 kB |
md5:9607c82e3990cf245ea3b75b53587f0f | 3.4 kB |
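The MD5 checksums listed above can be used to check that a downloaded file is intact. The following is a minimal Python sketch using the standard `hashlib` module; the helper names `md5_of` and `matches_listing` are hypothetical and not part of the scripts package.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

A call such as `matches_listing("scritps.zip", "md5:...")` returns `True` only when the digest matches. The listing does not show which checksum belongs to which file, so each downloaded file may need to be compared against all three values.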