Published October 4, 2016 | Version v1
Dataset Open

SNP and indel discovery and genotyping in next-generation sequencing data

  • 1. University of Sussex

Description

Code, logs and data for the discovery and genotyping of SNPs and indels in the D. melanogaster genome using GATK HaplotypeCaller. Code is in the zipped folder code.zip. Run logs for this code are in the zipped folder logs.zip. The unfiltered VCF genotypes file is lhm_rg_HC_2015-09-15.vcf.gz. The filtered VCF genotypes file is f1.lhm_rg_HC_raw.vcf.gz. The VCF submitted to NCBI dbSNP (filtered, with indels >50 bp and variants with null alternate alleles removed) is dbSNP.lhm_rg_HC_raw.vcf.gz. The archive local_reference.zip contains the reference assembly files against which genotypes were called, and includes the code used to format the data prior to use. Also included are genotype data from the two in-house reference line samples sequenced (BDGP6+ISO1 mito/dm6, Bloomington Drosophila Stock Center no. 2057).
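As an illustration of the two extra filters applied to the dbSNP submission file, the following Python sketch drops records with indels longer than 50 bp or with null alternate alleles. It is not the code shipped in code.zip. The function names are hypothetical, and two points are assumptions: that a "null" alternate allele means an ALT value of "." or "*", and that indel length is the absolute difference between the REF and ALT lengths. A multi-allelic record is dropped here if any one of its alternate alleles fails.

```python
import gzip

MAX_INDEL_BP = 50        # threshold stated in the description
NULL_ALTS = {".", "*"}   # assumption: ALT values treated as "null"


def keep_record(line):
    """Return True if a VCF data line passes both filters."""
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    for alt in alts:
        if alt in NULL_ALTS:
            return False
        # Assumption: indel length = difference in allele lengths.
        if abs(len(alt) - len(ref)) > MAX_INDEL_BP:
            return False
    return True


def filter_vcf(in_path, out_path):
    """Copy header lines and passing records; return the number of records kept."""
    kept = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)
            elif keep_record(line):
                dst.write(line)
                kept += 1
    return kept
```

Note that `gzip.open` writes ordinary gzip output rather than the block-compressed (bgzip) format that indexing tools expect, so the result would need recompressing before indexing.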

Samples are 220 Sussex-LHM hemiclones and 2 RG. The first run did not include chromosome 4 or the mitochondrial genome, so these were genotyped separately and then added to the rest of the results.

The NCBI dbSNP record is currently at https://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewBatch.cgi?sbid=1062461 and the submitter handle is MORROW_EBE_SUSSEX.

At the time of writing, the NCBI D. melanogaster build is still being updated, so ss identifiers, but not rs identifiers, are available.

The pre-print manuscript for these data is available on bioRxiv: "Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample" http://biorxiv.org/content/early/2016/10/17/081554 doi: http://dx.doi.org/10.1101/081554

Files

code.zip

Files (7.8 GB)

Checksum Size
md5:48c781b4ad2b7d57d88c2a83dde16c03 6.8 kB
md5:67108750de5d38acea322ae503f2a982 2.3 GB
md5:04b9ceb95bc326da5d6b622e5c1f19a0 2.6 GB
md5:6893aaf4d0a22b03f086634f5122c92d 2.7 GB
md5:b48ca4ce7d1dbb6b49cf96cbd15eb756 45.2 MB
md5:c050e67c73985bcb797b9268c8275cac 211.6 kB
md5:ec69be1f6531d1e197642f55dc814de5 757.0 kB
md5:d6173e2a6aab873e93910427e63840f8 64.0 MB
md5:448d08dcaf6f4607a2df8ee1adfa09fd 18.0 MB
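
The md5 values listed above can be used to check that a download completed intact. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and the pairing of each checksum with a file name is not shown in this listing, so it has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for the multi-GB VCF files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:48c7...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:...")` returns True only if the local copy hashes to the listed value; a mismatch usually indicates a truncated or corrupted download.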

Additional details

Funding

2SEXES_1GENOME – Sex-specific genetic effects on fitness and human disease 280632
European Commission