Published May 19, 2022 | Version 1.0.0
Dataset Open

AmelHap pilot: filter1 data

  • 1. University of the Basque Country
  • 2. University of Edinburgh
  • 3. Beebytes Analytics CIC
  • 4. INRAE Toulouse

Description

Honey bee (Apis mellifera) drones are typically haploid: they develop from an unfertilized egg and inherit only their queen’s alleles, with none from the many drones she mated with. Because drones are haploid, the ordered combination or ‘phase’ of their alleles is known, making them a valuable haplotype resource. We collated whole genome sequence data for 688 drones, including 45 newly sequenced Scottish drones, which collectively represent 13 countries, 7 subspecies and various hybrid strains. After alignment to the reference assembly Amel_HAv3.1 and haploid variant calling, we identified 18.9M variants.

Whole-genome sequencing data underpinning the dataset are available from the European Nucleotide Archive (ENA), https://www.ebi.ac.uk/ena, under the project accession codes PRJEB16533, PRJNA311274, PRJNA363032, PRJNA516678, PRJNA544324, and PRJEB39369.
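As a convenience for locating the raw reads, the following minimal sketch builds one query URL per project accession for ENA's file report service. The endpoint path, the `read_run` result type and the chosen field names are assumptions based on ENA's public portal API and are not part of this record; the function name `ena_filereport_url` is hypothetical. The code only constructs URLs and makes no network request.

```python
from urllib.parse import urlencode

# Project accessions listed in the description above.
ACCESSIONS = [
    "PRJEB16533", "PRJNA311274", "PRJNA363032",
    "PRJNA516678", "PRJNA544324", "PRJEB39369",
]

# Assumed ENA portal API endpoint for per-run file listings.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession):
    """Return a URL requesting a TSV table of runs and FASTQ locations."""
    params = {
        "accession": accession,
        "result": "read_run",
        # Assumed field names; adjust to the columns you need.
        "fields": "run_accession,sample_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    # Keep commas readable in the query string.
    return ENA_FILEREPORT + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    for acc in ACCESSIONS:
        print(ena_filereport_url(acc))
```

Each URL can then be fetched with any HTTP client to obtain the run table for that project.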

Sequencing reads were aligned to the Amel_HAv3.1 reference genome using BWA-MEM v0.7.17. Reads were sorted with SAMtools v1.9, and duplicates were marked with MarkDuplicates from GATK v4.0.11.0. Variants for each sample were called using GATK’s HaplotypeCaller with the following non-default parameters: --ERC GVCF, --sample-ploidy 1 and -A AlleleFraction. Joint variant calling was performed across all samples using GATK’s GenomicsDBImport and GenotypeGVCFs with --sample-ploidy 1 and a window size of 2.5 Mb.
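To illustrate two pieces of the workflow described above, the sketch below (a) splits a contig into 2.5 Mb windows written as 1-based inclusive intervals, and (b) assembles a per-sample HaplotypeCaller argument list with the stated non-default parameters. It is not the authors' code: how the 2.5 Mb windows were actually supplied to GATK is an assumption, the contig name, contig length and file paths are hypothetical, and the ERC setting is written here with GATK's long option name `--emit-ref-confidence`.

```python
WINDOW_SIZE = 2_500_000  # 2.5 Mb, as stated in the description


def windows(contig, length, size=WINDOW_SIZE):
    """Yield 'contig:start-end' intervals (1-based, inclusive) covering the contig."""
    for start in range(1, length + 1, size):
        end = min(start + size - 1, length)
        yield f"{contig}:{start}-{end}"


def haplotypecaller_args(reference, bam, out_gvcf):
    """Return an argument list for haploid per-sample GVCF calling."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out_gvcf,
        "--emit-ref-confidence", "GVCF",  # the ERC setting from the text
        "--sample-ploidy", "1",           # drones are haploid
        "-A", "AlleleFraction",
    ]


if __name__ == "__main__":
    # Hypothetical contig name and length, for illustration only.
    for interval in windows("contig_example", 6_000_000):
        print(interval)
    # Hypothetical file paths.
    print(" ".join(haplotypecaller_args("ref.fa", "drone1.bam", "drone1.g.vcf.gz")))
```

Generating intervals up front makes it straightforward to run the joint-calling steps once per window and concatenate the results afterwards.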

This dataset was produced by excluding from the raw dataset all variants matching the expression 'QD < 20 || QD > 40 || MQ < 50 || SOR > 3', leaving 16.6M variants. The code used in filtering is outlined here: https://bitbucket.org/scriptBee/hapmap-pilot.
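The exclusion expression can be read as a simple predicate on three site-level annotations. The sketch below restates it in Python for clarity; it is not the filtering code from the linked repository, the function name `is_excluded` is hypothetical, and it assumes all three annotations (QD, MQ, SOR) are present for every variant.

```python
def is_excluded(qd, mq, sor):
    """Return True if a variant matches 'QD<20 || QD>40 || MQ<50 || SOR>3'."""
    return qd < 20 or qd > 40 or mq < 50 or sor > 3


def retained(variants):
    """Keep only variants that do not match the exclusion expression.

    `variants` is an iterable of dicts with numeric 'QD', 'MQ' and 'SOR' keys
    (a hypothetical in-memory representation, not a VCF parser).
    """
    return [v for v in variants if not is_excluded(v["QD"], v["MQ"], v["SOR"])]


if __name__ == "__main__":
    example = [
        {"QD": 30.0, "MQ": 60.0, "SOR": 1.2},  # passes all thresholds
        {"QD": 12.5, "MQ": 60.0, "SOR": 1.2},  # excluded: QD below 20
        {"QD": 30.0, "MQ": 60.0, "SOR": 4.1},  # excluded: SOR above 3
    ]
    print(len(retained(example)), "of", len(example), "variants retained")
```

Under this reading, boundary values (QD of exactly 20 or 40, MQ of exactly 50, SOR of exactly 3) are retained. If the 18.9M figure quoted in the description is the raw variant count, the filter removed about 2.3M variants, roughly 12% of the total.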

Files (29.8 GB)

Checksum (size)

  • md5:300af8184de54d1e77cf0b696927e96d (29.8 GB)
  • md5:8a41d158a23fed00fe6a0ce967930194 (195.6 kB)
  • md5:9cae4958bb631cce4e3af501ad429b46 (49 Bytes)
  • md5:d44282d8af52c80f64c70590278a8882 (34.3 kB)
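The MD5 checksums listed for each file can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard library; the function names and the file path in the usage example are hypothetical, since the file names are not given in this listing. The file is read in chunks so that the 29.8 GB file does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected):
    """Compare a file's MD5 digest with a listed value such as 'md5:300a...'."""
    expected_hex = expected.lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Hypothetical local path for the largest file in the record.
    path = "downloaded_file.vcf.gz"
    print(checksum_matches(path, "md5:300af8184de54d1e77cf0b696927e96d"))
```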