Published May 19, 2022 | Version 1.0.0
Dataset Open

AmelHap pilot: filter3 data

  • 1. University of the Basque Country
  • 2. University of Edinburgh
  • 3. Beebytes Analytics CIC
  • 4. INRAE Toulouse

Description

Honey bee (Apis mellifera) drones are typically haploid: they develop from an unfertilized egg and inherit only their queen’s alleles, with none from the many drones she mated with. Because drones are haploid, the ordered combination, or ‘phase’, of their alleles is known, which makes them a valuable haplotype resource. We collated whole-genome sequence data for 688 drones, including 45 newly sequenced Scottish drones, which collectively represent 13 countries, 7 subspecies and various hybrid strains. After alignment to the reference assembly Amel_HAv3.1 and haploid variant calling, we identified 18.9M variants.

Whole-genome sequencing data underpinning the dataset is available from the European Nucleotide Archive (ENA), https://www.ebi.ac.uk/ena, with the project accession codes: PRJEB16533, PRJNA311274, PRJNA363032, PRJNA516678, PRJNA544324, and PRJEB39369.

Sequencing reads were aligned to the Amel_HAv3.1 reference genome using BWA-MEM v0.7.17. Reads were sorted with SAMtools v1.9, and duplicates were marked (MarkDuplicates) with GATK v4.0.11.0. Variants for each sample were called using GATK’s HaplotypeCaller with the following non-default parameters: --ERC GVCF, --sample-ploidy 1 and -A AlleleFraction. Joint variant calling was performed across all samples using GATK’s GenomicsDBImport and GenotypeGVCFs with --sample-ploidy 1 and a window size of 2.5 Mb.
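
The calling steps above can be sketched as command lines assembled in Python. This is only an illustration under stated assumptions, not the authors' pipeline: the file names, the contig name and length, and the helper names (`windows`, `haplotypecaller_cmd`, `genomicsdb_import_cmd`, `genotype_cmd`) are hypothetical, and the description does not say how the 2.5 Mb windows were defined, so non-overlapping 1-based inclusive intervals are assumed. The emit-reference-confidence option is written with GATK's short flag `-ERC`.

```python
from typing import Iterator, List

WINDOW_BP = 2_500_000  # 2.5 Mb window size, as stated in the description


def windows(contig: str, length: int, size: int = WINDOW_BP) -> Iterator[str]:
    """Yield non-overlapping, 1-based inclusive intervals as 'contig:start-end'.

    Assumption: the record does not specify how windows were laid out.
    """
    for start in range(1, length + 1, size):
        end = min(start + size - 1, length)
        yield f"{contig}:{start}-{end}"


def haplotypecaller_cmd(ref: str, bam: str, out_gvcf: str) -> List[str]:
    """Per-sample haploid calling in GVCF mode (paths are hypothetical)."""
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", out_gvcf,
            "-ERC", "GVCF", "--sample-ploidy", "1", "-A", "AlleleFraction"]


def genomicsdb_import_cmd(workspace: str, interval: str,
                          gvcfs: List[str]) -> List[str]:
    """Combine per-sample GVCFs for one window into a GenomicsDB workspace."""
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", workspace, "-L", interval]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotype_cmd(ref: str, workspace: str, interval: str,
                 out_vcf: str) -> List[str]:
    """Joint haploid genotyping of one window from a GenomicsDB workspace."""
    return ["gatk", "GenotypeGVCFs", "-R", ref, "-V", f"gendb://{workspace}",
            "-L", interval, "-O", out_vcf, "--sample-ploidy", "1"]


if __name__ == "__main__":
    # Hypothetical 6 Mb contig, used only to show the windowing.
    for interval in windows("contig_1", 6_000_000):
        print(" ".join(genotype_cmd("ref.fa", "workspace", interval, "out.vcf.gz")))
```

Each list can be passed to `subprocess.run` once real paths are filled in; building the commands as lists avoids shell-quoting problems with file names.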

This dataset is the result of applying filters to monomorphic variants in the filter2 dataset, leaving 16.2M variants. The code used in filtering is outlined here: https://bitbucket.org/scriptBee/hapmap-pilot.
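
As a rough illustration of what a monomorphic-site check could look like for haploid calls, the sketch below treats a site as monomorphic when every non-missing genotype carries the same allele. This definition, the function name `is_monomorphic`, and the handling of missing calls are assumptions made for the example; the repository linked above is the authority on the filters actually applied.

```python
from typing import Iterable

MISSING = "."  # VCF convention for a missing haploid genotype


def is_monomorphic(haploid_calls: Iterable[str]) -> bool:
    """Return True if all non-missing haploid calls carry the same allele.

    Assumption: a site with no called genotypes is also treated as
    monomorphic, since it shows no observed variation.
    """
    observed = {gt for gt in haploid_calls if gt != MISSING}
    return len(observed) <= 1


# Haploid GT fields are single allele indices such as "0" or "1".
print(is_monomorphic(["1", "1", "."]))  # all called samples share one allele
print(is_monomorphic(["0", "1", "0"]))  # two alleles observed
```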

Files

Files (28.5 GB)

Checksum                                  Size
md5:0b7057e1b7c31a1996f2639bb12f7c52      28.5 GB
md5:09a94914aeef19ee5905f96f83415b9f      195.1 kB
md5:5fb21a32cdca40743c0f8593664076b7      49 Bytes
md5:d44282d8af52c80f64c70590278a8882      34.3 kB
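
After downloading, a file can be checked against the MD5 values listed above. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that the 28.5 GB file does not have to fit in memory. The function names and the local file path are hypothetical; the record does not preserve the file names here.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage, with a hypothetical local path for the largest file:
# matches_record("downloaded_file", "md5:0b7057e1b7c31a1996f2639bb12f7c52")
```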