Published May 19, 2022 | Version 1.0.0
Dataset Open

AmelHap pilot: raw data

  • 1. University of the Basque Country
  • 2. University of Edinburgh
  • 3. Beebytes Analytics CIC
  • 4. INRAE Toulouse

Description

Honey bee (Apis mellifera) drones are typically haploid: they develop from unfertilized eggs and inherit only their queen’s alleles, none from the many drones she mated with. Because drones are haploid, the ordered combination or ‘phase’ of their alleles is known directly, making them a valuable haplotype resource. We collated whole-genome sequence data for 688 drones, including 45 newly sequenced Scottish drones, which collectively represent 13 countries, 7 subspecies and various hybrid strains. After alignment to the reference assembly Amel_HAv3.1 and haploid variant calling, we identified 18.9M variants.

The whole-genome sequencing data underpinning the dataset are available from the European Nucleotide Archive (ENA), https://www.ebi.ac.uk/ena, under the project accessions PRJEB16533, PRJNA311274, PRJNA363032, PRJNA516678, PRJNA544324, and PRJEB39369.
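As a rough sketch of how the run listings for these projects might be located, the snippet below builds one query URL per accession for the ENA Portal API file-report endpoint. The endpoint path and the chosen field names (`run_accession`, `sample_accession`, `fastq_ftp`) are assumptions about the ENA service rather than part of this record, and the projects may contain runs that are not among the 688 drones used here.

```python
from urllib.parse import urlencode

# Project accessions listed in the description.
PROJECTS = [
    "PRJEB16533", "PRJNA311274", "PRJNA363032",
    "PRJNA516678", "PRJNA544324", "PRJEB39369",
]

# ENA Portal API file-report endpoint (assumed; check the ENA documentation).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession,
                   fields=("run_accession", "sample_accession", "fastq_ftp")):
    """Build a URL that requests one TSV row per sequencing run in a project."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


# One URL per project; no network request is made here.
urls = {acc: filereport_url(acc) for acc in PROJECTS}
```

Each URL can then be fetched with any HTTP client to obtain the FASTQ locations for that project.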

Sequencing reads were aligned to the Amel_HAv3.1 reference genome using BWA-MEM v0.7.17. Reads were sorted with SAMtools v1.9 and duplicates were marked (MarkDuplicates) with GATK v4.0.11.0. Variants for each sample were called using GATK’s HaplotypeCaller with the following non-default parameters: -ERC GVCF, --sample-ploidy 1 and -A AlleleFraction. Joint variant calling was performed across all samples using GATK’s GenomicsDBImport and GenotypeGVCFs with --sample-ploidy 1 and a window size of 2.5 Mb.
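The calling steps described above can be sketched as command lines assembled in Python. This is a minimal illustration, not the authors' pipeline: all file paths, the contig name `chr1` and its length are hypothetical, the emit-reference-confidence option is written with GATK4's short flag `-ERC`, and passing each 2.5 Mb window as a `-L` interval to GenomicsDBImport is an assumption about how the windowing was applied.

```python
def windows(contig, length, size=2_500_000):
    """Yield 1-based, inclusive GATK-style intervals covering one contig."""
    for start in range(1, length + 1, size):
        end = min(start + size - 1, length)
        yield f"{contig}:{start}-{end}"


def per_sample_cmd(ref, bam, gvcf):
    """HaplotypeCaller command with the non-default parameters from the text."""
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", gvcf,
            "-ERC", "GVCF", "--sample-ploidy", "1", "-A", "AlleleFraction"]


def joint_cmds(ref, gvcfs, interval, workspace, out_vcf):
    """GenomicsDBImport followed by GenotypeGVCFs for a single window."""
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", interval]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    genotype_cmd = ["gatk", "GenotypeGVCFs", "-R", ref,
                    "-V", f"gendb://{workspace}", "-O", out_vcf,
                    "--sample-ploidy", "1"]
    return import_cmd, genotype_cmd


# Hypothetical 6 Mb contig split into 2.5 Mb windows.
intervals = list(windows("chr1", 6_000_000))
# -> ['chr1:1-2500000', 'chr1:2500001-5000000', 'chr1:5000001-6000000']

hc = per_sample_cmd("ref.fa", "drone1.bam", "drone1.g.vcf.gz")
imp, gt = joint_cmds("ref.fa", ["drone1.g.vcf.gz", "drone2.g.vcf.gz"],
                     intervals[0], "ws_chr1_0", "chr1_0.vcf.gz")
```

The lists can be handed to `subprocess.run` once the paths point at real files; building them separately makes it easy to inspect each command before anything is executed.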

This dataset is unfiltered, and contains all variants regardless of quality or call rate.
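Because no filtering has been applied, users will usually want to impose their own thresholds. The sketch below shows one way to compute the call rate of a haploid VCF record (the fraction of samples with a non-missing genotype) and combine it with a QUAL cut-off. The thresholds and the example record are illustrative assumptions, not recommendations from the dataset authors.

```python
def call_rate(genotypes):
    """Fraction of samples whose haploid GT is not missing ('.')."""
    if not genotypes:
        return 0.0
    # GT is the first colon-separated subfield of each sample column.
    called = sum(1 for g in genotypes if g.split(":")[0] not in (".", ""))
    return called / len(genotypes)


def passes(vcf_line, min_qual=30.0, min_call_rate=0.9):
    """Apply illustrative QUAL and call-rate thresholds to one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]  # QUAL is the sixth VCF column
    if qual == "." or float(qual) < min_qual:
        return False
    return call_rate(fields[9:]) >= min_call_rate  # samples start at column 10


# Hypothetical record with four haploid samples, one of them uncalled.
record = "chr1\t100\t.\tA\tG\t50.0\t.\t.\tGT\t0\t1\t.\t1"
rate = call_rate(record.split("\t")[9:])        # 0.75
strict = passes(record)                         # False: 0.75 < 0.9
lenient = passes(record, min_call_rate=0.7)     # True
```

For a file of this size, a dedicated tool or a streaming VCF parser would be more practical; the functions above only make the definitions concrete.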

 

Files

Files (29.7 GB)

Checksum                                 Size
md5:b45e40f7d2b22ad7264de40108591d57     29.7 GB
md5:264e23d32035a775e52f5f8bbcf3dc03     195.7 kB
md5:d112061516f9cfe745887682da9186d7     45 Bytes
md5:d44282d8af52c80f64c70590278a8882     34.3 kB
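After downloading, each file can be checked against its listed checksum. The following is a minimal sketch using Python's standard `hashlib`; the file name in the usage note is a hypothetical placeholder, since file names are not shown in the listing above.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's digest with a checksum such as 'md5:b45e40f7...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected_hex
```

For example, `verify("downloaded_file.vcf.gz", "md5:b45e40f7d2b22ad7264de40108591d57")` would return `True` only if the local copy of the largest file is intact. Reading in chunks keeps memory use constant even for the 29.7 GB file.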