Published August 11, 2022 | Version 1.1.1
Dataset Open

AmelHap

  • 1. University of the Basque Country
  • 2. University of Edinburgh
  • 3. Beebytes Analytics CIC
  • 4. INRAE Toulouse

Description

AmelHap 1.1.0.f4

  • 1,328 samples
  • 17,414,346 variants

Sequencing reads were aligned to the Amel_HAv3.1 reference genome using BWA-MEM v0.7.17. Reads were sorted with SAMtools v1.9, and duplicates were marked with MarkDuplicates from GATK v4.0.11.0. Variants were called for each sample using GATK’s HaplotypeCaller with the following non-default parameters: --ERC GVCF, --sample-ploidy 1, and -A AlleleFraction. Joint variant calling was then performed across all samples collated for AmelHap using GATK’s GenomicsDBImport and GenotypeGVCFs with --sample-ploidy 1 and a window size of 10 Mb.
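As a rough illustration of the per-sample calling step and the 10 Mb windows, the sketch below builds a HaplotypeCaller argument list and splits a contig into consecutive intervals. It is not the authors' pipeline code. The function names, file names, and contig length are hypothetical, and it assumes the windows are non-overlapping, 1-based inclusive intervals of the kind passed to GATK with -L. The option --emit-ref-confidence is used here as the long form of the ERC parameter.

```python
from typing import Iterator, List


def haplotype_caller_cmd(reference: str, bam: str, out_gvcf: str) -> List[str]:
    """Build a haploid gVCF calling command (hypothetical helper, not the authors' script)."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out_gvcf,
        "--emit-ref-confidence", "GVCF",  # long form of the ERC option
        "--sample-ploidy", "1",           # drones are haploid
        "-A", "AlleleFraction",
    ]


def windows(contig: str, length: int, size: int = 10_000_000) -> Iterator[str]:
    """Yield 1-based, inclusive intervals 'contig:start-end' that tile the contig."""
    for start in range(1, length + 1, size):
        end = min(start + size - 1, length)
        yield f"{contig}:{start}-{end}"


if __name__ == "__main__":
    # File names and contig length are placeholders for illustration only.
    print(" ".join(haplotype_caller_cmd("Amel_HAv3.1.fa", "sample.bam", "sample.g.vcf.gz")))
    for interval in windows("contig_example", 25_000_000):
        print(interval)
```

Each interval string could then be supplied to a separate joint-calling job, which keeps memory use bounded at the cost of merging the per-window outputs afterwards.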

The AmelHap dataset comprises samples and variants that have passed a series of four filters. The first filter excluded variants with quality by depth (QD) less than 20 or greater than 40, mapping quality (MQ) less than 50, or strand odds ratio (SOR) greater than 3. The second filter set genotypes with depth (DP) greater than 704 or genotype quality (GQ) less than 40 as missing. The third filter removed monomorphic variants. The fourth and final filter retained samples and variants with a minimum call rate of 90%. Unfiltered data for all processed samples are also available in the community repository as individual gVCF files and as joint-called raw variants grouped by ENA project accession.
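The filter thresholds above can be expressed as simple predicates. The following is a minimal sketch in plain Python, not the tooling used to build AmelHap. The function names and the toy values are hypothetical, genotypes are modelled as haploid allele indices with None for missing, and "monomorphic" is assumed to mean fewer than two distinct alleles among the called genotypes.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # haploid allele index; None means missing


def passes_site_filter(qd: float, mq: float, sor: float) -> bool:
    """First filter: keep sites with 20 <= QD <= 40, MQ >= 50 and SOR <= 3."""
    return 20 <= qd <= 40 and mq >= 50 and sor <= 3


def mask_genotype(gt: Genotype, dp: int, gq: int) -> Genotype:
    """Second filter: set the genotype to missing if DP > 704 or GQ < 40."""
    if gt is None or dp > 704 or gq < 40:
        return None
    return gt


def is_polymorphic(genotypes: Sequence[Genotype]) -> bool:
    """Third filter: at least two distinct alleles among called genotypes (assumed definition)."""
    return len({g for g in genotypes if g is not None}) > 1


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fourth filter input: fraction of non-missing genotypes."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


# Toy variant with made-up values: (genotype, DP, GQ) for four samples.
calls = [(0, 30, 99), (1, 25, 99), (1, 900, 99), (0, 12, 20)]
masked = [mask_genotype(gt, dp, gq) for gt, dp, gq in calls]

print(passes_site_filter(qd=28.5, mq=60.0, sor=0.9))
print(masked)
print(is_polymorphic(masked), call_rate(masked) >= 0.9)
```

In this toy case the site passes the first filter and remains polymorphic, but two of the four genotypes are masked, so the variant would fall below the 90% call-rate threshold. The same call_rate function applies to a sample by passing that sample's genotypes across variants; the order in which samples and variants were filtered is not stated in the description.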

Sample metadata is available here: https://doi.org/10.5281/zenodo.7030888

Citing this resource

If you use this resource, please cite both the resource and the publication describing it:

  • Parejo, Melanie, Talenti, Andrea, Richardson, Matthew, Vignal, Alain, Barnett, Mark, & Wragg, David. (2022). AmelHap (1.1.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6983102
  • Parejo, M., Talenti, A., Richardson, M. et al. AmelHap: Leveraging drone whole-genome sequence data to create a honey bee HapMap. Sci Data 10, 198 (2023). https://doi.org/10.1038/s41597-023-02097-z

 

Files

Total size: 25.8 GB

  • md5:62ac168e20c470c062cbd9ab139c0502 (25.8 GB)
  • md5:2118382b5cabf44a7a44ac9f673ecdc0 (189.5 kB)
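The MD5 checksums listed above can be used to confirm that a download completed intact. The sketch below is one way to do this with Python's standard library. The helper names and the file path in the usage note are hypothetical, since the file names are not shown in this listing.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:62ac...'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5sum(path) == expected_hex
```

For example, `matches_checksum("downloaded_file", "md5:62ac168e20c470c062cbd9ab139c0502")` would return True only if the local copy of the 25.8 GB file, saved here under a placeholder name, is byte-identical to the deposited one.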