Published June 26, 2023 | Version v1
Dataset Open

Genome sequencing of 2,000 canids advances the understanding of demography, genome function and architecture

  • 1. University of Michigan

Contributors

Contact person:

  • 1. University of Michigan

Description

Background: The international Dog10K project aims to sequence and analyze several thousand canine genomes. Incorporating 20x data from 1,987 individuals, including 1,611 dogs (321 breeds), 309 village dogs, 63 wolves and four coyotes, we identify genomic variation across the canid family, setting the stage for detailed studies of domestication, behavior, morphology, disease susceptibility and genome architecture and function.

Results: We report the analysis of more than 48 million single nucleotide, indel, and structural variants spanning the autosomes, X chromosome and mitochondria. We discover more than 75% of variation for 239 sampled breeds. Allele sharing analysis indicates that 94.9% of breeds form monophyletic clusters and 25 major clades. German Shepherd Dogs and related breeds show the highest allele sharing with independent breeds from multiple clades. On average, each breed dog differs from the UU_Cfam_GSD_1.0 reference at 26,960 deletions and 14,034 insertions greater than 50 bp, with wolves having 14% more variants. Discovered variants include retrogene insertions from 926 parent genes. To aid functional prioritization, single nucleotide variants were annotated with SnpEff and Zoonomia phyloP constraint scores. Constrained positions were negatively correlated with allele frequency. Finally, the utility of the Dog10K data as an imputation reference panel is assessed, generating high confidence calls across varied genotyping platform densities, including for breeds not represented in the Dog10K collection.

Conclusions: We have developed a dense dataset of 1,987 sequenced canids that reveals patterns of allele sharing, identifies likely functional variants, informs breed structure, and enables accurate imputation. Dog10K data are publicly available.

Files

callable-genome.README.txt

Files (51.6 GB)

Checksum                              Size
md5:b7746fe62ca3aba22bef5974fb158ba2  4.4 GB
md5:1cb4bca47bfc26d8db9008311551450a  1.4 MB
md5:d3db249e62463165e7c7f1b7e1cff88d  5.5 GB
md5:caca3b09049f9017d19d1d3eb80ededd  1.6 MB
md5:f1317cbdb18bba62919050a98b491ce9  974 Bytes
md5:a629e44e65d15649071b3ca33f4e8e90  6.3 kB
md5:a44c35fb2db63ddf9edf62a78cbcf58c  262.2 MB
md5:f08ffc952a0027092e8e420e3bbb92ad  76.7 kB
md5:2768d9485383f60157b8a6e876565a9d  150.3 MB
md5:3efff76e5bf5768824b8b693813eb4bb  78.1 kB
md5:24d1101fb7a7a6fe13bf5ca93bfa0ca6  533.6 MB
md5:8e6e0c9c8c8fd445c3a5c6b0c98c6279  86.0 kB
md5:3a43fce8bdc5ebfdb3c530ff2df64864  261.6 kB
md5:011f99f675b03e77d823ca06baf77214  18.8 GB
md5:c34a98b9769dc9ed352653671076a236  5.7 MB
md5:f433cca65e58879833508ce9364bc59c  6.6 GB
md5:b39075c6ff5916e27f5c22c678cacfe6  2.5 GB
md5:bf4c4634209abc6a6a56b8c840dd32ee  22.3 MB
md5:56a46762315df92ec4c6be0063ee91bf  12.8 GB
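
Because several of the files are multiple gigabytes, it can be worth checking a download against the md5 value listed for it. The following is a minimal sketch in Python using only the standard library. The helper names `md5_of_file` and `matches_listing` and the file name in the usage note are hypothetical, since this listing does not pair checksums with file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte downloads do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:b7746f...'.

    The 'md5:' prefix is optional; comparison is case-insensitive.
    """
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("downloaded_file", "md5:b7746fe62ca3aba22bef5974fb158ba2")` would return `True` only if the local file (placeholder name) has the checksum shown for the 4.4 GB entry. A mismatch usually indicates an incomplete or corrupted download.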