Published October 18, 2016 | Version v1
Dataset Open

Data from: Paralogs are revealed by proportion of heterozygotes and deviations in read ratios in genotyping by sequencing data from natural populations

  • 1. University of Washington

Description

Whole genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in pervasive paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome. Paralogs can be difficult to genotype reliably and are often excluded from genotyping-by-sequencing (GBS) analyses; however, excluding them requires first identifying them, which is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: 1) the expected frequency of heterozygotes exceeds that for singleton loci, and 2) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 ratio expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low; however, we postulated that summing allele reads for each locus over all heterozygous individuals in a population would provide sufficient power to detect deviations at those loci. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha), which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole genome duplication; mountain barberry (Berberis alpina), which has a large proportion of paralogs that arose through an unknown mechanism; and dusky parrotfish (Scarus niger), which has largely re-diploidized following an ancient whole genome duplication. Importantly, this approach requires only the genotype and allele-specific read counts for each individual, information that is readily obtained from most GBS analysis pipelines.
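The two per-locus quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name `summarize_locus`, the 0/1/2 genotype coding, and the use of a binomial z-score as the measure of read-ratio deviation are all assumptions made for the example. It computes the proportion of heterozygotes among called individuals, then sums allele-specific reads over heterozygotes only and measures how far the pooled ratio departs from 1:1.

```python
from math import sqrt


def summarize_locus(genotypes, ref_reads, alt_reads):
    """Summarize one locus across individuals (hypothetical helper).

    genotypes: per-individual alternate-allele counts (0, 1, 2), None if missing.
    ref_reads, alt_reads: per-individual read counts for each allele.
    Returns the heterozygote proportion (H), the pooled reference-read
    ratio in heterozygotes, and a z-score (D) against the 1:1 expectation.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no genotype calls at this locus

    # Property 1: proportion of heterozygotes among called individuals.
    het_idx = [i for i, g in enumerate(genotypes) if g == 1]
    h = len(het_idx) / len(called)

    # Property 2: pool allele reads over all heterozygous individuals.
    ref_total = sum(ref_reads[i] for i in het_idx)
    alt_total = sum(alt_reads[i] for i in het_idx)
    n = ref_total + alt_total
    if n == 0:
        return {"H": h, "ratio": None, "D": None}

    ratio = ref_total / n
    # Assumption: reads are binomial with p = 0.5 at a singleton locus,
    # so the expected count is n/2 with standard deviation sqrt(n * 0.25).
    d = (ref_total - 0.5 * n) / sqrt(n * 0.25)
    return {"H": h, "ratio": ratio, "D": d}


if __name__ == "__main__":
    # Toy data: five individuals, one with a missing genotype.
    print(summarize_locus([1, 1, 0, 2, None],
                          [10, 12, 20, 0, 0],
                          [10, 8, 0, 20, 0]))
```

Loci with an unusually high H or a large absolute D would be candidates for closer inspection as paralogs; any cutoffs would need to be chosen from the data at hand and are not implied by this sketch.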

Files

HDplot_R_genericInput.txt

Files (203.5 MB)

MD5 checksum                          Size
md5:5c7942c1ad1ff254045b538fd0f94ef5  5.0 MB
md5:13c4245b25af8aa93c1df5fdbffab26b  115.0 MB
md5:4e934155d5389e67cf499fb257c8941a  722 Bytes
md5:35738e1f35b3cb2d4e0dc4c7c27506b9  2.1 kB
md5:4fca3ad630ce6e63b9a0c557d110d9bd  43.5 MB
md5:98d4324896e7739fcfbd956bd8e105a9  451.4 kB
md5:021abab1f17977327a4f45f07d711964  22.9 kB
md5:77fa08ec623504cc9ed089ebfc630421  39.5 MB
md5:ad3de9c3ffb276f06354311b8d62c453  3.4 kB

Additional details

Related works

Is cited by
10.1111/1755-0998.12613 (DOI)