Data from: Paralogs are revealed by proportion of heterozygotes and deviations in read ratios in genotyping by sequencing data from natural populations
- 1. University of Washington
Description
Whole genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in pervasive paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome. Paralogs can be difficult to genotype reliably and are often excluded from genotyping-by-sequencing (GBS) analyses; however, removal requires that paralogs first be identified, which is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: 1) the expected frequency of heterozygotes exceeds that for singleton loci, and 2) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 ratio expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low, but we postulated that summing allele reads for each locus over all heterozygous individuals in a population would provide sufficient power to detect deviations at those loci. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha), which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole genome duplication; mountain barberry (Berberis alpina), which has a large proportion of paralogs that arose through an unknown mechanism; and dusky parrotfish (Scarus niger), which has largely re-diploidized following an ancient whole genome duplication. Importantly, this approach requires only the genotype and allele-specific read counts for each individual, information that is readily obtained from most GBS analysis pipelines.
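The two per-locus quantities described above can be illustrated with a short Python sketch. This is not the authors' script: the function name `locus_h_and_d`, the genotype encoding ("0/0", "0/1", "1/1", or None for missing), and the use of a normal-approximation z-score to express the read-ratio deviation are assumptions made for illustration only. The sketch computes the proportion of heterozygotes at one locus and sums allele reads across all heterozygous individuals before testing for a departure from 1:1.

```python
from math import sqrt, nan


def locus_h_and_d(genotypes, ref_reads, alt_reads):
    """Summarize one locus across individuals (hypothetical helper).

    genotypes : list of "0/0", "0/1", "1/1", or None when missing (assumed encoding)
    ref_reads : per-individual read counts for the first allele
    alt_reads : per-individual read counts for the second allele

    Returns (h, ratio, d):
      h     - proportion of heterozygotes among genotyped individuals
      ratio - share of first-allele reads, summed over heterozygotes
      d     - z-score of that share against the 1:1 expectation
              (normal approximation to a binomial with p = 0.5; an assumption)
    """
    called = [g for g in genotypes if g is not None]
    het_idx = [i for i, g in enumerate(genotypes) if g in ("0/1", "1/0")]

    # Property 1: proportion of heterozygotes at this locus.
    h = len(het_idx) / len(called) if called else nan

    # Property 2: pool allele reads over all heterozygous individuals.
    a = sum(ref_reads[i] for i in het_idx)
    b = sum(alt_reads[i] for i in het_idx)
    n = a + b
    if n == 0:
        return h, nan, nan

    ratio = a / n
    # Under 1:1, a ~ Binomial(n, 0.5): mean n/2, standard deviation sqrt(n * 0.25).
    d = (a - n / 2) / sqrt(n * 0.25)
    return h, ratio, d


# Toy data (invented for illustration, not taken from the dataset).
h, ratio, d = locus_h_and_d(
    ["0/1", "0/1", "0/0", "1/1"],
    [10, 12, 20, 0],
    [10, 8, 0, 18],
)
print(h, ratio, d)
```

Loci with both a high proportion of heterozygotes and a pooled read ratio far from 1:1 would be candidates for paralogs; any cutoffs are left to the user here, since appropriate values depend on the species and data.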
Files (203.5 MB)
HDplot_R_genericInput.txt
Name | Size |
---|---|
md5:5c7942c1ad1ff254045b538fd0f94ef5 | 5.0 MB |
md5:13c4245b25af8aa93c1df5fdbffab26b | 115.0 MB |
md5:4e934155d5389e67cf499fb257c8941a | 722 Bytes |
md5:35738e1f35b3cb2d4e0dc4c7c27506b9 | 2.1 kB |
md5:4fca3ad630ce6e63b9a0c557d110d9bd | 43.5 MB |
md5:98d4324896e7739fcfbd956bd8e105a9 | 451.4 kB |
md5:021abab1f17977327a4f45f07d711964 | 22.9 kB |
md5:77fa08ec623504cc9ed089ebfc630421 | 39.5 MB |
md5:ad3de9c3ffb276f06354311b8d62c453 | 3.4 kB |
Additional details
Related works
- Is cited by
- 10.1111/1755-0998.12613 (DOI)