Dataset Open Access

Relate-estimated coalescence rates, allele ages, and selection p-values for the 1000 Genomes Project

Speidel, Leo; Forest, Marie; Shi, Sinan; Myers, Simon R.

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3234689", 
  "language": "eng", 
  "title": "Relate-estimated coalescence rates, allele ages, and selection p-values for the 1000 Genomes Project", 
  "issued": {
    "date-parts": [
  "abstract": "<p><strong>Overview</strong></p>\n\n<p>Coalescence rates, allele ages, and p-values for evidence of positive selection calculated for 2478&nbsp;samples of the&nbsp;1000 Genomes Project&nbsp;using Relate.</p>\n\n<p>We estimated the joint genealogy of all 1000 GP populations and then extracted the embedded genealogy for each population.<br>\nFor the genealogy of each population, we jointly estimated the population size history and branch lengths.&nbsp;<br>\nVariants segregating in more than one&nbsp;population&nbsp;therefore have&nbsp;correlated but different allele ages in each population.</p>\n\n<p>Please refer to&nbsp;<a href=\"\">Speidel et al.&nbsp;Nature Genetics (2019)</a>&nbsp;for more details or email for any queries.</p>\n\n<p><strong>Coalescence rates</strong></p>\n\n<p>The zipped directory&nbsp;contains coalescence rates for 26 populations in the 1000 Genomes Project data set.</p>\n\n<ul>\n\t<li>The .coal files show the haploid coalescence rates; please refer to the&nbsp;<a href=\"\">Relate documentation</a>&nbsp;for the file format.</li>\n\t<li>The popsize.RData file contains an R data frame storing the diploid population sizes (0.5/coalescence rate) calculated using the .coal files. The columns of this data frame, named &quot;pop_size&quot;,&nbsp;are\n\t<ul>\n\t\t<li>gens_ago: Time in generations at which epoch starts. 
(To get years from generations, we multiply by 28.)</li>\n\t\t<li>population_size: Diploid population size in this epoch.</li>\n\t\t<li>population: Name of population&nbsp;</li>\n\t\t<li>region: Name of region (AFR, AMR, EAS, EUR, SAS)</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p><strong>Allele ages and selection p-values</strong></p>\n\n<p>The zipped directories&nbsp;allele_ages_*.zip&nbsp;contain&nbsp;R&nbsp;data frames for each 1000GP population storing allele ages and selection p-values.<br>\nPlease note that only mutations that segregate in the population and map to a unique branch in the Relate-estimated marginal trees are included. Selection p-values are only provided for mutations of DAF &gt; 2 that pass quality filters (see Speidel et al., 2019).&nbsp;</p>\n\n<p>To get an age estimate for a neutral mutation, use&nbsp;0.5*(lower_age + upper_age). To get years from generations, we multiply by 28.</p>\n\n<p>The columns of these&nbsp;data frames, named &quot;allele_ages&quot;,&nbsp;are</p>\n\n<ul>\n\t<li>CHR: chromosome index</li>\n\t<li>BP: base-pair position (GRCh37)</li>\n\t<li>ID: id of SNP</li>\n\t<li>lower_age: Age in generations of coalescence event at the lower end of the branch onto which the mutation maps</li>\n\t<li>upper_age: Age in generations of coalescence event at the upper end of the branch onto which the mutation maps</li>\n\t<li>ancestral/derived: Ancestral/derived allele</li>\n\t<li>upstream: Upstream (5&#39;) allele</li>\n\t<li>downstream: Downstream (3&#39;) allele</li>\n\t<li>DAF: Derived-allele frequency</li>\n\t<li>pvalue: log10 p-value for selection evidence</li>\n</ul>", 
  "author": [
      "family": "Speidel, Leo"
      "family": "Forest, Marie"
      "family": "Shi, Sinan"
      "family": "Myers, Simon R."
  "note": "For R object files, use load() to load data frames into R.", 
  "version": "v1.0.0", 
  "type": "dataset", 
  "id": "3234689"
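
The abstract above gives three conversions: diploid population size as 0.5 divided by the haploid coalescence rate, a neutral allele age as the midpoint of lower_age and upper_age, and years as generations multiplied by 28. The following is a minimal Python sketch of that arithmetic only. The function names and the example input values are hypothetical and are not part of the dataset or of Relate; reading the .RData files themselves is not shown (the record's note suggests load() in R for that).

```python
# Sketch of the conversions described in the record's abstract.
# All function names and example inputs are illustrative assumptions.

YEARS_PER_GENERATION = 28  # generation time stated in the abstract


def diploid_population_size(haploid_coalescence_rate: float) -> float:
    """Diploid population size = 0.5 / haploid coalescence rate."""
    if haploid_coalescence_rate <= 0:
        raise ValueError("coalescence rate must be positive")
    return 0.5 / haploid_coalescence_rate


def neutral_allele_age(lower_age: float, upper_age: float) -> float:
    """Midpoint of the branch onto which the mutation maps, in generations."""
    return 0.5 * (lower_age + upper_age)


def generations_to_years(generations: float) -> float:
    """Convert generations to years using the 28-year generation time."""
    return generations * YEARS_PER_GENERATION


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    rate = 2.5e-5
    lower, upper = 100.0, 300.0
    age_generations = neutral_allele_age(lower, upper)
    print("diploid population size:", diploid_population_size(rate))
    print("allele age (generations):", age_generations)
    print("allele age (years):", generations_to_years(age_generations))
```

Per the abstract, the midpoint estimate applies to neutral mutations; the sketch does not attempt to handle mutations under selection.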
All versions This version
Views 474 474
Downloads 217 217
Data volume 296.5 GB 296.5 GB
Unique views 437 437
Unique downloads 102 102

