Dataset Open Access

Relate-estimated coalescence rates, allele ages, and selection p-values for the 1000 Genomes Project

Speidel, Leo; Forest, Marie; Shi, Sinan; Myers, Simon R.


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Speidel et al., Nature Genetics 2019, A method for genome-wide genealogy estimation for thousands of samples. https://doi.org/10.1038/s41588-019-0484-x</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Genetics</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Genealogy</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Population size</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Allele age</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Positive selection</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">1000 Genomes Project</subfield>
  </datafield>
  <controlfield tag="005">20200124192619.0</controlfield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">To load the data frames from the R object (.RData) files, use load() in R.</subfield>
  </datafield>
  <controlfield tag="001">3234689</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Université du Québec à Montréal, Montréal, Canada</subfield>
    <subfield code="a">Forest, Marie</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Statistics, University of Oxford</subfield>
    <subfield code="a">Shi, Sinan</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Statistics, University of Oxford</subfield>
    <subfield code="0">(orcid)0000-0002-2585-9626</subfield>
    <subfield code="a">Myers, Simon R.</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2830727172</subfield>
    <subfield code="z">md5:c3d94e2084205dcb7101bd51d3660409</subfield>
    <subfield code="u">https://zenodo.org/record/3234689/files/allele_ages_AFR.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1127375764</subfield>
    <subfield code="z">md5:f7da0238962e45a2446fbe9d88b6fef8</subfield>
    <subfield code="u">https://zenodo.org/record/3234689/files/allele_ages_AMR.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1040290936</subfield>
    <subfield code="z">md5:f1e5241291fe674fb5c3d4390c9d6663</subfield>
    <subfield code="u">https://zenodo.org/record/3234689/files/allele_ages_EAS.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1177791492</subfield>
    <subfield code="z">md5:c637c78c2248ca46144eb40d2c4f6c4d</subfield>
    <subfield code="u">https://zenodo.org/record/3234689/files/allele_ages_EUR.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1223898401</subfield>
    <subfield code="z">md5:9ee3fada4ae59271992d8288ee8a82d1</subfield>
    <subfield code="u">https://zenodo.org/record/3234689/files/allele_ages_SAS.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">19658</subfield>
    <subfield code="z">md5:b7b247836e1d078de755824fcecfc75b</subfield>
    <subfield code="u">https://zenodo.org/record/3234689/files/coalescence_rates.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-05-29</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:3234689</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Department of Statistics, University of Oxford</subfield>
    <subfield code="0">(orcid)0000-0002-4644-8033</subfield>
    <subfield code="a">Speidel, Leo</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Relate-estimated coalescence rates, allele ages, and selection p-values for the 1000 Genomes Project</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Coalescence rates, allele ages, and p-values for evidence of positive selection calculated for 2478&amp;nbsp;samples of the&amp;nbsp;1000 Genomes Project&amp;nbsp;using Relate.&lt;/p&gt;

&lt;p&gt;We estimated the joint genealogy of all 1000GP populations and then extracted the embedded genealogy for each population.&lt;br&gt;
For each population's genealogy, we jointly estimated the population size history and branch lengths.&lt;br&gt;
Variants segregating in more than one population therefore have correlated, but different, allele ages in each population.&lt;/p&gt;

&lt;p&gt;Please refer to&amp;nbsp;&lt;a href="https://www.nature.com/articles/s41588-019-0484-x"&gt;Speidel et al.&amp;nbsp;Nature Genetics (2019)&lt;/a&gt;&amp;nbsp;for more details or email leo.speidel@outlook.com for any queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coalescence rates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The zipped directory&amp;nbsp;coalescence_rates.zip&amp;nbsp;contains coalescence rates for 26 populations in the 1000 Genomes Project data set.&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;The .coal files contain the haploid coalescence rates; please refer to the&amp;nbsp;&lt;a href="https://myersgroup.github.io/relate/modules.html#PopulationSizeScript_FileFormats"&gt;Relate documentation&lt;/a&gt;&amp;nbsp;for the file format.&lt;/li&gt;
	&lt;li&gt;The popsize.RData file contains an R data frame, named &amp;quot;pop_size&amp;quot;, storing the diploid population sizes (0.5/coalescence rate) calculated from the .coal files. Its columns are
	&lt;ul&gt;
		&lt;li&gt;gens_ago: Time in generations at which the epoch starts. (To convert generations to years, multiply by 28.)&lt;/li&gt;
		&lt;li&gt;population_size: Diploid population size in this epoch.&lt;/li&gt;
		&lt;li&gt;population: Name of the population&lt;/li&gt;
		&lt;li&gt;region: Name of region (AFR, AMR, EAS, EUR, SAS)&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Allele ages and selection p-values&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The zipped directories&amp;nbsp;allele_ages_*.zip&amp;nbsp;contain&amp;nbsp;R&amp;nbsp;data frames for each 1000GP population storing allele ages and selection p-values.&lt;br&gt;
Please note that only mutations that segregate in the population and map to a unique branch in the Relate-estimated marginal trees are included. Selection p-values are only provided for mutations with DAF &amp;gt; 2 that pass quality filters (see Speidel et al., 2019).&lt;/p&gt;

&lt;p&gt;To get a point estimate of the age of a neutral mutation, use the branch midpoint, 0.5*(lower_age + upper_age). To convert generations to years, multiply by 28.&lt;/p&gt;

&lt;p&gt;Each of these data frames is named &amp;quot;allele_ages&amp;quot; and has the columns&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;CHR: chromosome index&lt;/li&gt;
	&lt;li&gt;BP: base-pair position (GRCh37)&lt;/li&gt;
	&lt;li&gt;ID: SNP identifier&lt;/li&gt;
	&lt;li&gt;lower_age: Age in generations of coalescence event at the lower end of the branch onto which the mutation maps&lt;/li&gt;
	&lt;li&gt;upper_age: Age in generations of coalescence event at the upper end of the branch onto which the mutation maps&lt;/li&gt;
	&lt;li&gt;ancestral/derived: Ancestral/derived allele&lt;/li&gt;
	&lt;li&gt;upstream: Upstream (5&amp;#39;) allele&lt;/li&gt;
	&lt;li&gt;downstream: Downstream (3&amp;#39;) allele&lt;/li&gt;
	&lt;li&gt;DAF: Derived-allele frequency&lt;/li&gt;
	&lt;li&gt;pvalue: log10 p-value for evidence of positive selection&lt;/li&gt;
&lt;/ul&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3234688</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3234689</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
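The record describes two conversions for the coalescence rates: diploid population size is 0.5/coalescence rate, and years are generations times 28. Below is a minimal Python sketch of both conversions. It is an illustration, not code shipped with the dataset.

The parser assumes the .coal layout described in the Relate documentation:

- line 1 holds group labels;
- line 2 holds epoch start times in generations;
- each remaining line holds two group indices followed by one haploid rate per epoch.

Please check this layout against the linked documentation. The function names (`read_coal`, `diploid_size`, `size_history`) and the toy input are hypothetical.

```python
import io
import math

# Generation time used in the dataset description to convert generations to years.
GENERATION_TIME_YEARS = 28


def read_coal(handle):
    """Parse a Relate .coal file.

    Assumed layout (check against the Relate documentation): line 1 lists group
    labels, line 2 lists epoch start times in generations, and each remaining
    line holds two group indices followed by one haploid coalescence rate per epoch.
    """
    lines = [line.split() for line in handle if line.strip()]
    groups = lines[0]
    epochs = [float(x) for x in lines[1]]
    rates = {}
    for fields in lines[2:]:
        pair = (int(fields[0]), int(fields[1]))
        rates[pair] = [float(x) for x in fields[2:]]
    return groups, epochs, rates


def diploid_size(rate):
    """Diploid population size = 0.5 / haploid coalescence rate."""
    return math.inf if rate == 0 else 0.5 / rate


def size_history(epochs, within_rates):
    """Return (years_ago, gens_ago, diploid_size) for each epoch start."""
    return [
        (gens * GENERATION_TIME_YEARS, gens, diploid_size(rate))
        for gens, rate in zip(epochs, within_rates)
    ]


if __name__ == "__main__":
    # Toy, made-up input for illustration only; not taken from the dataset.
    toy = io.StringIO("GBR\n0 1000 10000\n0 0 5e-05 2.5e-05 1e-04\n")
    groups, epochs, rates = read_coal(toy)
    for years, gens, size in size_history(epochs, rates[(0, 0)]):
        print(f"{groups[0]}: from {gens:.0f} generations ({years:.0f} years) ago, Ne = {size:.0f}")
```

Returning `math.inf` for a zero rate is a choice made in this sketch, since 0.5/0 is undefined. The popsize.RData file already holds the converted within-population sizes, so a parser like this is only needed when working from the .coal files directly.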
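The allele-age description gives two recipes. The point age of a neutral mutation is the branch midpoint, 0.5*(lower_age + upper_age), converted to years at 28 years per generation. The pvalue column stores a log10 p-value. A standard-library Python sketch of both steps follows.

This sketch rests on several assumptions, all to be verified against the data:

- the "allele_ages" data frame has first been exported from R to CSV, for example with write.csv;
- pvalue holds log10(p), so p = 10**pvalue;
- variants without a selection test carry NA.

The column names come from the list above. The function names and toy rows are hypothetical.

```python
import csv
import io

# Generation time used in the dataset description to convert generations to years.
GENERATION_TIME_YEARS = 28


def age_estimate(lower_age, upper_age):
    """Point estimate (generations) for a neutral mutation: midpoint of its branch."""
    return 0.5 * (lower_age + upper_age)


def annotate(handle):
    """Yield ID, midpoint age in generations and years, and the selection p-value.

    Assumes the CSV was written from the "allele_ages" data frame and that the
    pvalue column holds log10(p); empty or "NA" entries (e.g. variants without a
    selection test) are returned as None.
    """
    for row in csv.DictReader(handle):
        gens = age_estimate(float(row["lower_age"]), float(row["upper_age"]))
        log10_p = (row.get("pvalue") or "").strip()
        p = None if log10_p in ("", "NA") else 10 ** float(log10_p)
        yield {
            "ID": row["ID"],
            "age_gens": gens,
            "age_years": gens * GENERATION_TIME_YEARS,
            "p": p,
        }


if __name__ == "__main__":
    # Toy, made-up rows for illustration only; not taken from the dataset.
    toy = io.StringIO(
        '"ID","lower_age","upper_age","DAF","pvalue"\n'
        '"rs_example1",100,300,10,-6\n'
        '"rs_example2",50,70,2,NA\n'
    )
    for record in annotate(toy):
        print(record)
```

Reading through `csv.DictReader` tolerates the extra unnamed row-name column that write.csv adds by default. The midpoint is only the recommended summary for neutral mutations, as the description notes.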