Dataset Open Access

Relate-estimated coalescence rates, allele ages, and selection p-values for the 1000 Genomes Project

Speidel, Leo; Forest, Marie; Shi, Sinan; Myers, Simon R.


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.3234689</identifier>
  <creators>
    <creator>
      <creatorName>Speidel, Leo</creatorName>
      <givenName>Leo</givenName>
      <familyName>Speidel</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-4644-8033</nameIdentifier>
      <affiliation>Department of Statistics, University of Oxford</affiliation>
    </creator>
    <creator>
      <creatorName>Forest, Marie</creatorName>
      <givenName>Marie</givenName>
      <familyName>Forest</familyName>
      <affiliation>Université du Québec à Montréal, Montréal, Canada</affiliation>
    </creator>
    <creator>
      <creatorName>Shi, Sinan</creatorName>
      <givenName>Sinan</givenName>
      <familyName>Shi</familyName>
      <affiliation>Department of Statistics, University of Oxford</affiliation>
    </creator>
    <creator>
      <creatorName>Myers, Simon R.</creatorName>
      <givenName>Simon R.</givenName>
      <familyName>Myers</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-2585-9626</nameIdentifier>
      <affiliation>Department of Statistics, University of Oxford</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Relate-estimated coalescence rates, allele ages, and selection p-values for the 1000 Genomes Project</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2019</publicationYear>
  <subjects>
    <subject>Genetics</subject>
    <subject>Genealogy</subject>
    <subject>Population size</subject>
    <subject>Allele age</subject>
    <subject>Positive selection</subject>
    <subject>1000 Genomes Project</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2019-05-29</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3234689</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.3234688</relatedIdentifier>
  </relatedIdentifiers>
  <version>v1.0.0</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Coalescence rates, allele ages, and p-values for evidence of positive selection calculated for 2478&amp;nbsp;samples of the&amp;nbsp;1000 Genomes Project&amp;nbsp;using Relate.&lt;/p&gt;

&lt;p&gt;We estimated the joint genealogy of all 1000GP populations and then extracted the embedded genealogy for each population.&lt;br&gt;
For the genealogy of each population, we jointly estimated the population size history and branch lengths.&amp;nbsp;&lt;br&gt;
Variants segregating in more than one&amp;nbsp;population&amp;nbsp;therefore have&amp;nbsp;correlated but different allele ages in each population.&lt;/p&gt;

&lt;p&gt;Please refer to&amp;nbsp;&lt;a href="https://www.nature.com/articles/s41588-019-0484-x"&gt;Speidel et al.,&amp;nbsp;Nature Genetics (2019)&lt;/a&gt;&amp;nbsp;for more details, or email leo.speidel@outlook.com with any queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coalescence rates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The zipped directory&amp;nbsp;coalescence_rates.zip&amp;nbsp;contains coalescence rates for 26 populations in the 1000 Genomes Project data set.&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;The .coal files contain the haploid coalescence rates; please refer to the&amp;nbsp;&lt;a href="https://myersgroup.github.io/relate/modules.html#PopulationSizeScript_FileFormats"&gt;Relate documentation&lt;/a&gt;&amp;nbsp;for the file format.&lt;/li&gt;
	&lt;li&gt;The popsize.RData file contains an R data frame, named &amp;quot;pop_size&amp;quot;, storing the diploid population sizes (0.5/coalescence rate) calculated from the .coal files. Its columns are
	&lt;ul&gt;
		&lt;li&gt;gens_ago: Time in generations at which epoch starts. (To get years from generations, we multiply by 28.)&lt;/li&gt;
		&lt;li&gt;population_size: Diploid population size in this epoch.&lt;/li&gt;
		&lt;li&gt;population: Name of population&amp;nbsp;&lt;/li&gt;
		&lt;li&gt;region: Name of region (AFR, AMR, EAS, EUR, SAS)&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Allele ages and selection p-values&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The zipped directories&amp;nbsp;allele_ages_*.zip&amp;nbsp;contain&amp;nbsp;R&amp;nbsp;data frames for each 1000GP population storing allele ages and selection p-values.&lt;br&gt;
Please note that only mutations that segregate in the population and map to a unique branch in the Relate-estimated marginal trees are included. Selection p-values are only provided for mutations of DAF &amp;gt; 2 that pass quality filters (see Speidel et al., 2019).&amp;nbsp;&lt;/p&gt;

&lt;p&gt;To get an age estimate for a neutral mutation, use&amp;nbsp;0.5*(lower_age + upper_age). To get years from generations, we multiply by 28.&lt;/p&gt;

&lt;p&gt;Each of these data frames is named &amp;quot;allele_ages&amp;quot; and has the following columns:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;CHR: chromosome index&lt;/li&gt;
	&lt;li&gt;BP: base-pair position (GRCh37)&lt;/li&gt;
	&lt;li&gt;ID: SNP identifier&lt;/li&gt;
	&lt;li&gt;lower_age: Age in generations of coalescence event at the lower end of the branch onto which the mutation maps&lt;/li&gt;
	&lt;li&gt;upper_age: Age in generations of coalescence event at the upper end of the branch onto which the mutation maps&lt;/li&gt;
	&lt;li&gt;ancestral/derived: Ancestral/derived allele&lt;/li&gt;
	&lt;li&gt;upstream: Upstream (5&amp;#39;) allele&lt;/li&gt;
	&lt;li&gt;downstream: Downstream (3&amp;#39;) allele&lt;/li&gt;
	&lt;li&gt;DAF: Derived-allele frequency&lt;/li&gt;
	&lt;li&gt;pvalue: log10 of the p-value for evidence of positive selection&lt;/li&gt;
&lt;/ul&gt;</description>
    <description descriptionType="Other">For R object files, use load() to load data frames into R.</description>
    <description descriptionType="Other">{"references": ["Speidel et al., Nature Genetics 2019, A method for genome-wide genealogy estimation for thousands of samples. https://doi.org/10.1038/s41588-019-0484-x"]}</description>
  </descriptions>
</resource>
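
The unit conversions stated in the abstract above can be sketched as plain arithmetic. This is a minimal illustration in Python, not code shipped with the dataset: the function names are hypothetical, and only the constants (the 0.5 factor, the midpoint rule, and 28 years per generation) come from the description. The data files themselves are R objects and are loaded in R with load().

```python
# Illustrative helpers; the names are hypothetical and not part of the dataset.
# The constants follow the dataset description.

GENERATION_TIME_YEARS = 28  # "To get years from generations, we multiply by 28."


def diploid_population_size(haploid_coalescence_rate: float) -> float:
    """Diploid population size = 0.5 / haploid coalescence rate."""
    return 0.5 / haploid_coalescence_rate


def allele_age_generations(lower_age: float, upper_age: float) -> float:
    """Midpoint age estimate for a neutral mutation: 0.5 * (lower_age + upper_age)."""
    return 0.5 * (lower_age + upper_age)


def generations_to_years(generations: float) -> float:
    """Convert a time in generations to years."""
    return generations * GENERATION_TIME_YEARS


def pvalue_from_log10(log10_pvalue: float) -> float:
    """The pvalue column stores log10(p); this recovers p itself."""
    return 10.0 ** log10_pvalue


if __name__ == "__main__":
    age = allele_age_generations(lower_age=100.0, upper_age=300.0)
    print(age, generations_to_years(age))
    print(diploid_population_size(2.5e-5))
```

The midpoint rule is described for neutral mutations only, so it may not be an appropriate summary for variants under selection.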