Published September 16, 2021 | Version 1.0.0
Dataset Open

A unified genealogy of modern and ancient genomes: Unified, inferred tree sequences of 1000 Genomes, Human Genome Diversity, and Simons Genome Diversity Projects with ancient samples

  • 1. Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • 2. Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
  • 3. Harvard Medical School Department of Genetics, Boston, MA, USA
  • 4. Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria

Description

Unified, inferred tree sequences built from the 1000 Genomes Project phase 3, Human Genome Diversity Project, and Simons Genome Diversity Project data, together with high-coverage sequenced ancient samples. The ancient samples are the Altai, Chagyrskaya, and Vindija Neanderthals, the Denisovan, and a high-coverage family of four from the Afanasievo Culture.

Each tree sequence covers one arm of an autosome (the short arms of acrocentric chromosomes are not included). Tree sequences were inferred with tsinfer version 0.2.1 and dated with tsdate version 0.1.4, as described in Wohns et al. (2021). The files were compressed using tszip. All data use GRCh38 coordinates.

The full data pipeline used to generate these tree sequences and associated metadata is available on GitHub. A description can be found in the Supplementary Material of Wohns et al. (2021).

Tree sequences can be decompressed as follows:

$ tsunzip hgdp_tgp_sgdp_high_cov_ancients_chr1_p.dated.trees.tsz
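The example file name above suggests a naming pattern of one file per chromosome arm. The sketch below lists the file names that pattern would produce. The `_q` suffix for long arms, the `prefix` and `suffix` defaults, and the helper name `expected_arm_files` are assumptions inferred from that single example, so check the result against the actual file list before relying on it. The acrocentric human autosomes (13, 14, 15, 21, and 22) have their short arms skipped, as stated in the description.

```python
# Human acrocentric autosomes; their short (p) arms are not in the dataset.
ACROCENTRIC = {13, 14, 15, 21, 22}


def expected_arm_files(
    prefix="hgdp_tgp_sgdp_high_cov_ancients",  # taken from the example file name
    suffix=".dated.trees.tsz",                 # taken from the example file name
):
    """Return the assumed file name for every autosomal arm in the dataset."""
    names = []
    for chrom in range(1, 23):
        for arm in ("p", "q"):  # "_q" for long arms is an assumption
            if arm == "p" and chrom in ACROCENTRIC:
                continue
            names.append(f"{prefix}_chr{chrom}_{arm}{suffix}")
    return names


files = expected_arm_files()
print(len(files))  # 39 arms: 22 * 2 minus 5 acrocentric short arms
print(files[0])
```

The count of 39 agrees with the number of files listed for this record.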

Once decompressed, tree sequence files can be loaded and processed in Python using tskit:

import tskit
ts = tskit.load("hgdp_tgp_sgdp_high_cov_ancients_chr1_p.dated.trees")
# ts is an instance of tskit.TreeSequence 
print("The short arm of chromosome 1 contains {} trees".format(ts.num_trees))

Accessing a variant site in the tree sequence gives its position and ID:

import json
site = ts.site(1000)
site_metadata = json.loads(site.metadata)
print("The position of site 1000 is {} and its ID is {}.".format(site.position, site_metadata["ID"]))
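The same site attributes can be used to list every variant in a genomic interval. The following is a minimal sketch, not part of the original pipeline: the helper name `sites_in_interval` and the interval bounds in the usage note are hypothetical, and it assumes each site's metadata is JSON with an `"ID"` key, as in the example above. It relies on tskit storing sites in increasing order of position, which lets the loop stop early.

```python
import json


def sites_in_interval(ts, start, end):
    """Yield (position, ID) for every site with start <= position < end.

    `ts` is expected to be a tskit.TreeSequence loaded as shown above.
    """
    for site in ts.sites():
        if site.position >= end:
            break  # sites are sorted by position, so nothing later can match
        if site.position >= start:
            yield site.position, json.loads(site.metadata)["ID"]
```

For example, `list(sites_in_interval(ts, 1_000_000, 1_010_000))` would collect the sites in a 10 kb window. Iterating over sites in pure Python can be slow on a full chromosome arm, so this is best suited to small intervals near the start of the sequence or to exploratory use.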

Metadata associated with individuals and populations was derived from the original sources (TGP, HGDP, and SGDP) and converted to JSON format. For example, to access individual metadata we can use:

ind = ts.individual(0)
metadata_dict = json.loads(ind.metadata)

The metadata_dict variable now contains all the metadata for the individual with ID 0 as a dictionary. Population metadata can be accessed in a similar way. Population IDs are associated with individuals via their constituent nodes. For example:

pop_metadata = [json.loads(pop.metadata) for pop in ts.populations()]
ind_node = ts.node(ind.nodes[0])
ind_pop_metadata = pop_metadata[ind_node.population]

After this, the ind_pop_metadata variable contains the population-level metadata for the individual with ID 0.
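The steps above can be wrapped in a small helper that returns both dictionaries at once. This is a sketch under stated assumptions rather than code from the original pipeline: the names `decode_metadata` and `individual_with_population` are hypothetical, and the defensive handling of empty metadata, already-decoded metadata, individuals without nodes, and nodes without a population (ID -1, the value of `tskit.NULL`) is added for robustness and may never be triggered by these files.

```python
import json


def decode_metadata(raw):
    """Return metadata as a dict, whether it is raw JSON or already decoded."""
    if isinstance(raw, dict):
        return raw  # already decoded, e.g. when a metadata schema is set
    if not raw:
        return {}   # empty bytes or string: no metadata stored
    return json.loads(raw)


def individual_with_population(ts, individual_id):
    """Return (individual metadata, population metadata) for one individual.

    `ts` is expected to be a tskit.TreeSequence loaded as shown above.
    """
    ind = ts.individual(individual_id)
    ind_metadata = decode_metadata(ind.metadata)
    if len(ind.nodes) == 0:
        return ind_metadata, {}
    # All nodes of an individual are assumed to share one population,
    # so the first node is used, as in the example above.
    node = ts.node(ind.nodes[0])
    if node.population == -1:  # -1 is tskit.NULL: no population assigned
        return ind_metadata, {}
    pop = ts.population(node.population)
    return ind_metadata, decode_metadata(pop.metadata)
```

Calling `individual_with_population(ts, 0)` should give the same two dictionaries as `metadata_dict` and `ind_pop_metadata` above.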

Files

Files (2.7 GB)

Checksum Size
md5:667503fcd1c95154f562fba29c40c239 44.1 MB
md5:405d2a12c53ae9b9556d4389c30d1154 91.2 MB
md5:26e795658c785a2371b81763b41bdff1 50.2 MB
md5:0b0cf2d0dd651f9221c9186e795defc1 78.9 MB
md5:3fe4bd9f3b96933ac56e355dfcfa37d9 36.5 MB
md5:cfeb6e6e72280d6d722675242e3f361e 93.7 MB
md5:143751185fc1e3807a06003cd3110deb 98.2 MB
md5:ed49a6ff5f957c9fc4078a98b4219e04 90.2 MB
md5:069d934e545981316abbc28a563ee48f 87.0 MB
md5:db789babb839d4a781598fc0b32d3995 39.2 MB
md5:ac01be21d421154fc7a0e28d11b5d31b 50.8 MB
md5:de0e736c0778ed4c556424c58ee61112 25.5 MB
md5:f6042c8b164b621133b1a4a11b0972ff 56.7 MB
md5:09e20d00af3fb11164fb6a28e2cf3364 18.5 MB
md5:9a2edd3294be47f024a0797c037ed016 61.6 MB
md5:7cce25ccc894e89888e549088bc2a66c 25.9 MB
md5:d0401ff9f94ee1180fecd1e167c8d658 33.9 MB
md5:cd6a4ae8c76d881a3034b81927594bc3 118.4 MB
md5:6cbe795fbc1df80eab511091440c592b 99.6 MB
md5:f398e36e9532ba9758df832d8ac06d79 28.9 MB
md5:360573dfe31cd7ac81d309456941202d 37.4 MB
md5:3be403fae45841524ea90bc5989070a1 38.4 MB
md5:f2737a8063735710944b0eab983df7bd 38.9 MB
md5:d5ca695ee1a1bd6c80252d6511f45e64 99.3 MB
md5:3e348b98ab4c4d9ac0469b8e4f5c3abd 138.4 MB
md5:d9c7d5ccde64aecea95d8f2f50d3f75e 92.6 MB
md5:a3c613b421f02f33a0b4db10d204df78 95.0 MB
md5:8aba93d37fb283048283a2bcb8c3d96b 53.3 MB
md5:3f5ed3332e69115e49ff639102bf97d8 134.5 MB
md5:76e1f363e505e11a456301d8323cdd14 47.5 MB
md5:2c8254deb8a718005a568513e756b634 122.9 MB
md5:68a86963e305e80897eec0899f5f8e97 59.4 MB
md5:3a0253ef8fc3131759cb8b11d305677b 104.8 MB
md5:57dec8bb11aaabb9d154d42ae39ae136 63.7 MB
md5:5807e6c4d1e9ae248c7ef99309b3d8dc 91.5 MB
md5:50ac74062654ec418ff8028ea7267ea1 57.7 MB
md5:e8883086e9195948c0660dd24186f441 94.2 MB
md5:2354c79af213df58be0ad5829b51fcdb 54.1 MB
md5:7358c1237a85421945d35407b7427613 74.2 MB
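
Each file above is published with an MD5 checksum, which can be used to confirm that a download is complete before decompressing it. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file is read in chunks so that large files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

To use it, pass the path of a downloaded `.tsz` file together with the md5 value shown for that file on the record page. A result of `False` suggests the download is incomplete or corrupted and should be repeated.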