Inferring whole-genome histories in large population datasets: inferred tree sequences for 1000 Genomes
Creators
- 1. Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford
Description
Tree sequences inferred for the 1000 Genomes phase 3 autosomes using tsinfer version 0.1.4 and compressed using tszip. Tree sequences can be decompressed as follows:
$ tsunzip 1kg_chr1.trees.tsz
Once decompressed, the .trees files can be loaded and processed using tskit.
import tskit
ts = tskit.load("1kg_chr1.trees")
# ts is an instance of tskit.TreeSequence
print("Chromosome 1 contains {} trees".format(ts.num_trees))
Metadata associated with individuals and populations was derived from the original source and converted to JSON form. For example, to access individual metadata we can use:
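import json
import tskit
ts = tskit.load("1kg_chr1.trees")
# Look up the individual with ID 0
ind = ts.individual(0)
# ind.metadata holds the JSON-encoded metadata; decode it into a dictionary
metadata_dict = json.loads(ind.metadata)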
The metadata_dict variable will now contain all the metadata for the individual with ID 0 as a dictionary. Metadata associated with populations can be found in a similar way. Population IDs are associated with individuals via their constituent nodes. For example,
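# Decode the metadata of every population, indexed by population ID
pop_metadata = [json.loads(pop.metadata) for pop in ts.populations()]
# Take the first node of the individual and read its population ID
ind_node = ts.node(ind.nodes[0])
ind_pop_metadata = pop_metadata[ind_node.population]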
After this, the ind_pop_metadata variable will contain the population level metadata for individual ID 0.
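The same lookup can be repeated for every individual. The following is a minimal sketch rather than part of tskit: the helper names `decode_metadata` and `individuals_per_population` are hypothetical, and the sketch assumes that all nodes of an individual belong to the same population, so only the first node is inspected.

```python
import json
from collections import Counter


def decode_metadata(raw):
    """Decode JSON metadata into a dict; empty metadata gives {}."""
    if isinstance(raw, dict):
        # Already decoded (for example by a metadata schema): return as is.
        return raw
    if not raw:
        return {}
    return json.loads(raw)


def individuals_per_population(ts):
    """Count individuals per population ID using each individual's first node.

    Assumption: all nodes of an individual share one population.
    """
    counts = Counter()
    for ind in ts.individuals():
        if len(ind.nodes) == 0:
            # Individuals without nodes cannot be assigned to a population.
            continue
        first_node = ts.node(ind.nodes[0])
        counts[int(first_node.population)] += 1
    return counts
```

With `ts` loaded as above, `individuals_per_population(ts)` returns a mapping from population ID to the number of individuals, and each ID can be used to index the list of decoded population metadata.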
The full data pipeline used to generate these tree sequences and associated metadata is available on GitHub.
Files
(2.0 GB)
Checksum | Size |
---|---|
md5:3981337e7af4f0f7f125e80230385311 | 165.2 MB |
md5:d00555e85ceaef56926d0d8b2d3d2c01 | 101.0 MB |
md5:16eaff6439c83b3bdb8eecce2f7bb0b7 | 95.5 MB |
md5:5f8b675c9de83b44f1ef4287099308ad | 96.0 MB |
md5:d03d34a70f8aa6b9577da4c7e35f1b92 | 70.3 MB |
md5:4ec5a7423a8c1ee4e1119978d70a8aa9 | 66.1 MB |
md5:c1e4865a4c383fb1d1e2b9bb91374351 | 64.8 MB |
md5:b7c19e6bae4d989a6c61b1a41d4a3bfb | 71.6 MB |
md5:563335e0bdfbcecf71daa7d4af057c4d | 62.1 MB |
md5:acb8803d4a0ef070e6bb439c9a047c38 | 59.8 MB |
md5:6272ba610e4dc844b7ae8369b70b2186 | 43.1 MB |
md5:179931fe8e41b36f352b945ef0e57a34 | 175.5 MB |
md5:3f773d1350e23cb17598c3392d79252b | 47.7 MB |
md5:81346e4973709f47bb8a48e78198946f | 29.2 MB |
md5:bf38ee96367adef1c2ef67ca4c615174 | 29.7 MB |
md5:060f2101c7d29f8e2a860eb03973f3d6 | 143.4 MB |
md5:991b5fac9ff1640a9a5ee9dbfe095b26 | 136.9 MB |
md5:7ee156094415f84d54bbcbbf5e4933d3 | 125.5 MB |
md5:6ff1aa03261ec8a54b67f38f8712637f | 125.8 MB |
md5:fb1bfdcfcc6836661b59046f5b5d8692 | 117.0 MB |
md5:d0d167704d20e01af74b9ab322c42894 | 112.3 MB |
md5:10559023d677f83dfc4fc50eade2ef24 | 92.8 MB |
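Each file in the listing above carries an MD5 checksum, which can be used to confirm that a download is intact before decompressing it. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_listed_md5` are hypothetical, and the checksum to compare against must be copied from the listing entry for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("1kg_chr1.trees.tsz", listed)` returns `True` only when the downloaded file matches the checksum string `listed` taken from the corresponding row.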
Additional details
Funding
- The Genetic Analysis of Populations. 100956
- Wellcome Trust
References
- Kelleher et al. (2019), Inferring the ancestry of everyone. https://doi.org/10.1101/458067