Published May 20, 2019 | Version 1.0.0
Dataset | Open access

Inferring whole-genome histories in large population datasets: inferred tree sequences for 1000 Genomes

1. Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford

Description

Tree sequences inferred for the 1000 Genomes phase 3 autosomes using tsinfer version 0.1.4 and compressed using tszip. Each tree sequence file can be decompressed as follows:

$ tsunzip 1kg_chr1.trees.tsz
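tszip can also be used from Python, which skips the separate command-line step. The sketch below assumes tszip's tszip.decompress(path) function, which reads a .tsz file and returns a tskit tree sequence. The helper name load_tree_sequence is hypothetical; its imports are deferred so that only the library needed for the given file type has to be installed.

```python
def load_tree_sequence(path):
    """Load a tree sequence from a tszip (.tsz) or tskit (.trees) file.

    Hypothetical helper. Imports happen inside the function so that only
    the library needed for the given file type must be installed.
    """
    if path.endswith(".tsz"):
        import tszip  # assumed API: tszip.decompress(path) -> TreeSequence
        return tszip.decompress(path)
    import tskit  # uncompressed .trees files are read directly by tskit
    return tskit.load(path)
```

For example, ts = load_tree_sequence("1kg_chr1.trees.tsz") should give the same tree sequence as running tsunzip and then loading the resulting .trees file with tskit.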

Once decompressed, the .trees files can be loaded and processed using tskit:

import tskit
ts = tskit.load("1kg_chr1.trees")
# ts is an instance of tskit.TreeSequence 
print("Chromosome 1 contains {} trees".format(ts.num_trees))
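Other basic properties of a loaded tree sequence can be inspected in the same way. The hypothetical summarize function below only reads standard tskit.TreeSequence attributes (num_trees, num_samples, num_individuals, num_populations, num_sites and sequence_length), so it should work with any tree sequence loaded as above.

```python
def summarize(ts):
    """Collect basic size properties of a tskit.TreeSequence into a dict."""
    return {
        "trees": ts.num_trees,                # number of distinct local trees
        "samples": ts.num_samples,            # number of sample nodes
        "individuals": ts.num_individuals,    # entries in the individual table
        "populations": ts.num_populations,    # entries in the population table
        "sites": ts.num_sites,                # variable sites carried over from the input
        "sequence_length": ts.sequence_length,
    }
```

For example, print(summarize(ts)) after loading chromosome 1 prints these counts as a single dictionary.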

Metadata associated with individuals and populations was derived from the original source and converted to JSON form. For example, to access individual metadata we can use:

import tskit
import json
ts = tskit.load("1kg_chr1.trees")
ind = ts.individual(0)
metadata_dict = json.loads(ind.metadata)

The metadata_dict variable will now contain all the metadata for the individual with ID 0 as a dictionary. Population metadata can be accessed in a similar way. Individuals are associated with populations via their constituent nodes, each of which records a population ID. For example:

pop_metadata = [json.loads(pop.metadata) for pop in ts.populations()]
ind_node = ts.node(ind.nodes[0])
ind_pop_metadata = pop_metadata[ind_node.population]

After this, the ind_pop_metadata variable will contain the population-level metadata for the individual with ID 0.
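The two steps above can be combined to collect individual and population metadata for every individual in one pass. This is a sketch under two assumptions: that an individual's population is taken from its first node, as in the example above, and that metadata is stored as raw JSON bytes, as in these files. Newer tskit versions may return already-decoded metadata when a metadata schema is present, so values that are not bytes or strings are passed through unchanged. The function names are hypothetical.

```python
import json


def decode_metadata(raw):
    """Return metadata as a dict, decoding raw JSON bytes or text if needed."""
    if isinstance(raw, (bytes, bytearray, str)):
        # Empty metadata is treated as an empty dict rather than a JSON error.
        return json.loads(raw) if len(raw) > 0 else {}
    return raw  # assumed already decoded (e.g. via a metadata schema)


def individual_population_metadata(ts):
    """Map each individual ID to (individual metadata, population metadata).

    The population is looked up from the individual's first node.
    Individuals with no nodes, or whose first node has no population
    (ID -1, i.e. tskit.NULL), get None as their population metadata.
    """
    pop_metadata = [decode_metadata(pop.metadata) for pop in ts.populations()]
    result = {}
    for ind in ts.individuals():
        pop_md = None
        if len(ind.nodes) > 0:
            pop_id = ts.node(ind.nodes[0]).population
            if pop_id >= 0:
                pop_md = pop_metadata[pop_id]
        result[ind.id] = (decode_metadata(ind.metadata), pop_md)
    return result
```

For example, metadata_dict, ind_pop_metadata = individual_population_metadata(ts)[0] should reproduce the values obtained above for individual 0.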

The full data pipeline used to generate these tree sequences and associated metadata is available on GitHub.

Notes

AWW and CF would like to thank the Rhodes Trust for their generous support.

Files

Files (2.0 GB in total, 22 files)

MD5 checksum                       Size
3981337e7af4f0f7f125e80230385311   165.2 MB
d00555e85ceaef56926d0d8b2d3d2c01   101.0 MB
16eaff6439c83b3bdb8eecce2f7bb0b7   95.5 MB
5f8b675c9de83b44f1ef4287099308ad   96.0 MB
d03d34a70f8aa6b9577da4c7e35f1b92   70.3 MB
4ec5a7423a8c1ee4e1119978d70a8aa9   66.1 MB
c1e4865a4c383fb1d1e2b9bb91374351   64.8 MB
b7c19e6bae4d989a6c61b1a41d4a3bfb   71.6 MB
563335e0bdfbcecf71daa7d4af057c4d   62.1 MB
acb8803d4a0ef070e6bb439c9a047c38   59.8 MB
6272ba610e4dc844b7ae8369b70b2186   43.1 MB
179931fe8e41b36f352b945ef0e57a34   175.5 MB
3f773d1350e23cb17598c3392d79252b   47.7 MB
81346e4973709f47bb8a48e78198946f   29.2 MB
bf38ee96367adef1c2ef67ca4c615174   29.7 MB
060f2101c7d29f8e2a860eb03973f3d6   143.4 MB
991b5fac9ff1640a9a5ee9dbfe095b26   136.9 MB
7ee156094415f84d54bbcbbf5e4933d3   125.5 MB
6ff1aa03261ec8a54b67f38f8712637f   125.8 MB
fb1bfdcfcc6836661b59046f5b5d8692   117.0 MB
d0d167704d20e01af74b9ab322c42894   112.3 MB
10559023d677f83dfc4fc50eade2ef24   92.8 MB
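Downloads can be checked against the MD5 checksums listed above. The sketch below uses Python's standard hashlib module and reads each file in chunks so that the larger files need not be held in memory at once; the function names md5sum and verify are hypothetical. Compare each file against the checksum listed for that file.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected_md5.strip().lower()
```

For example, verify("1kg_chr1.trees.tsz", expected) returns True when the downloaded file matches the checksum passed as expected.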

Additional details

Funding

The Genetic Analysis of Populations (100956), Wellcome Trust

References