The CRyPTIC Consortium Dataset
Description
This dataset is the first of a new release of the data collected and collated by the CRyPTIC Consortium.
All the raw genetic (FASTQ) files have been processed using a mycobacterial pipeline as implemented on an online cloud platform. Whilst the bioinformatics components are similar to those used previously (e.g. Clockwork remains the variant caller), there are some differences. This version includes all samples for which we expect to have WGS and pDST data.
It is incomplete because:
- About 1000 samples failed the upload process
- Other samples are missing
These issues are fixed in later releases. We therefore do not recommend using this version; it is recorded here for completeness.
Due to the size of some of the data tables, the larger ones are stored as PyArrow parquet files. These can be loaded using, e.g., pandas, but one ordinarily needs to first install pyarrow using pip.
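As a minimal sketch of the loading step described above, the snippet below reads one parquet table into a pandas DataFrame. It assumes pandas and pyarrow have already been installed (for example with `pip install pandas pyarrow`). The helper name `load_table` and the file name in the usage comment are hypothetical; the actual table and column names are documented in DATA_SCHEMA.pdf.

```python
from pathlib import Path


def load_table(path, columns=None):
    """Read a parquet file into a pandas DataFrame.

    `columns` optionally restricts the read to a subset of columns,
    which can reduce memory use for the larger tables.
    """
    # Imported here so the missing-dependency error appears at call time.
    import pandas as pd

    return pd.read_parquet(Path(path), engine="pyarrow", columns=columns)


# Hypothetical usage; replace the file name with one from this record:
# df = load_table("SOME_TABLE.parquet")
# print(df.shape)
```

Passing a list to `columns` is worth considering for the largest files, since parquet is a columnar format and unneeded columns are then not read.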
Files (1.7 GB)
DATA_SCHEMA.pdf
Checksum | Size |
---|---|
md5:693cbffe95d305499779e09d7bb903e6 | 4.6 kB |
md5:fb7099a88d4be5725158fbcb67e28f00 | 90.4 kB |
md5:923d3a193df21698bd6a00f857ab337e | 385 Bytes |
md5:94e459036f71c4d31d2fb4ba99ce6595 | 1.0 MB |
md5:247bd211a94dd83c777db3d8cdf2296c | 795.4 kB |
md5:dd2f07369a3aca03d2e6c67d8a6c0d17 | 4.6 MB |
md5:a92de0aebd41a799747f4b4b9bdab96e | 2.1 MB |
md5:b90ecffbdaafccf0cc6385909e6f7b21 | 733.9 MB |
md5:cb403c7517ec847467b7980cbc3e5389 | 5.9 kB |
md5:077c54bf823e31ea0b50198322150799 | 1.9 MB |
md5:42470635e12c39dc345dfb542dc9caa7 | 13.4 kB |
md5:c24c882c8988b5af9940232ada27fb60 | 1.3 kB |
md5:c3f2fb5761a5fccf2d936c0dbb432577 | 994.1 MB |
md5:0ad7376c95bb4830e8df9fcf36039358 | 7.8 MB |
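The md5 values listed for each file can be used to check that a download completed intact. The following is a sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listing` are illustrative and not part of the dataset, and the file path in the usage comment is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a listed value such as 'md5:0123...'."""
    # Drop the optional 'md5:' prefix and normalise case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Hypothetical usage; substitute a downloaded file and its listed checksum:
# ok = matches_listing("path/to/downloaded_file", "md5:<value from the listing>")
```

Reading in chunks matters here because two of the files are several hundred megabytes in size.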
Additional details
Funding
- Wellcome Trust: The CRyPTIC Consortium (award 200205/Z/15/Z)
- Bill & Melinda Gates Foundation: The CRyPTIC Consortium (award OPP1133541)