Published January 28, 2025 | Version v3.0.0
Dataset Open

The CRyPTIC Consortium Dataset

  • 1. University of Oxford

Description

This dataset is the first in a new series of releases of the data collected and collated by the CRyPTIC Consortium.

All the raw genetic (FASTQ) files have been processed using a Mycobacterial pipeline implemented on an online cloud platform. Whilst the bioinformatics components are similar (e.g. Clockwork remains the variant caller), there are some differences. This version includes all samples for which we expect to have whole-genome sequencing (WGS) and phenotypic drug susceptibility testing (pDST) data.

It is incomplete because:

  • About 1000 samples failed the upload process
  • There are other samples that are missing

These issues are fixed in later releases. We therefore do not recommend using this version; it is recorded here for completeness.

Due to the size of some of the data tables, the larger ones are stored as PyArrow parquet files. These can be loaded with, for example, pandas, although one ordinarily needs to install pyarrow first (e.g. using pip).

Files

DATA_SCHEMA.pdf

Files (1.7 GB)

Checksum                               Size
md5:693cbffe95d305499779e09d7bb903e6   4.6 kB
md5:fb7099a88d4be5725158fbcb67e28f00   90.4 kB
md5:923d3a193df21698bd6a00f857ab337e   385 Bytes
md5:94e459036f71c4d31d2fb4ba99ce6595   1.0 MB
md5:247bd211a94dd83c777db3d8cdf2296c   795.4 kB
md5:dd2f07369a3aca03d2e6c67d8a6c0d17   4.6 MB
md5:a92de0aebd41a799747f4b4b9bdab96e   2.1 MB
md5:b90ecffbdaafccf0cc6385909e6f7b21   733.9 MB
md5:cb403c7517ec847467b7980cbc3e5389   5.9 kB
md5:077c54bf823e31ea0b50198322150799   1.9 MB
md5:42470635e12c39dc345dfb542dc9caa7   13.4 kB
md5:c24c882c8988b5af9940232ada27fb60   1.3 kB
md5:c3f2fb5761a5fccf2d936c0dbb432577   994.1 MB
md5:0ad7376c95bb4830e8df9fcf36039358   7.8 MB
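
The md5 values listed above can be used to check that a download completed intact. The following is a minimal sketch using only the Python standard library; the function names and the file name in the usage comment are hypothetical, and which checksum belongs to which file has to be read from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:693c...'."""
    # Accept the value with or without the leading 'md5:' prefix
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical file name; pair it with the checksum listed for that file):
# ok = matches_listed_checksum("EXAMPLE_TABLE.parquet", "md5:<checksum from the list>")
```

Reading in chunks matters for the largest files here, which approach 1 GB.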

Additional details

Funding

Wellcome Trust
The CRyPTIC Consortium 200205/Z/15/Z
Bill & Melinda Gates Foundation
The CRyPTIC Consortium OPP1133541