Dataset Open Access
Baaijens, Jasmijn A.;
Zulli, Alessandro;
Ott, Isabel M.;
Petrone, Mary E.;
Alpert, Tara;
Fauver, Joseph R.;
Kalinich, Chaney C.;
Vogels, Chantal B.F.;
Breban, Mallery I.;
Duvallet, Claire;
McElroy, Kyle;
Ghaeli, Newsha;
Imakaev, Maxim;
Mckenzie-Bennett, Malaika;
Robison, Keith;
Plocik, Alex;
Schilling, Rebecca;
Pierson, Martha;
Littlefield, Rebecca;
Spencer, Michelle;
Simen, Birgitte B.;
Yale SARS-CoV-2 Genomic Surveillance Initiative;
Hanage, William P.;
Grubaugh, Nathan D.;
Peccia, Jordan;
Baym, Michael
To evaluate the accuracy of variant abundance predictions from wastewater sequencing, we built a collection of benchmarking datasets that resemble real wastewater samples. For each variant (B.1.1.7, B.1.351, B.1.427, B.1.429, P.1), we created a series of 33 benchmarks by simulating sequencing reads from a variant genome and from a collection of background sequences (lineages that are not variants of concern or interest), such that the variant abundance ranges from 0.05% to 100%. Analogously, we created a second series of benchmarks by simulating reads only from the Spike gene of each SARS-CoV-2 genome. We refer to the first set of benchmarks as "whole genome" (WG) and to the second set as "S-only". We repeated these simulations at different sequencing depths: 100x and 1000x coverage for the WG benchmarks, and 100x, 1000x, and 10,000x coverage for the S-only benchmarks.
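As a rough illustration of how a target variant abundance and sequencing depth translate into read counts, the following minimal sketch splits a read budget between the variant and the background. It is not the procedure used to build these datasets: the function name `split_read_counts`, the read length of 150, and the genome length of 29,903 (the length of the SARS-CoV-2 reference genome) are assumptions made for illustration only, and the description above does not state which read simulator or read length was used.

```python
def split_read_counts(coverage, genome_length, read_length, variant_fraction):
    """Return (variant_reads, background_reads) for one simulated benchmark.

    Hypothetical helper: assumes single-end reads of fixed length and that
    abundance is expressed as a fraction of the total number of reads.
    """
    if not 0.0 <= variant_fraction <= 1.0:
        raise ValueError("variant_fraction must be between 0 and 1")
    if coverage <= 0 or genome_length <= 0 or read_length <= 0:
        raise ValueError("coverage, genome_length and read_length must be positive")

    # Total reads needed so that, on average, each base is covered `coverage` times.
    total_reads = round(coverage * genome_length / read_length)
    # Share of the read budget assigned to the variant genome.
    variant_reads = round(total_reads * variant_fraction)
    return variant_reads, total_reads - variant_reads


if __name__ == "__main__":
    # Assumed values: 100x coverage, 29,903 nt genome, 150 nt reads,
    # variant abundance of 0.05% (the lowest abundance mentioned above).
    variant, background = split_read_counts(100, 29903, 150, 0.0005)
    print(f"variant reads: {variant}, background reads: {background}")
```

At low coverage, the lowest abundances correspond to only a handful of variant reads, which is one reason to repeat the benchmarks at several sequencing depths.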
Name | MD5 checksum | Size
---|---|---
S-only-10000x.tar.gz | 8ae9a37da2a6d7b6b07e04359b1a19ad | 4.0 GB
S-only-1000x.tar.gz | d307294a92d25102a5cc70341ac89c16 | 391.2 MB
S-only-100x.tar.gz | 58bdd104f63668a56831ddc6f76632a1 | 38.8 MB
WG-1000x.tar.gz | 72e6236d04fba162c8b6f246efc9a52b | 3.1 GB
WG-100x.tar.gz | 56b7bb7e7a16e155cc17b1fb320a64c5 | 307.5 MB
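Because several of the archives are multiple gigabytes, it can be worth checking a download against the MD5 checksums listed above before unpacking it. The sketch below is one way to do that with the Python standard library; the helper names `md5_of_file` and `verify_downloads` and the assumption that the archives sit in a single local directory are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "S-only-10000x.tar.gz": "8ae9a37da2a6d7b6b07e04359b1a19ad",
    "S-only-1000x.tar.gz": "d307294a92d25102a5cc70341ac89c16",
    "S-only-100x.tar.gz": "58bdd104f63668a56831ddc6f76632a1",
    "WG-1000x.tar.gz": "72e6236d04fba162c8b6f246efc9a52b",
    "WG-100x.tar.gz": "56b7bb7e7a16e155cc17b1fb320a64c5",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each archive name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.exists() else None
    return results


if __name__ == "__main__":
    # Assumes the archives were downloaded into the current directory.
    for name, ok in verify_downloads(".").items():
        status = {True: "OK", False: "MISMATCH", None: "not found"}[ok]
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use constant regardless of archive size, which matters for the 4.0 GB S-only archive.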