Simulated metagenomic DNA sequencing reads for complete FDA-ARGOS bacterial genomes
Description
This dataset comprises simulated DNA sequencing reads in FASTQ format for all 988 complete FDA-ARGOS bacterial reference genomes (https://www.nature.com/articles/s41467-019-11306-6) including plasmids, downloaded on 2025-02-25. Included are simulated long Oxford Nanopore Technologies R10.4 reads with ~4% error rate and simulated short (2x150bp) Illumina reads with 1% error rate. The source reference sequences are provided as argos988.fa.zst. Files are compressed with Zstandard in order to fit inside Zenodo's 50GB limit.
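Because the archives are Zstandard-compressed, they can be streamed without writing the decompressed FASTQ to disk. The following is a minimal sketch, assuming the `zstd` command-line tool is installed and that the FASTQ files use strict four-line records; `count_fastq_reads` is a hypothetical helper name, not part of the dataset.

```shell
# Count reads in an uncompressed FASTQ stream on stdin.
# Assumes strict 4-line records (no wrapped sequence lines).
count_fastq_reads() {
  awk 'END { print NR / 4 }'
}

# Usage (requires the zstd CLI):
#   zstd -dc argos988.fastq.zst | count_fastq_reads   # stream, nothing written to disk
#   zstd -d argos988.fa.zst                           # writes argos988.fa, keeps the archive
```

Streaming with `zstd -dc` avoids needing free disk space for the decompressed reads.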
Simulated long reads (Oxford Nanopore Technologies)
- argos988.fastq.zst
- Measured empirical error rate: ~4%
- Simulator: PBSIM 3.0.4 (https://academic.oup.com/nargab/article/4/4/lqac092/6855700)
- Model: ERRHMM-ONT-HQ
- Depth: 10x
- Mean read length: 5,000bp
- Max read length: 50,000bp
- Mean accuracy: 0.98
- Random seed: 1
- Command used:
    for fasta in argos988/*.fa; do
      acc=$(basename "$fasta" .fa)
      pbsim --seed 1 --strategy wgs --method errhmm --errhmm pbsim3/data/ERRHMM-ONT-HQ.model --depth 10 --genome ${fasta} --prefix ${acc} --id-prefix ${acc}__ --length-mean 5000 --length-max 50000 --accuracy-mean 0.98
      cat ${acc}*.fastq | pigz > ${acc}.fastq.gz
    done
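Since the command passes `--id-prefix ${acc}__`, each long-read ID should begin with the source genome's file basename followed by `__`, which allows reads to be tallied per genome. The sketch below relies only on that prefix; the rest of the ID format, and the assumption that `__` does not occur inside a basename, are not confirmed by this record. `reads_per_genome` is a hypothetical helper name.

```shell
# Tally reads per source genome from a FASTQ stream on stdin.
# Assumes 4-line records and IDs of the form "@<acc>__<rest>".
reads_per_genome() {
  awk 'NR % 4 == 1 {
         id = substr($1, 2)      # drop the leading "@"
         sub(/__.*/, "", id)     # keep the text before the first "__"
         n[id]++
       }
       END { for (a in n) print a "\t" n[a] }' | sort
}

# Usage (requires the zstd CLI):
#   zstd -dc argos988.fastq.zst | reads_per_genome > reads_per_genome.tsv
```

The output is a tab-separated table of prefix and read count, sorted by prefix.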
Simulated short reads (Illumina)
- argos988.r1.fastq.zst and argos988.r2.fastq.zst
- Measured empirical error rate: 1%
- Simulator: dwgsim 0.1.14; conda package version 1.1.14 (https://github.com/nh13/DWGSIM)
- Read length: 2x150bp (paired)
- Depth: 10x
- Random read probability (-y): 0
- Error rate (-e and -E): 0.01
- Mutation rate (-r): 0.0
  - Of which low frequency somatic mutations (-F): 0.0
- Random seed (-z): 1
- Command used:
    for fasta in argos988/*.fa; do
      acc=$(basename "$fasta" .fa)
      dwgsim -C 10 -1 150 -2 150 -y 0.0 -o 1 -z 1 -F 0.0 -r 0.0 -e 0.01 -E 0.01 "$fasta" "$acc"
    done
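As a rough sanity check on the 10x depth setting, the number of read pairs expected for one genome can be estimated as depth x genome length / (2 x read length). This uses the usual definition of coverage and is an assumption about how dwgsim interprets `-C`; its own rounding and handling of ambiguous bases may give a slightly different count. `expected_pairs` is a hypothetical helper name.

```shell
# Estimate the number of read pairs needed for a target depth.
# Arguments: genome length in bp, depth (default 10), read length (default 150).
# Uses integer arithmetic, so the result is rounded down.
expected_pairs() {
  genome_len=$1
  depth=${2:-10}
  read_len=${3:-150}
  echo $(( genome_len * depth / (2 * read_len) ))
}

# Example: a 5,000,000 bp genome at 10x with 2x150bp reads
expected_pairs 5000000   # prints 166666
```

Comparing this estimate with the observed number of pairs for a genome is a quick way to spot truncated or incomplete files.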
Files
Total size: 49.3 GB
| Checksum | Size |
|---|---|
| md5:8c97e4921298e9518a1a7bc163894399 | 1.2 GB |
| md5:67758da31266800629ebd482913084d1 | 12.3 GB |
| md5:55928aa0110379a7e1e0e6e5338fd404 | 17.9 GB |
| md5:797fb104153567fc9a9c4cf6d531feb7 | 17.9 GB |
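Downloads of this size are worth checking against the MD5 checksums listed above. The following is a minimal sketch, assuming GNU coreutils `md5sum` is available (on macOS the equivalent tool is `md5`); `verify_md5` is a hypothetical helper name, and the file and checksum passed to it must be matched up by the user.

```shell
# Return success if the file's MD5 digest matches the expected value.
# The expected value may carry the "md5:" prefix used in the listing.
verify_md5() {
  file=$1
  expected=${2#md5:}                                # strip an optional "md5:" prefix
  actual=$(md5sum "$file" | awk '{ print $1 }')     # first field is the hex digest
  [ "$actual" = "$expected" ]
}

# Usage (archive path and checksum are placeholders):
#   verify_md5 "$archive" "$checksum" && echo "OK" || echo "MISMATCH"
```

A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again.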