Simulated metagenomic DNA sequencing reads for complete NCBI RefSeq virus sequences
Description
This dataset comprises simulated DNA sequencing reads in FASTQ format for 17,900 complete NCBI RefSeq virus sequences downloaded on 2025-02-05. It includes simulated long Oxford Nanopore Technologies R10.4 reads with a measured error rate of ~4% and simulated short paired-end (2x150 bp) Illumina reads with a measured error rate of 1%. The source reference sequences are provided as rsviruses17900.fa.gz.
Simulated long reads (Oxford Nanopore Technologies)
- rsviruses17900.fastq.gz
- Measured empirical error rate: ~4%
- Simulator: PBSIM 3.0.4 (https://academic.oup.com/nargab/article/4/4/lqac092/6855700)
- Model: ERRHMM-ONT-HQ
- Depth: 10x
- Mean read length: 1,000bp
- Max read length: 10,000bp
- Mean accuracy: 0.98
- Random seed: 1
- Command used:

      for fasta in rsviruses17900/*.fa; do
        acc=$(basename "$fasta" .fa)
        pbsim --seed 1 --strategy wgs --method errhmm \
          --errhmm pbsim3/data/ERRHMM-ONT-HQ.model \
          --depth 10 --genome ${fasta} --prefix ${acc} --id-prefix ${acc}__ \
          --length-mean 1000 --length-max 10000 --accuracy-mean 0.98
        cat ${acc}*.fastq | pigz > ${acc}.fastq.gz
      done
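Because the PBSIM command sets `--id-prefix ${acc}__`, each simulated long read should carry its source accession at the start of its read ID. The sketch below shows one way to pull the reads for a single accession back out of a FASTQ stream. It assumes that rsviruses17900.fastq.gz is the concatenation of the per-accession files and that records are plain 4-line FASTQ. The function name `reads_for_accession` and the accession in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Print all FASTQ records whose read ID starts with "<accession>__".
# Assumption: 4-line FASTQ records (sequence and quality are not wrapped).
# reads_for_accession is an illustrative helper, not part of PBSIM.
reads_for_accession() {
  local acc="$1"
  awk -v prefix="@${acc}__" '
    NR % 4 == 1 { keep = (index($0, prefix) == 1) }  # header line decides for the record
    keep                                             # print every line of a kept record
  '
}

# Example usage (NC_000000.0 is a placeholder accession):
#   pigz -dc rsviruses17900.fastq.gz | reads_for_accession NC_000000.0 > NC_000000.0.fastq
```

Only the header line of each record is tested, so quality strings that happen to begin with `@` are not mistaken for read IDs.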
Simulated short reads (Illumina)
- rsviruses17900.r1.fastq.gz and rsviruses17900.r2.fastq.gz
- Measured empirical error rate: 1%
- Simulator: dwgsim 0.1.14; conda package version 1.1.14 (https://github.com/nh13/DWGSIM)
- Read length: 2x150bp (paired)
- Depth: 10x
- Random read probability (-y): 0
- Error rate (-e and -E): 0.01
- Mutation rate (-r): 0.0
  - Of which low frequency somatic mutations (-F): 0.0
- Random seed (-z): 1
- Command used:

      for fasta in rsviruses17900/*.fa; do
        acc=$(basename "$fasta" .fa)
        dwgsim -C 10 -1 150 -2 150 -y 0.0 -o 1 -z 1 -F 0.0 -r 0.0 -e 0.01 -E 0.01 "$fasta" "$acc"
      done
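Both read sets were simulated at a nominal 10x depth, and the short reads at 150 bp per end. A rough sanity check after download is to count reads and bases in a FASTQ stream: total bases divided by total reference length should come out near the stated depth, and the mean read length of each Illumina file should be 150. The sketch below assumes 4-line FASTQ records. The function name `fastq_stats` is hypothetical.

```shell
#!/usr/bin/env bash
# Summarise a FASTQ stream: number of reads, total bases, mean read length.
# Assumption: 4-line FASTQ records (sequence on line 2 of each record).
# fastq_stats is an illustrative helper, not part of dwgsim or PBSIM.
fastq_stats() {
  awk '
    NR % 4 == 2 { reads++; bases += length($0) }   # sequence lines only
    END {
      mean = (reads > 0) ? bases / reads : 0
      # %.0f avoids integer overflow in some awk builds on large totals
      printf "reads=%.0f bases=%.0f mean_length=%.1f\n", reads, bases, mean
    }
  '
}

# Example usage:
#   pigz -dc rsviruses17900.r1.fastq.gz | fastq_stats
```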
Files
Total size: 6.4 GB

| File checksum | Size |
|---|---|
| md5:78332c82a49fb35cbe10b1bd133b61c6 | 166.4 MB |
| md5:4e7528f0101f7f41a4a2262b10e16944 | 1.6 GB |
| md5:f344c229e1d4a4c32f5fe90e05c2119b | 2.3 GB |
| md5:8f17dfb62c0c4dea1b5510f143985616 | 2.3 GB |
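The md5 digests listed for the files can be used to confirm that a download completed intact. The following is a minimal sketch, assuming GNU `md5sum` is available. The function name `check_md5` and the path in the usage comment are hypothetical; match each downloaded file to its digest on the record page.

```shell
#!/usr/bin/env bash
# Compare a file's MD5 digest with an expected value.
# Returns 0 on match, 1 on mismatch. check_md5 is an illustrative helper.
check_md5() {
  local file="$1"
  local expected="${2#md5:}"   # accept digests with or without the "md5:" prefix
  local actual
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual, expected $expected)" >&2
    return 1
  fi
}

# Example usage (path/to/downloaded_file is a placeholder):
#   check_md5 path/to/downloaded_file md5:78332c82a49fb35cbe10b1bd133b61c6
```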