Published May 14, 2025 | Version 1
Dataset Open

Simulated metagenomic DNA sequencing reads for complete NCBI RefSeq virus sequences

  • 1. University of Oxford

Description

This dataset comprises simulated DNA sequencing reads in FASTQ format for 17,900 complete NCBI RefSeq virus sequences downloaded on 2025-02-05. It includes simulated long reads (Oxford Nanopore Technologies R10.4, ~4% error rate) and simulated short paired-end reads (Illumina, 2x150 bp, 1% error rate). The source reference sequences are provided as rsviruses17900.fa.gz.
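
As a quick sanity check after download, the record counts of the reference and read files can be compared against expectations. The sketch below is not part of the original workflow: the function names are illustrative, and it assumes standard four-line FASTQ records and one FASTA record per reference sequence (in which case the reference file would hold 17,900 records).

```shell
# Count FASTA records (header lines starting with '>') in a gzip-compressed file.
count_fasta_records() {
    gzip -cd "$1" | grep -c '^>'
}

# Count FASTQ records, assuming exactly four lines per read.
count_fastq_records() {
    echo $(( $(gzip -cd "$1" | wc -l) / 4 ))
}

# Example usage (file names taken from the description above):
#   count_fasta_records rsviruses17900.fa.gz
#   count_fastq_records rsviruses17900.fastq.gz
```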

Simulated long reads (Oxford Nanopore Technologies)

  • rsviruses17900.fastq.gz

  • Measured empirical error rate: ~4%

  • Simulator: PBSIM 3.0.4 (https://academic.oup.com/nargab/article/4/4/lqac092/6855700)

    • Model: ERRHMM-ONT-HQ

    • Depth: 10x

    • Mean read length: 1,000bp

    • Max read length: 10,000bp

    • Mean accuracy: 0.98

    • Random seed: 1

    • Command used:

      for fasta in rsviruses17900/*.fa; do
        acc=$(basename "$fasta" .fa)
        pbsim --seed 1 --strategy wgs --method errhmm --errhmm pbsim3/data/ERRHMM-ONT-HQ.model \
          --depth 10 --genome "${fasta}" --prefix "${acc}" --id-prefix "${acc}__" \
          --length-mean 1000 --length-max 10000 --accuracy-mean 0.98
        cat "${acc}"*.fastq | pigz > "${acc}.fastq.gz"
      done
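
The depth setting above implies a rough read count per genome: depth is total sequenced bases divided by genome length, so reads ≈ depth × genome length / mean read length. The following is a minimal sketch of that arithmetic, not part of the original workflow. The function names are illustrative, and the counts PBSIM actually produces will vary with the sampled read lengths.

```shell
# Total sequence length (bases) of an uncompressed FASTA file, ignoring header lines.
fasta_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Approximate number of reads needed to reach a target depth.
# Arguments: genome length (bp), depth, mean read length (bp).
# Uses integer division, so the result is rounded down.
expected_reads() {
    echo $(( $1 * $2 / $3 ))
}

# Example usage with the settings listed above (depth 10, mean length 1000):
#   expected_reads "$(fasta_length genome.fa)" 10 1000
```

For example, a 30,000 bp genome at 10x depth with a 1,000 bp mean read length gives roughly 300 reads.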

Simulated short reads (Illumina)

  • rsviruses17900.r1.fastq.gz and rsviruses17900.r2.fastq.gz

  • Measured empirical error rate: 1%

  • Simulator: dwgsim 0.1.14, conda package version 1.1.14 (https://github.com/nh13/DWGSIM)

    • Read length: 2x150bp (paired)

    • Depth: 10x

    • Random read probability (-y): 0

    • Error rate (-e and -E): 0.01

    • Mutation rate (-r): 0.0

      • Of which low-frequency somatic mutations (-F): 0.0

    • Random seed (-z): 1

    • Command used:

      for fasta in rsviruses17900/*.fa; do
        acc=$(basename "$fasta" .fa)
        dwgsim -C 10 -1 150 -2 150 -y 0.0 -o 1 -z 1 -F 0.0 -r 0.0 -e 0.01 -E 0.01 "$fasta" "$acc"
      done
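
Because the short reads are paired, downstream tools generally expect the two files to list mates in the same order. The sketch below checks this. It is not part of the original workflow and the function names are illustrative. It assumes four-line FASTQ records and read names that differ between the two files only by a trailing /1 or /2; if the names follow a different convention, adjust the pattern.

```shell
# Print read names (first line of each four-line record), dropping a trailing /1 or /2.
read_names() {
    gzip -cd "$1" | awk 'NR % 4 == 1 { sub(/\/[12]$/, ""); print }'
}

# Succeed only if both files list the same read names in the same order.
check_pairing() {
    names1=$(mktemp) || return 2
    names2=$(mktemp) || { rm -f "$names1"; return 2; }
    read_names "$1" > "$names1"
    read_names "$2" > "$names2"
    cmp -s "$names1" "$names2"
    status=$?
    rm -f "$names1" "$names2"
    return "$status"
}

# Example usage (file names taken from the list above):
#   check_pairing rsviruses17900.r1.fastq.gz rsviruses17900.r2.fastq.gz && echo "pairs consistent"
```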

Files (6.4 GB)

  • md5:78332c82a49fb35cbe10b1bd133b61c6 (166.4 MB)
  • md5:4e7528f0101f7f41a4a2262b10e16944 (1.6 GB)
  • md5:f344c229e1d4a4c32f5fe90e05c2119b (2.3 GB)
  • md5:8f17dfb62c0c4dea1b5510f143985616 (2.3 GB)
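
The MD5 checksums listed above can be used to confirm that a download completed intact. The sketch below is a minimal example. The function name is illustrative, and it assumes GNU coreutils `md5sum` is installed (on macOS, `md5 -q` is the usual equivalent). The checksum for each file should be taken from the record page, because the listing here does not pair checksums with file names.

```shell
# Compare a file's MD5 digest against an expected hex string.
# Arguments: path to file, expected MD5 (without the "md5:" prefix).
# Returns 0 on a match and non-zero otherwise.
verify_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Example usage (substitute a downloaded file and its checksum from the record page):
#   verify_md5 downloaded_file.gz <expected-md5> && echo "checksum OK"
```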