Published May 15, 2025 | Version 1
Dataset Open

Simulated metagenomic DNA sequencing reads for complete FDA-ARGOS bacterial genomes

  • 1. University of Oxford

Description

This dataset comprises simulated DNA sequencing reads in FASTQ format for all 988 complete FDA-ARGOS bacterial reference genomes (https://www.nature.com/articles/s41467-019-11306-6), including plasmids, downloaded on 2025-02-25. It includes simulated long Oxford Nanopore Technologies R10.4 reads with a ~4% error rate and simulated short (2x150bp) paired Illumina reads with a 1% error rate. The source reference sequences are provided as argos988.fa.zst. All files are compressed with Zstandard to fit within Zenodo's 50GB limit.
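Because the files are Zstandard-compressed, reads can be streamed rather than unpacked to disk first. The following is a minimal sketch, assuming the `zstd` command-line tool is installed and that the FASTQ files use the conventional four lines per record. `count_fastq_reads` is a hypothetical helper name introduced here for illustration and is not part of the dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTQ records arriving on stdin.
# Assumes exactly four lines per record (no wrapped sequence lines).
count_fastq_reads() {
  awk 'END { print NR / 4 }'
}

# Small inline demonstration with two made-up records.
printf '@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n' | count_fastq_reads

# Against the real archive (not run here):
#   zstd -dc argos988.fastq.zst | count_fastq_reads
```

Streaming through a pipe avoids writing the decompressed FASTQ to disk, which is considerably larger than the archive.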

Simulated long reads (Oxford Nanopore Technologies)

  • argos988.fastq.zst

  • Measured empirical error rate: ~4%

  • Simulator: PBSIM 3.0.4 (https://academic.oup.com/nargab/article/4/4/lqac092/6855700)

    • Model: ERRHMM-ONT-HQ

    • Depth: 10x

    • Mean read length: 5,000bp

    • Max read length: 50,000bp

    • Mean accuracy: 0.98

    • Random seed: 1

    • Command used:

      for fasta in argos988/*.fa; do
        acc=$(basename "$fasta" .fa)
        pbsim --seed 1 --strategy wgs --method errhmm --errhmm pbsim3/data/ERRHMM-ONT-HQ.model --depth 10 --genome "${fasta}" --prefix "${acc}" --id-prefix "${acc}__" --length-mean 5000 --length-max 50000 --accuracy-mean 0.98
        cat "${acc}"*.fastq | pigz > "${acc}.fastq.gz"
      done
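As a rough sanity check on the depth and mean read length settings above, the number of long reads per genome can be estimated as depth × genome length / mean read length. This is a sketch only: the 5 Mbp genome size is a hypothetical example, `expected_reads` is an illustrative helper name, and the actual count produced by the simulator will differ somewhat because read lengths are sampled from a distribution and capped at the maximum.

```shell
#!/usr/bin/env bash
# Approximate read count for whole-genome simulation:
#   reads ~ depth * genome_length / mean_read_length
# Integer arithmetic; the result is rounded down.
expected_reads() {
  local genome_len=$1 depth=$2 mean_len=$3
  echo $(( genome_len * depth / mean_len ))
}

# Hypothetical 5 Mbp genome with depth 10 and mean read length 5,000 bp.
expected_reads 5000000 10 5000
```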

Simulated short reads (Illumina)

  • argos988.r1.fastq.zst and argos988.r2.fastq.zst

  • Measured empirical error rate: 1%

  • Simulator: dwgsim 0.1.14, conda package version 1.1.14 (https://github.com/nh13/DWGSIM)

    • Read length (-1 and -2): 2x150bp (paired)

    • Depth (-C): 10x

    • Random read probability (-y): 0

    • Error rate (-e and -E): 0.01

    • Mutation rate (-r): 0.0

      • Of which low frequency somatic mutations (-F): 0.0

    • Random seed (-z): 1

    • Command used:

      for fasta in argos988/*.fa; do
        acc=$(basename "$fasta" .fa)
        dwgsim -C 10 -1 150 -2 150 -y 0.0 -o 1 -z 1 -F 0.0 -r 0.0 -e 0.01 -E 0.01 "$fasta" "$acc"
      done
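To put the 0.01 per-base error rate in context for 150 bp reads, the expected number of errors per read and the fraction of reads expected to be error-free can be worked out directly. This is a sketch under the assumption that errors are independent and uniform along the read, which is a simplification; `read_error_stats` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Expected errors per read and probability that a read is error-free,
# assuming independent per-base errors at a uniform rate:
#   expected errors = rate * length
#   P(error-free)   = (1 - rate) ^ length
read_error_stats() {
  local rate=$1 read_len=$2
  LC_ALL=C awk -v e="$rate" -v n="$read_len" 'BEGIN {
    printf "expected_errors=%.2f error_free=%.3f\n", e * n, exp(n * log(1 - e))
  }'
}

# Settings listed above: error rate 0.01, read length 150 bp.
read_error_stats 0.01 150
```

Under these assumptions, roughly one read in five carries no errors at all, so most reads contain at least one mismatch relative to the reference.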

Files (49.3 GB)

  • md5:8c97e4921298e9518a1a7bc163894399 (1.2 GB)

  • md5:67758da31266800629ebd482913084d1 (12.3 GB)

  • md5:55928aa0110379a7e1e0e6e5338fd404 (17.9 GB)

  • md5:797fb104153567fc9a9c4cf6d531feb7 (17.9 GB)
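Given the size of these files, it is worth checking each download against its listed MD5 checksum before use. The following is a minimal sketch, assuming GNU coreutils `md5sum` is available; `verify_md5` is a hypothetical helper name, and the pairing of each checksum with its file should be taken from the record itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Assumes GNU coreutils `md5sum` is available.
verify_md5() {
  local file=$1
  local expected=${2#md5:}   # accept values with or without the "md5:" prefix
  local actual
  actual=$(md5sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Demonstration on an empty temporary file, whose MD5 digest is well known.
tmp=$(mktemp)
verify_md5 "$tmp" "md5:d41d8cd98f00b204e9800998ecf8427e"
rm -f "$tmp"

# For a downloaded archive (not run here), pass its path and its listed checksum:
#   verify_md5 <downloaded file> md5:<checksum listed for that file>
```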