Tutorial data for chloroplast genome assembly: FASTQ reads from Illumina and Nanopore sequencing of the snow gum, Eucalyptus pauciflora.
Data from: Wang, W., Schalamun, M., Morales-Suarez, A. et al. Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case. BMC Genomics 19, 977 (2018) doi:10.1186/s12864-018-5348-8
The data are hosted at NCBI under the accession numbers SRR7153063 (Illumina) and SRR7153095 (Nanopore). An additional Illumina file, SRR7153071, is not used here.
This is how the files have been changed from the original datasets:
Using the Galaxy platform (usegalaxy.org):
Each dataset was separately mapped to the NCBI Reference Sequence for Eucalyptus pauciflora chloroplast NC_039597.1, using BWA-MEM.
Unmapped reads were filtered out using a SAMtools flag.
BAM files were converted to FASTQ files.
Each FASTQ file was then reduced in size:
snow-gum-illumina-cp-reduced: contains only the first 62,500 reads. Note that the original pairing of reads has not been preserved, so treat these as unpaired reads for this tutorial.
snow-gum-nanopore-cp-reduced: contains only reads longer than 90,000 bp.
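The filtering and size-reduction steps above were run with Galaxy tools, not with the code below. As a rough illustration of the logic only, here is a minimal Python sketch. It assumes that the SAMtools flag in question is the standard SAM "segment unmapped" bit (0x4), which the description does not state, and that the FASTQ files use plain four-line records. All function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator, quality string.
FastqRecord = Tuple[str, str, str, str]

# SAM FLAG bit meaning "segment unmapped". Assumption: this is the
# flag the original filtering step relied on.
UNMAPPED_FLAG = 0x4


def is_mapped(sam_line: str) -> bool:
    """Return True if a SAM alignment line lacks the unmapped bit."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second SAM column
    return not (flag & UNMAPPED_FLAG)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records; a trailing partial record is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(islice(stripped, 4))
        if len(record) < 4:
            return
        yield record


def first_n(records: Iterable[FastqRecord], n: int) -> Iterator[FastqRecord]:
    """Keep only the first n records (the Illumina-style reduction)."""
    return islice(records, n)


def longer_than(records: Iterable[FastqRecord], min_len: int) -> Iterator[FastqRecord]:
    """Keep only reads strictly longer than min_len bases (the Nanopore-style reduction)."""
    return (rec for rec in records if len(rec[1]) > min_len)
```

With the thresholds given above, the Illumina reduction would correspond to `first_n(records, 62500)` and the Nanopore reduction to `longer_than(records, 90000)`, where `records` comes from `read_fastq` applied to an open FASTQ file.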