Dataset Open Access

Chloroplast genome sequencing reads from snow gum

Syme, Anna

Tutorial data for chloroplast genome assembly: FASTQ reads from Illumina and Nanopore sequencing of the snow gum, Eucalyptus pauciflora.

Data from: Wang, W., Schalamun, M., Morales-Suarez, A. et al. Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case. BMC Genomics 19, 977 (2018) doi:10.1186/s12864-018-5348-8

Data are hosted at NCBI under accession numbers SRR7153063 (Illumina) and SRR7153095 (Nanopore). An additional Illumina file, SRR7153071, is not used here.

This is how the files have been changed from the original datasets: 

Using the Galaxy platform ( 

  • Each dataset was separately mapped to the NCBI Reference Sequence for Eucalyptus pauciflora chloroplast NC_039597.1, using BWA-MEM. 

  • Unmapped reads were filtered out using a SAMtools flag. 

  • BAM files were converted to FASTQ files.

  • Each FASTQ file was then reduced in size:

      • snow-gum-illumina-cp-reduced: contains only the first 62,500 reads. The original read pairing has not been preserved, so treat these as unpaired reads for this tutorial.

      • snow-gum-nanopore-cp-reduced: contains only reads longer than 90,000 bp.
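
The filtering and size-reduction steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Galaxy workflow that produced the files: the record does not say which SAMtools flag was used, so the sketch assumes the standard SAM "unmapped" bit (0x4), and the function names (is_mapped, read_fastq, first_n_reads, reads_longer_than) are hypothetical helpers introduced here. It also assumes simple four-line FASTQ records.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, NamedTuple, TextIO

# Assumption: "a SAMtools flag" refers to the SAM FLAG bit 0x4 (read unmapped).
SAM_FLAG_UNMAPPED = 0x4


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def is_mapped(flag: int) -> bool:
    """Return True when the SAM FLAG value does not have the unmapped bit set."""
    return not (flag & SAM_FLAG_UNMAPPED)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield FastqRecord(header[1:], sequence, quality)


def first_n_reads(records: Iterable[FastqRecord], n: int) -> Iterator[FastqRecord]:
    """Keep only the first n reads (62,500 for the Illumina file)."""
    return islice(records, n)


def reads_longer_than(
    records: Iterable[FastqRecord], min_length: int
) -> Iterator[FastqRecord]:
    """Keep only reads strictly longer than min_length (90,000 bp for Nanopore)."""
    return (r for r in records if len(r.sequence) > min_length)


if __name__ == "__main__":
    # Tiny made-up reads, used only to exercise the helpers.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nAC\n+\nII\n")
    records = list(read_fastq(demo))
    print([r.name for r in first_n_reads(records, 2)])
    print([r.name for r in reads_longer_than(records, 4)])
```

For the real files, the same helpers would be called with n=62500 on the Illumina reads and min_length=90000 on the Nanopore reads; both filters are lazy, so a large FASTQ file can be streamed rather than loaded into memory.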

Files (64.1 MB)
Name Size
22.1 MB
42.0 MB
                  All versions   This version
Views             112            112
Downloads         78             78
Data volume       2.4 GB         2.4 GB
Unique views      94             94
Unique downloads  39             39

