Published January 8, 2020 | Version v1
Dataset Open

Chloroplast genome sequencing reads from snow gum

Creators

Description

Tutorial data for chloroplast genome assembly: FASTQ reads from Illumina and Nanopore sequencing of the snow gum, Eucalyptus pauciflora.

Data from: Wang, W., Schalamun, M., Morales-Suarez, A. et al. Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case. BMC Genomics 19, 977 (2018) doi:10.1186/s12864-018-5348-8

The original data are hosted at NCBI under accession numbers SRR7153063 (Illumina) and SRR7153095 (Nanopore). An additional Illumina dataset, SRR7153071, is not used here.

The files were derived from the original datasets as follows, using the Galaxy platform (usegalaxy.org):

  • Each dataset was separately mapped to the NCBI Reference Sequence for Eucalyptus pauciflora chloroplast NC_039597.1, using BWA-MEM. 

  • Unmapped reads were filtered out using a SAMtools flag. 

  • BAM files were converted to FASTQ files.

  • Each FASTQ file was then reduced in size:

      • snow-gum-illumina-cp-reduced: contains only the first 62,500 reads. The original read pairing has not been preserved, so treat these as unpaired reads for this tutorial.

      • snow-gum-nanopore-cp-reduced: contains only reads longer than 90,000 bp.
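The reduction steps above were carried out with Galaxy tools, and the exact tool settings are not given here. As a rough illustration only, the following Python sketch shows the same three ideas: testing the SAM "unmapped" flag bit (0x4), keeping the first N reads of a FASTQ file, and keeping reads longer than a length threshold. All function names are hypothetical, and the reader assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import io
from itertools import islice

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def is_mapped(flag):
    """Return True when the SAM FLAG does not have the unmapped bit (0x4) set."""
    return not flag & UNMAPPED


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (assumes no blank lines between records)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def first_n_reads(records, n):
    """Keep only the first n records (e.g. n=62500 for the Illumina file)."""
    return islice(records, n)


def longer_than(records, min_len):
    """Keep only records whose sequence is longer than min_len bases."""
    return (rec for rec in records if len(rec[1]) > min_len)


if __name__ == "__main__":
    # Tiny in-memory FASTQ used purely for demonstration.
    demo = "@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nAC\n+\nII\n"
    for header, seq, _ in longer_than(read_fastq(io.StringIO(demo)), 3):
        print(header, len(seq))
```

For real files, the same generators can be chained over an opened file handle, for example `first_n_reads(read_fastq(fh), 62500)`. Because only the first reads are taken without regard to mates, pairing is lost, which is why the Illumina reads are treated as unpaired.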

Files

Files (64.1 MB)

Checksum                               Size
md5:07f1e4d07d6f2dbd31a1507ac8222beb   22.1 MB
md5:b422993cc7545ae252637d1106fd9d4e   42.0 MB
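After downloading, the MD5 checksums listed above can be used to confirm that a file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The function name `md5_of_file` and the path in the usage note are placeholders, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A downloaded file would be checked by comparing `md5_of_file("path/to/downloaded-file")` with the hex string after `md5:` in the listing. A mismatch suggests an incomplete or corrupted download.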