Bacterial training dataset for Galaxy training network tutorials on Genome assembly

Gladman, Simon; Seemann, Torsten; Bulach, Dieter

doi:10.5281/zenodo.582600

Published May 23, 2017 | Version v1

Dataset Open

1. Melbourne Bioinformatics, University of Melbourne

This training dataset comes from an imaginary Staphylococcus aureus bacterium with a miniature genome. It contains a reference genome in several formats, together with FASTQ reads from a closely related, also imaginary, mutant strain.

It is a useful dataset for demonstrating:

- de novo genome assembly
- read mapping and variant calling
- genome annotation

The files included are:

- wildtype.fna: the reference genome sequence of the wildtype strain in FASTA format (a header line, then the nucleotide sequence of the genome).
- wildtype.gff: the reference genome of the wildtype strain in General Feature Format (a list of features, one feature per line, then the nucleotide sequence of the genome).
- wildtype.gbk: the reference genome of the wildtype strain in GenBank format.
- mutant_R1.fastq and mutant_R2.fastq: FASTQ sequence reads of a closely related mutant strain.
  - The reads are paired-end.
  - Each read is 150 bases long.
  - The total number of bases sequenced is equivalent to 19 times the length of the wildtype genome (read coverage of 19x, which is rather low).
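
The 19x figure follows from dividing the total number of sequenced bases by the genome length. The following is a minimal Python sketch of that arithmetic, assuming uncompressed FASTQ files with exactly four lines per record and a plain FASTA file. The function names are illustrative, and the toy sequences are invented for the example rather than taken from the dataset.

```python
import io


def fasta_length(handle):
    """Total number of sequence characters in a FASTA stream (header lines skipped)."""
    return sum(len(line.strip()) for line in handle if not line.startswith(">"))


def fastq_bases(handle):
    """Return (read_count, total_bases), assuming four lines per FASTQ record."""
    reads = 0
    bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


def coverage(total_bases, genome_length):
    """Average read depth: sequenced bases divided by genome length."""
    return total_bases / genome_length


# Toy stand-ins for wildtype.fna, mutant_R1.fastq and mutant_R2.fastq (invented data).
toy_genome = io.StringIO(">toy\nACGT\n")
toy_r1 = io.StringIO("@r1/1\nACGTACGT\n+\nIIIIIIII\n")
toy_r2 = io.StringIO("@r1/2\nTTGGCCAA\n+\nIIIIIIII\n")

genome_length = fasta_length(toy_genome)
reads_1, bases_1 = fastq_bases(toy_r1)
reads_2, bases_2 = fastq_bases(toy_r2)
toy_depth = coverage(bases_1 + bases_2, genome_length)
print(f"{reads_1 + reads_2} reads, {toy_depth:.1f}x coverage")
```

For the real files, the handles would come from `open("wildtype.fna")`, `open("mutant_R1.fastq")` and `open("mutant_R2.fastq")`, with the base counts of both read files summed before dividing.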

Files (9.1 MB)

Name	MD5	Size
mutant_R1.fastq	32ad7e3698f3f78fd35047cfd8718ea9	4.1 MB
mutant_R2.fastq	6df54a24aed461dcf70c6a79a56f18f7	4.1 MB
wildtype.fna	80fe318fdf4cdd0fee4a244f520a0c54	200.7 kB
wildtype.gbk	8d8bc40a25fb7ce700abc7c34decadd0	399.7 kB
wildtype.gff	b175fe7ba1400f1f6e60465e60b8132b	238.3 kB
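
The MD5 checksums listed with the files can be used to confirm that a download is intact. The following is a minimal Python sketch, assuming the five files were saved under their listed names in a single directory. The helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of any Galaxy or Zenodo tooling.

```python
import hashlib
import os

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "mutant_R1.fastq": "32ad7e3698f3f78fd35047cfd8718ea9",
    "mutant_R2.fastq": "6df54a24aed461dcf70c6a79a56f18f7",
    "wildtype.fna": "80fe318fdf4cdd0fee4a244f520a0c54",
    "wildtype.gbk": "8d8bc40a25fb7ce700abc7c34decadd0",
    "wildtype.gff": "b175fe7ba1400f1f6e60465e60b8132b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each file name to True (match), False (mismatch), or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            results[name] = None
        else:
            results[name] = md5_of_file(path) == expected
    return results
```

Calling `verify_downloads("path/to/downloads")` returns one entry per file, so a mismatch or a missing file can be spotted before starting a tutorial.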

