Published November 8, 2021 | Version v7
Dataset Open

Training data for 'Genome annotation with Funannotate' tutorial (Galaxy Training Material)

Description

The data provided here are part of a Galaxy Training Network tutorial for genome annotation with funannotate.

The genome was assembled following the GTN Flye assembly tutorial, then masked with RepeatMasker.

RNA-seq data: reads from SRR8534859 were mapped to the genome using STAR (toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy0), then the resulting BAM file was downsampled to 10% (with toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_DownsampleSam/2.18.2.1) to reduce the size of the dataset. FASTQ files were then extracted from the downsampled BAM file (toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_SamToFastq/2.18.2.1).
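The downsampling step can be illustrated with a minimal Python sketch. It is not Picard's implementation: it only shows the general idea of keeping or discarding all reads of a template together, by hashing the read name so both mates of a pair get the same decision. The function names `keep_template` and `downsample`, the `seed` parameter, and the `(name, payload)` record shape are assumptions made for this illustration.

```python
import hashlib


def keep_template(read_name: str, fraction: float, seed: int = 1) -> bool:
    """Decide deterministically whether to keep every read sharing this name."""
    digest = hashlib.md5(f"{seed}:{read_name}".encode()).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    value = int.from_bytes(digest[:4], "big") / 2**32
    return value < fraction


def downsample(records, fraction=0.10, seed=1):
    """Yield the (read_name, payload) records whose template is retained.

    Hypothetical record shape: any iterable of (read_name, payload) tuples.
    """
    for name, payload in records:
        if keep_template(name, fraction, seed):
            yield name, payload


if __name__ == "__main__":
    # Toy input: 10,000 read pairs, each represented by two records.
    pairs = [(f"read{i}", mate) for i in range(10_000) for mate in (1, 2)]
    kept = list(downsample(pairs, fraction=0.10))
    print(f"kept {len(kept)} of {len(pairs)} records")
```

Because the decision depends only on the read name and the seed, mates are never separated, which matters when FASTQ files are extracted from the downsampled alignments afterwards.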

SwissProt_subset.fasta is a subset of SwissProt proteins that show some similarity to the genome (identified by running Diamond against the genome, then extracting the sequences that matched with an e-value < 0.0001).
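The e-value filtering and sequence extraction can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used to build the file: it assumes tab-separated hits in the common 12-column BLAST-style layout, with the protein identifier in the second column and the e-value in the eleventh. Both column indices are parameters in case the real output was laid out differently. The function names are hypothetical.

```python
def ids_below_evalue(tabular_lines, max_evalue=1e-4, id_col=1, evalue_col=10):
    """Collect IDs from tab-separated hit lines whose e-value is below the cutoff.

    Assumed layout: 12-column BLAST-style table (0-based columns 1 and 10
    hold the subject ID and the e-value).
    """
    keep = set()
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if float(fields[evalue_col]) < max_evalue:
            keep.add(fields[id_col])
    return keep


def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID (first header word) is in keep_ids."""
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep_ids
        if writing:
            yield line


if __name__ == "__main__":
    # Toy data, invented for the example.
    hits = [
        "contig1\tsp|P1|A_X\t50.0\t100\t50\t0\t1\t300\t1\t100\t1e-20\t80.0\n",
        "contig1\tsp|P2|B_X\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t20.0\n",
    ]
    fasta = [">sp|P1|A_X example\n", "MKV\n", ">sp|P2|B_X example\n", "MAA\n"]
    print("".join(filter_fasta(fasta, ids_below_evalue(hits))), end="")
```

Collecting the identifiers into a set first means each protein is written once, even when it matches several regions of the genome.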

Files (547.1 MB)

Checksum                              Size
md5:13eb71e0504e55d89e6f8a8a3d00acdc  78.4 MB
md5:688b17c52af781bb8f2b7f303a3660b4  11.3 MB
md5:81fd396797c394c1968012af7e0ebf18  49.2 MB
md5:ca50ac884a00ccb7553287b9d601ecd9  184.0 MB
md5:df5de61de301484850e2244f06a459d1  221.4 MB
md5:12d46b3ad9b1b2b5c73c14f6b19b4a9c  2.9 MB
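
A downloaded file can be checked against the MD5 values listed above. The following Python sketch uses only the standard library. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the path passed in is whatever local name the download was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:13eb71e0...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:12d46b3ad9b1b2b5c73c14f6b19b4a9c")` should return `True` for an intact copy of the 2.9 MB file, where `local_path` is the location chosen when downloading.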