Training data for 'Genome annotation with Funannotate' tutorial (Galaxy Training Material)

Anthony Bretaudeau; Alexandre Cormier; Stéphanie Robin; Erwan Corre; Laura Leroi

doi:10.5281/zenodo.5726818

Published November 8, 2021 | Version v4

Dataset Open

1. INRAE
2. Ifremer
3. CNRS

The data provided here are part of a Galaxy Training Network tutorial for genome annotation with funannotate.

The genome was assembled following the GTN Flye assembly tutorial, then masked with RepeatMasker.

RNA-seq data: reads from SRR8534859 were mapped to the genome using STAR (toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy0), then the resulting BAM file was downsampled to 10% of its reads (toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_DownsampleSam/2.18.2.1) to reduce the size of the dataset. FASTQ files were then extracted from the downsampled BAM file (toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_SamToFastq/2.18.2.1).
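To illustrate why downsampling a paired-end alignment can keep mates together, here is a minimal sketch that decides whether to keep a read from a hash of its name, so both reads of a pair share the same outcome. This is an illustration only and is not claimed to be how the Picard tool used here works internally; the function name `keep_read` and the `seed` parameter are hypothetical.

```python
import hashlib


def keep_read(read_name, fraction=0.10, seed=1):
    """Return True if a read should be kept when downsampling to `fraction`.

    The decision depends only on the read name and the seed, so both mates
    of a pair (which share a name in a BAM file) are kept or dropped together.
    """
    key = f"{seed}:{read_name}".encode("utf-8")
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return bucket / 2**64 < fraction


if __name__ == "__main__":
    names = [f"read_{i}" for i in range(10_000)]
    kept = sum(keep_read(name) for name in names)
    print(f"kept {kept} of {len(names)} read names")
```

Because the choice is deterministic for a given seed, rerunning the sketch on the same input selects the same reads.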

SwissProt_subset.fasta is a subset of SwissProt proteins known to have some similarity with the genome (found by running Diamond against the genome, then extracting the sequences that matched with an e-value < 0.0001).
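The filtering step described for SwissProt_subset.fasta can be sketched as follows. This is a rough illustration, not the authors' actual commands: it assumes the Diamond results were saved in BLAST-style tabular format with the protein identifier in the second column and the e-value in the eleventh, and the function names `ids_below_evalue` and `filter_fasta` are hypothetical.

```python
import csv


def ids_below_evalue(tabular_lines, max_evalue=1e-4, id_column=1, evalue_column=10):
    """Collect protein IDs from tabular hits whose e-value is below the cutoff.

    Column positions are assumptions (BLAST-style tabular layout); adjust them
    to match the actual output file.
    """
    keep = set()
    reader = csv.reader(tabular_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        if float(row[evalue_column]) < max_evalue:
            keep.add(row[id_column])
    return keep


def filter_fasta(fasta_lines, wanted_ids):
    """Yield only the FASTA lines belonging to records whose ID is wanted."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keeping = record_id in wanted_ids
        if keeping:
            yield line
```

With real files, both functions accept open file handles, for example `filter_fasta(open("proteins.fasta"), ids)`, where the file name is a placeholder.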

Files (457.9 MB)

Name	MD5	Size
genome_masked.fasta	e28b3275a1a45057b87d193c1df6168b	49.6 MB
rnaseq_R1.fq.gz	ca50ac884a00ccb7553287b9d601ecd9	184.0 MB
rnaseq_R2.fq.gz	df5de61de301484850e2244f06a459d1	221.4 MB
SwissProt_subset.fasta	12d46b3ad9b1b2b5c73c14f6b19b4a9c	2.9 MB
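The MD5 checksums listed for each file can be used to check that a download completed without corruption. The following is a minimal sketch that assumes the four files were saved under their original names in a single directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "genome_masked.fasta": "e28b3275a1a45057b87d193c1df6168b",
    "rnaseq_R1.fq.gz": "ca50ac884a00ccb7553287b9d601ecd9",
    "rnaseq_R2.fq.gz": "df5de61de301484850e2244f06a459d1",
    "SwissProt_subset.fasta": "12d46b3ad9b1b2b5c73c14f6b19b4a9c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return a status ('ok', 'mismatch' or 'missing') for each expected file."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use low for the larger FASTQ archives.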

	All versions	This version
Views	3,840	444
Downloads	4,605	233
Data volume	612.8 GB	27.4 GB
