Published September 7, 2022 | Version v1
Dataset Open

Test data for the Large Genome Assembly tutorial

Description

A set of test data for the Galaxy Training Network tutorial "Large genome assembly". The data is publicly available from other sources, but has been combined and subsampled here for easier use in the tutorial. We do not claim ownership of this data. Please see the full attribution for each data source below.

Sequencing reads:

From the snow gum, Eucalyptus pauciflora. NCBI BioProject: PRJNA450887. Paper: Wang W, Das A, Kainer D, Schalamun M, Morales-Suarez A, Schwessinger B, Lanfear R; 2020, doi: 10.1093/gigascience/giz160.

Three read files were imported from NCBI into Galaxy for this tutorial: one file of nanopore reads (SRR7153076) and a pair of Illumina read files (SRR7153045). For the test data set, these were randomly subsampled to 10% of the original file size, and reads mapping to related chloroplast gene sequences (rbcL sequence: accession KM360776.1; matK sequence: accession KT632904.1) were excluded.

Files: nanopore reads; Illumina reads (R1 and R2)
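
The record does not say which tool performed the subsampling. As a rough illustration only, random subsampling of uncompressed FASTQ files could be sketched as below. The function name, the seed value, and the 4-lines-per-record layout are assumptions made for this sketch, and keeping each read with probability 0.1 yields approximately, not exactly, 10% of the reads.

```python
import random
from itertools import islice


def subsample_fastq(in_path, out_path, fraction=0.1, seed=42):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Hypothetical helper for illustration; not the tool used for this dataset.
    Returns the number of records written.
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = list(islice(src, 4))  # header, sequence, '+', qualities
            if not record:
                break
            # One random draw per record, so two files with the same number
            # of records and the same seed select the same record positions.
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept
```

For paired Illumina files, calling the function once on R1 and once on R2 with the same seed keeps the same record positions in both files, provided the two files list their reads in the same order.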

Reference genome: 

Arabidopsis thaliana. Although this is not the same species as the sequencing reads above, it serves as an example for a comparison step in the tutorial. The genome was downloaded from The Arabidopsis Information Resource (https://www.arabidopsis.org/index.jsp) under Genes: Download: TAIR10 genome release: TAIR10 chromosome files, as the file TAIR10_chr_all.fas.gz, and then decompressed into a FASTA file.
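
The decompression step can be reproduced with the Python standard library. This is a minimal sketch; the function name and the output file name in the usage line are assumptions, since the record does not state what the decompressed file was called.

```python
import gzip
import shutil


def gunzip(src_path, dst_path):
    """Decompress a gzip file to dst_path, streaming so large files fit in memory."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Example call (output name is hypothetical):
# gunzip("TAIR10_chr_all.fas.gz", "TAIR10_chr_all.fasta")
```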

Files (1.3 GB)

Checksum                               Size
md5:1fbb74ba826f41d2affa44517e982162   745.5 MB
md5:2c61269e625ff62be61be80236fb7108   205.2 MB
md5:da6696a5f7ec0ec789ce560ef8ebbaf7   193.2 MB
md5:127a4a1dc4c9396a0e76b0a98970813f   121.2 MB
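
After downloading, each file can be checked against its listed md5 value. The sketch below uses Python's hashlib; the helper names are illustrative, and the local file name in the usage line is a placeholder, because file names are not shown in the listing above.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:1fbb74ba...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected


# Example call (local file name is a placeholder):
# matches_listing("downloaded_file", "md5:1fbb74ba826f41d2affa44517e982162")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.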