Test data for the Large Genome Assembly tutorial
Description
A set of test data for the Galaxy Training Network tutorial "Large genome assembly". This data is publicly available from other sources but has been combined here and subsampled for easier use in the tutorial. We do not claim ownership of this data; please see the full attribution for each data source below.
Sequencing reads:
From the snow gum (Eucalyptus pauciflora). NCBI BioProject: PRJNA450887. Paper: Wang W, Das A, Kainer D, Schalamun M, Morales-Suarez A, Schwessinger B, Lanfear R (2020), doi: 10.1093/gigascience/giz160.
Three read files were imported from NCBI into Galaxy for this tutorial: nanopore reads (SRR7153076) and paired-end Illumina reads (SRR7153045, as separate R1 and R2 files). For the test data set, these were randomly subsampled to 10% of the original file size, and reads mapping to related chloroplast gene sequences (rbcL: accession KM360776.1; matK: accession KT632904.1) were removed.
Files: Nanopore reads; Illumina reads, R1 and R2
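The record does not say which tools performed the subsampling or the chloroplast filtering. As a minimal sketch, assuming uncompressed four-line FASTQ input and a set of read IDs to exclude that was produced separately (for example, by mapping the reads against the rbcL and matK sequences with an aligner), paired reads could be subsampled as follows. The function names, the fixed seed and the per-read (rather than per-byte) 10% sampling are illustrative assumptions, not the authors' method.

```python
import random
from typing import AbstractSet, Iterable, Iterator, List, Tuple

Record = List[str]  # [header, sequence, "+" line, quality], without line endings


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield four-line FASTQ records from any iterable of lines (e.g. an open file)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines where a header is expected
        rest = [next(it, "") for _ in range(3)]
        if not header.startswith("@") or rest[2] == "":
            raise ValueError("truncated or malformed FASTQ record")
        yield [line.rstrip("\r\n") for line in (header, *rest)]


def read_id(header: str) -> str:
    """Return the read name: first token after '@', minus any /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def subsample_pairs(
    r1_lines: Iterable[str],
    r2_lines: Iterable[str],
    fraction: float = 0.1,
    exclude: AbstractSet[str] = frozenset(),
    seed: int = 42,
) -> Iterator[Tuple[Record, Record]]:
    """Keep each read pair with probability `fraction`, dropping pairs whose ID is in `exclude`."""
    rng = random.Random(seed)  # fixed seed, so the same subsample is drawn every run
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        name = read_id(rec1[0])
        if name != read_id(rec2[0]):
            raise ValueError(f"R1 and R2 are out of sync at read {name!r}")
        # One random draw per pair, so mates are always kept or dropped together.
        if rng.random() < fraction and name not in exclude:
            yield rec1, rec2


def format_record(record: Record) -> str:
    """Serialise a record back to FASTQ text."""
    return "\n".join(record) + "\n"
```

To run this on real files, pass open R1 and R2 handles (gzip.open(path, "rt") for compressed FASTQ) and write each kept record with format_record. Sampling whole pairs with a single random draw keeps the two output files in the same order; note that zip stops at the shorter input, so this sketch would not notice R1 and R2 files of different lengths.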
Reference genome:
Arabidopsis thaliana. Although this is not the same species as the reads above, it serves as an example reference for a comparison step in the tutorial. It was downloaded from The Arabidopsis Information Resource (TAIR) at https://www.arabidopsis.org/index.jsp (Genes > Download > TAIR10 genome release > TAIR10 chromosome files > TAIR10_chr_all.fas.gz) and then decompressed into a FASTA file.
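Decompressing the download is a single gunzip call on the command line. The Python sketch below does the same with the standard gzip module and adds a quick check that lists each sequence in the resulting FASTA file with its length. The helper names are illustrative, and nothing is assumed about the TAIR header text beyond the sequence ID being the first token after '>'.

```python
import gzip
import shutil
from typing import Dict, Iterable, Optional


def gunzip(src: str, dest: str) -> None:
    """Decompress a gzip file such as TAIR10_chr_all.fas.gz into a plain file."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copies in chunks rather than loading the whole file


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first token after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            lengths[current] += len(line)
    return lengths
```

For example, gunzip("TAIR10_chr_all.fas.gz", "TAIR10_chr_all.fas") followed by passing the opened TAIR10_chr_all.fas to fasta_lengths prints one entry per sequence in the reference.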
Files
Total size: 1.3 GB

MD5 checksum | Size |
---|---|
md5:1fbb74ba826f41d2affa44517e982162 | 745.5 MB |
md5:2c61269e625ff62be61be80236fb7108 | 205.2 MB |
md5:da6696a5f7ec0ec789ce560ef8ebbaf7 | 193.2 MB |
md5:127a4a1dc4c9396a0e76b0a98970813f | 121.2 MB |
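To confirm that a downloaded file is complete and uncorrupted, compute its MD5 checksum and compare it with the value listed for that file (GNU md5sum does the same on the command line). A minimal Python sketch using the standard hashlib module follows; the function names are illustrative, and reading in 1 MiB chunks is an arbitrary choice so that the larger read files never need to fit in memory.

```python
import hashlib
from typing import Iterable


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks (1 MiB by default)."""
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" at end of file
        return md5_hex(iter(lambda: fh.read(chunk_size), b""))


def checksum_matches(expected: str, actual_hex: str) -> bool:
    """Compare a listed checksum such as 'md5:1fbb...' with a computed hex digest."""
    value = expected.strip().lower()
    if value.startswith("md5:"):
        value = value[len("md5:"):]
    return value == actual_hex.lower()
```

For a downloaded file, checksum_matches("md5:...", file_md5(path)) should return True when the listed checksum for that file is supplied.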