Published January 8, 2018 | Version v1
Journal article Open

De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species

  • 1. Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, G12 8QQ, Glasgow, UK
  • 2. Glasgow Polyomics, Wolfson Wohl Cancer Research Centre, University of Glasgow, G61 1QH, Glasgow, UK

Description

Background: Salmonid fishes exhibit high levels of phenotypic and ecological variation and are thus ideal model systems for studying evolutionary processes of adaptive divergence and speciation. Furthermore, salmonids are of major interest in fisheries, aquaculture, and conservation research. A better understanding of the genetic mechanisms underlying traits in these species would significantly advance research in these fields. Here we generate high-quality de novo transcriptomes for four salmonid species: Atlantic salmon (Salmo salar), brown trout (Salmo trutta), Arctic charr (Salvelinus alpinus), and European whitefish (Coregonus lavaretus). Of these, only Atlantic salmon has a publicly available reference genome; the other three species have few, if any, genomic studies to date.

Results: We used paired-end Illumina RNA-seq to sequence multiple individuals at high coverage, yielding between 180 and 210 M reads per species. After initial assembly, strict filtering was used to remove duplicated, redundant, and low-confidence transcripts. The final assemblies consisted of 36,505 protein-coding transcripts for Atlantic salmon, 35,736 for brown trout, 33,126 for Arctic charr, and 33,697 for European whitefish; all four assemblies are publicly available. Assembly completeness was assessed using three approaches, all of which supported the high quality of the assemblies: 1) ~78% of Actinopterygian single-copy orthologs were successfully captured in our assemblies, 2) orthogroup inference identified high overlap in the protein sequences present across all four species (40% shared across all four and 84% shared by at least two), and 3) comparison with the published Atlantic salmon genome suggests that our assemblies represent well-covered (~98%) protein-coding transcriptomes. Thorough comparison of the generated assemblies found that 84-90% of transcripts in each assembly were orthologous with at least one of the other three species. We also identified 34-37% of transcripts in each assembly as paralogs. We further compare completeness and annotation statistics of our new assemblies to those available for related species.
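The sharing percentages in point 2 can be illustrated with a minimal Python sketch. It only shows how "shared across all four" and "shared by at least two" fractions could be derived once each orthogroup is reduced to the set of species it contains. The orthogroup IDs and memberships below are toy values invented for illustration, not data from this study, and the function name `sharing_fractions` is hypothetical; this is not the authors' pipeline.

```python
from typing import Dict, Set, Tuple

# The four species named in the abstract.
SPECIES = {
    "Salmo salar",
    "Salmo trutta",
    "Salvelinus alpinus",
    "Coregonus lavaretus",
}


def sharing_fractions(orthogroups: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (fraction of orthogroups present in all species,
    fraction present in at least two species)."""
    total = len(orthogroups)
    if total == 0:
        raise ValueError("no orthogroups given")
    # An orthogroup is "shared across all four" if its species set covers SPECIES.
    in_all = sum(1 for members in orthogroups.values() if members >= SPECIES)
    # "Shared by at least two" counts orthogroups spanning two or more species.
    in_two_plus = sum(
        1 for members in orthogroups.values() if len(members & SPECIES) >= 2
    )
    return in_all / total, in_two_plus / total


# Toy input (hypothetical): orthogroup ID -> species with at least one member.
toy_orthogroups = {
    "OG1": set(SPECIES),
    "OG2": {"Salmo salar", "Salmo trutta"},
    "OG3": {"Salvelinus alpinus"},
    "OG4": {"Coregonus lavaretus", "Salvelinus alpinus", "Salmo trutta"},
    "OG5": set(SPECIES),
}

all_four, two_plus = sharing_fractions(toy_orthogroups)
print(f"shared by all four: {all_four:.0%}; shared by at least two: {two_plus:.0%}")
```

With real data, the species sets would come from the output of whichever orthogroup inference tool was used; the parsing step is omitted here because the output format is not described in this record.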

Conclusion: New, high-confidence protein-coding transcriptomes were generated for four ecologically and economically important species of salmonids. This work offers a high-quality pipeline for such complex genomes, represents a valuable contribution to the existing genomic resources for these species, and provides robust tools for future investigation of gene expression and sequence evolution in these and other salmonid species.

Files

12864_2017_4379_MOESM1_ESM.pdf

Files (10.9 MB)

Checksum  Size
md5:7ee6eb451289310642e08d702d0e97e0  59.6 kB
md5:94f88ef0c46d362c4fff720d89770958  9.0 MB
md5:d85a39c95fb350151c904d17b52151dd  59.2 kB
md5:7abba1d70362214a90556390266b5c6e  233.8 kB
md5:33c0c2bba7efd4b7ae1a443dbe9568f8  36.7 kB
md5:faca6bdd4268c19eb66d90d1a1367198  252.0 kB
md5:d5806fe63bf5b9907013bab624e7c5f2  35.3 kB
md5:0c836ab6c92fcb54862defe24a68b0d1  1.2 MB
md5:7fdcd0dc7b44aaf998e6cb74217849e2  17.5 kB
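A downloaded file can be checked against its listed MD5 checksum. The following is a minimal Python sketch using only the standard library; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical. The listing above does not say which checksum belongs to which file name, so the pairing in the usage lines (first checksum with `12864_2017_4379_MOESM1_ESM.pdf`) is an assumption made purely for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (assumed pairing of file name and checksum; runs only if the file exists).
candidate = Path("12864_2017_4379_MOESM1_ESM.pdf")
if candidate.exists():
    ok = matches_listed_md5(candidate, "md5:7ee6eb451289310642e08d702d0e97e0")
    print("checksum matches" if ok else "checksum does not match")
```

A mismatch can mean either a corrupted download or that the file was paired with the wrong checksum from the listing.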

Additional details

Funding

GEN ECOL ADAPT – Adaptation genomics of trophic polymorphism 321999
European Commission