Published March 24, 2023 | Version v1
Software Open

Supporting Data for: The genome of the pygmy right whale illuminates the evolution of rorquals

  • 1. Senckenberg Biodiversity and Climate Research Centre
  • 2. Goethe University Frankfurt
  • 3. LOEWE Centre for Translational Biodiversity Genomics
  • 4. Lund University

Description

Background

Baleen whales are a clade of gigantic and highly specialized marine mammals. Their genomes have been used to investigate their complex evolutionary history and to decipher the molecular mechanisms that allowed them to reach these dimensions. However, many unanswered questions remain, especially about the early radiation of rorquals and how cancer resistance interplays with their huge number of cells. The pygmy right whale is the smallest and most elusive of the baleen whales. It reaches only a fraction of the body length of its relatives and is the only living member of an otherwise extinct family. This placement makes the pygmy right whale genome an interesting target for revisiting the complex phylogenetic past of baleen whales, because it splits up an otherwise long branch that leads to the radiation of rorquals. In addition, genomic data from this species might help to investigate cancer resistance in large whales, since these mechanisms are not as important for the pygmy right whale as for the giant rorquals and right whales.

Results

Here, we present a first de novo genome of the species and test its potential in phylogenomics and cancer research. To do so, we constructed a multi-species coalescent tree from fragments of a whole-genome alignment and quantified the amount of introgression in the early evolution of rorquals. Furthermore, a genome-wide comparison of selection rates between large-bodied and small-bodied baleen whales revealed a small set of conserved candidate genes with potential connections to cancer resistance.

Conclusions

Our results suggest that the evolution of rorquals is best described as a hard polytomy with a rapid radiation and high levels of introgression. The lack of shared positively selected genes between different large-bodied whale species supports a previously proposed convergent evolution of gigantism and hence cancer resistance in baleen whales.

Notes

General Usage:

Many files containing sequence data are compressed with gzip. Use "gunzip" to decompress them. Directories containing many sub-files are packed into a tarball. Use "tar -xzvf" to unpack the directory first.
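For users who prefer to script the unpacking instead of calling "gunzip" and "tar -xzvf" by hand, the following is a minimal Python sketch using only the standard library. The function names gunzip_file and extract_tarball are hypothetical helpers introduced here for illustration; they are not part of the deposited data.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def gunzip_file(gz_path, out_path=None):
    """Decompress one .gz file, similar to `gunzip -k` (the original is kept)."""
    gz_path = Path(gz_path)
    # Default output name: drop the trailing ".gz" (e.g. x.fasta.gz -> x.fasta).
    out_path = Path(out_path) if out_path else gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def extract_tarball(tar_path, dest_dir="."):
    """Unpack a gzip-compressed tarball, similar to `tar -xzf`."""
    with tarfile.open(tar_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer a safer extraction filter.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return Path(dest_dir)
```

Both helpers take the path of the compressed file as their first argument; gunzip_file returns the path of the decompressed file and extract_tarball returns the destination directory.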

Usage Annotation Data:

The assembly as well as the cds and amino acid sequences are in standard fasta format and can be viewed in any text editor. The gene IDs in all these files are named after the best hit within one of the reference annotations used for homology-based annotation.

Usage Phylogenomics Data:

All alignments, including the whole-genome alignment (WGA), the WGA fragments and the SCOSs, are in fasta alignment format and can be opened in any text editor. To better assess their quality, however, we recommend alignment viewing software such as AliView (http://genocat.tools/tools/aliview.html). SCOS raw sequences are in regular fasta format and can be opened with any text editor. Within WGA sequences, each header is a short six-character species identifier derived from the scientific name. All SCOS gene IDs, and hence headers, name the source species first and then the species from the reference annotation, separated by a "-" symbol. SNPs are contained in a vcf file that can be opened in any text editor. Trees, whether WGA trees or SCOS trees, are in newick format and can be opened by any phylogenetic program. To view and annotate trees we recommend the iTOL web server: https://itol.embl.de/.
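As a quick sanity check without a viewer, the fasta files can also be read programmatically. The sketch below is a minimal standard-library Python reader, not the pipeline used to produce the data. The helper names (read_fasta, alignment_length, split_scos_header) are hypothetical, and splitting at the first "-" is an assumption based on the header description above; check it against the real headers before relying on it.

```python
def read_fasta(path):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                # Sequences may be wrapped over several lines.
                records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


def alignment_length(records):
    """Return the shared sequence length, or raise if the sequences differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; not a valid alignment")
    return lengths.pop()


def split_scos_header(header):
    """Split a header at the first '-' (assumed separator) into two parts."""
    first, _, rest = header.partition("-")
    return first, rest
```

For an alignment file, alignment_length(read_fasta(path)) returns the number of alignment columns; for raw SCOS sequences, which need not share a length, only read_fasta and split_scos_header apply.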

Usage Selection Analysis Data:

Raw SCOS sequences are in fasta format and can be opened in any text editor. Headers name the species first, followed by the gene ID, which usually describes the homology-based source organism. Alignments used for Ka/Ks calculation are in ClustalN format and can be opened in any text editor; AliView (http://genocat.tools/tools/aliview.html) can also open them. AXT alignment files were created specifically for the software KaKs_Calculator v2 (Wang et al., 2010). They can be opened in any text editor but were constructed only to identify non-synonymous and synonymous mutations.
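To inspect the AXT files outside KaKs_Calculator, a small parser can help. The Python sketch below assumes the layout commonly used as KaKs_Calculator input: one name line, two aligned sequence lines, then a blank line between entries. That layout, and the helper names parse_axt and count_codon_differences, are assumptions for illustration and should be checked against the deposited files. The codon count is only a rough inspection aid and is not a Ka/Ks estimate.

```python
def parse_axt(text):
    """Yield (name, seq1, seq2) tuples from AXT-formatted text.

    Assumed layout per entry: a name line, two sequence lines, a blank line.
    """
    block = []
    # The extra "" makes sure the final entry is flushed.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            if len(block) != 3:
                raise ValueError("expected a name line and two sequence lines")
            yield tuple(block)
            block = []


def count_codon_differences(seq1, seq2):
    """Count aligned codon positions at which the two sequences differ."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be equal-length multiples of three")
    return sum(
        seq1[i:i + 3] != seq2[i:i + 3]
        for i in range(0, len(seq1), 3)
    )
```

Reading a file with open(path).read() and passing the text to parse_axt gives one tuple per sequence pair, which can then be passed to count_codon_differences.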

References:

Wang D, Zhang Y, Zhang Z, Zhu J, Yu J. KaKs_Calculator 2.0: A Toolkit Incorporating Gamma-Series Methods and Sliding Window Strategies. Genomics, Proteomics & Bioinformatics. 2010;8(1):77–80. doi:10.1016/S1672-0229(10)60008-3.

Files

GEMOMA-to-Phylogeny.zip (41.4 MB)
md5:9a159b27d31cc4edeb150e520be4ca24
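After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal Python sketch using the standard hashlib module. The helper names md5_of_file and matches_listed_checksum are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# Checksum as listed on this record for the downloadable archive.
EXPECTED_MD5 = "9a159b27d31cc4edeb150e520be4ca24"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

A result of False from matches_listed_checksum suggests an incomplete or corrupted download, in which case the file should be fetched again.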
