Whole genome assembly and annotation of the King Angelfish (Holacanthus passer) gives insight into the evolution of marine fishes of the Tropical Eastern Pacific
Authors/Creators
- 1. University of California, Santa Cruz
- 2. Smithsonian Tropical Research Institute
- 3. Universidad Autónoma de Baja California Sur
- 4. University of Massachusetts Boston
Description
Holacanthus angelfishes are some of the most iconic marine fishes of the Tropical Eastern Pacific (TEP). However, very limited genomic resources currently exist for the genus. In this study we: i) assembled and annotated the nuclear genome of the King Angelfish (Holacanthus passer), and ii) examined the demographic history of H. passer in the TEP. We generated 43.8 Gb of ONT reads and 97.3 Gb of Illumina reads, representing 75X and 167X coverage, respectively. The final genome assembly size was 583 Mb with a contig N50 of 5.7 Mb, and it captured 97.5% of complete Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs). Repetitive elements account for 5.09% of the genome, and 33,889 protein-coding genes were predicted, of which 22,984 have been functionally annotated. Our demographic model suggests that population expansions of H. passer occurred prior to the last glacial maximum (LGM) and were more likely shaped by events associated with the closure of the Isthmus of Panama. This result is surprising, given that most rapid population expansions in both freshwater and marine organisms have been reported to occur globally after the LGM. Overall, this annotated genome assembly will serve as a resource to improve our understanding of the evolution of Holacanthus angelfishes while facilitating novel research into local adaptation, speciation, and introgression in marine fishes.
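The coverage figures quoted above follow from dividing the sequenced bases by the assembly size, and the contig N50 is a length-weighted median of contig lengths. The Python sketch below illustrates both calculations. It assumes that "Gb" means gigabases and uses the 583 Mb assembly size as the denominator; the function names and the toy contig lengths are hypothetical and are not taken from the authors' pipeline.

```python
def coverage(total_bases, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


GENOME_SIZE = 583e6  # final assembly size reported in the description (bases)

ont_depth = coverage(43.8e9, GENOME_SIZE)       # ONT reads
illumina_depth = coverage(97.3e9, GENOME_SIZE)  # Illumina reads
print(f"ONT: {ont_depth:.0f}X, Illumina: {illumina_depth:.0f}X")

# Toy contig lengths (hypothetical), only to show how N50 is computed.
print(n50([2, 3, 4, 5, 6]))
```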
Other
Funding provided by: Consejo Nacional de Humanidades, Ciencias y Tecnologías
Crossref Funder Registry ID: https://ror.org/059ex5q34
Award Number:
Methods
To annotate our genome, we used the homology-based gene prediction pipeline GeMoMa (v1.6.4). GeMoMa uses protein-coding gene models and intron position conservation from reference genomes to predict possible protein-coding genes in a target genome (Keilwagen et al., 2018). Here, we ran the GeMoMa pipeline using annotations from three fish species: Amphiprion ocellaris, Oreochromis niloticus, and Electrophorus electricus (downloaded from NCBI, see Table S3). These species were selected to represent genes from high-quality fish annotations spanning close to distant relatives. In our case, the pipeline performed four main steps: 1) Extractor or external search, using the search algorithm tblastn with CDS parts from our reference genomes as queries, 2) Gene Model Mapper (GeMoMa), which builds gene models from the Extractor results, 3) GeMoMa Annotation Filter (GAF), which filters and combines common gene predictions, and 4) AnnotationFinalizer, which predicts UTRs for annotated coding sequences and generates gene and transcript names (Keilwagen et al., 2018). Additionally, we ran RepeatMasker (open-4.0.6, Smit et al. 2013–2015) with the Teleostei database to identify repetitive elements in the genome and soft-mask the assembly. The RepeatMasker .out file was converted to GFF with the RepeatMasker script `rmOutToGFF3.pl`.
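To illustrate the last step, the Python sketch below shows roughly what converting a RepeatMasker .out table to GFF3 involves. It is not a reimplementation of `rmOutToGFF3.pl`: the function name, the feature type, the attribute layout, and the example row are assumptions made for illustration, and the column positions follow the standard whitespace-separated .out layout.

```python
def rm_out_to_gff3(lines):
    """Yield GFF3 lines for each hit in a RepeatMasker .out table."""
    yield "##gff-version 3"
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines (their first field is not an integer score).
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        score, seqid, start, end = fields[0], fields[4], fields[5], fields[6]
        strand = "-" if fields[8] == "C" else "+"
        repeat, repeat_class = fields[9], fields[10]
        # Repeat coordinates are "begin end (left)" on +, "(left) end begin" on C.
        if strand == "+":
            r_start, r_end = fields[11], fields[12]
        else:
            r_start, r_end = fields[13], fields[12]
        # Attribute layout is an assumption, not the script's guaranteed output.
        attributes = f"Target={repeat} {r_start} {r_end};class={repeat_class}"
        yield "\t".join([seqid, "RepeatMasker", "dispersed_repeat",
                         start, end, score, strand, ".", attributes])


# Hypothetical .out excerpt: two header lines, a blank line, and one hit.
example = [
    "   SW   perc perc perc  query     position in query    matching repeat      position in repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat   class/family begin end (left) ID",
    "",
    " 1320   15.6  6.2  0.0  contig_1  4301  4522 (95478) C L2       LINE/L2      (10)  3201 2980  7",
]
records = list(rm_out_to_gff3(example))
print("\n".join(records))
```

GFF3 coordinates are 1-based and inclusive, as are the positions in the .out table, so the start and end columns can be copied without adjustment.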
Files
protocol_GeMoMaPipeline.txt
Total size of all files: 72.0 MB
| File (md5 checksum) | Size |
|---|---|
| md5:4817ead382fedfc1822f9263e4accd9b | 1.3 kB |
| md5:981a7f666f1414471e09a30e65becc74 | 93.5 kB |
| md5:c13a8e2e5b54907fb13734891863db00 | 49.2 MB |
| md5:2577863e32c65e60447d469f1791a77f | 22.6 MB |
| md5:05179acbd3703dfbe24dcf464966b5f9 | 93.0 kB |
| md5:79f9dfe3c26fdf57b5af7ac0923c88b1 | 1.1 kB |
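The md5 checksums listed for each file can be used to confirm that a download is intact. The Python sketch below uses the standard-library `hashlib` module. The helper names are hypothetical, and the local path must be supplied by the user, since the listing pairs checksums with sizes rather than file names.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, `matches_listing(local_path, "md5:c13a8e2e5b54907fb13734891863db00")` would be expected to return `True` for an intact copy of the 49.2 MB file, where `local_path` is wherever that file was saved.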