Whole genome assembly and annotation of the King Angelfish (Holacanthus passer) gives insight into the evolution of marine fishes of the Tropical Eastern Pacific

Gatins, Remy; Arias, Carlos F.; Sánchez, Carlos; Bernardi, Giacomo; De León, Luis F.

doi:10.7291/D1X10B

Published October 24, 2023 | Version v1

Dataset Open

1. University of California, Santa Cruz
2. Smithsonian Tropical Research Institute
3. Universidad Autónoma de Baja California Sur
4. University of Massachusetts Boston

Holacanthus angelfishes are some of the most iconic marine fishes of the Tropical Eastern Pacific (TEP). However, very limited genomic resources currently exist for the genus. In this study we: i) assembled and annotated the nuclear genome of the King Angelfish (Holacanthus passer), and ii) examined the demographic history of H. passer in the TEP. We generated 43.8 Gb of ONT reads and 97.3 Gb of Illumina reads, representing 75X and 167X coverage, respectively. The final genome assembly size was 583 Mb with a contig N50 of 5.7 Mb, and it captured 97.5% of complete Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs). Repetitive elements account for 5.09% of the genome, and 33,889 protein-coding genes were predicted, of which 22,984 have been functionally annotated. Our demographic model suggests that population expansions of H. passer occurred prior to the last glacial maximum (LGM) and were more likely shaped by events associated with the closure of the Isthmus of Panama. This result is surprising, given that most rapid population expansions in both freshwater and marine organisms have been reported to occur globally after the LGM. Overall, this annotated genome assembly will serve as a resource to improve our understanding of the evolution of Holacanthus angelfishes while facilitating novel research into local adaptation, speciation, and introgression in marine fishes.
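The coverage figures above can be checked with simple arithmetic. The sketch below assumes that coverage was computed as sequenced bases divided by the final assembly size (583 Mb), which the description does not state explicitly. It also shows how a contig N50 is usually defined. The function names and the toy contig lengths are hypothetical and are not taken from the dataset.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Values reported in the description; using the assembly size as the
# denominator is an assumption.
ASSEMBLY_SIZE = 583e6
ont_cov = coverage(43.8e9, ASSEMBLY_SIZE)
illumina_cov = coverage(97.3e9, ASSEMBLY_SIZE)

# Share of predicted protein-coding genes with a functional annotation.
annotated_fraction = 22_984 / 33_889

# Hypothetical toy contig lengths, only to illustrate the N50 definition.
toy_contigs = [8, 6, 5, 3, 2, 1]
toy_n50 = n50(toy_contigs)

print(f"ONT coverage: {ont_cov:.1f}X")
print(f"Illumina coverage: {illumina_cov:.1f}X")
print(f"Functionally annotated: {annotated_fraction:.1%}")
print(f"Toy N50: {toy_n50}")
```

Under that assumption the reported 75X and 167X are reproduced after rounding, and roughly two thirds of the predicted genes carry a functional annotation.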

Other

Funding provided by: Consejo Nacional de Humanidades, Ciencias y Tecnologías
Crossref Funder Registry ID: https://ror.org/059ex5q34
Award Number:

Methods

To annotate our genome, we used the homology-based gene prediction pipeline GeMoMa (v1.6.4). GeMoMa uses protein-coding gene models and intron position conservation from reference genomes to predict protein-coding genes in a target genome (Keilwagen et al., 2018). We ran the GeMoMa pipeline using annotations from three fish species: Amphiprion ocellaris, Oreochromis niloticus, and Electrophorus electricus (downloaded from NCBI, see Table S3). These species were selected to represent a range of high-quality fish annotations, from closely to distantly related species. In our case, the pipeline performed four main steps: 1) Extractor or external search, which uses the search algorithm tblastn with CDS parts from the reference genomes as queries, 2) Gene Model Mapper (GeMoMa), which builds gene models from the extractor results, 3) GeMoMa Annotation Filter (GAF), which filters and combines common gene predictions, and 4) AnnotationFinalizer, which predicts UTRs for annotated coding sequences and generates gene and transcript names (Keilwagen et al., 2018). Additionally, we ran RepeatMasker (open-4.0.6, Smit et al. 2013–2015) with the Teleostei database to identify repetitive elements in the genome and soft-mask the assembly. The RepeatMasker output (RepeatMasker.out) was converted to GFF with the RepeatMasker script `rmOutToGFF3.pl`.
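As an illustration of how the outputs of these steps might be inspected, the following sketch counts feature types in GFF text and measures the soft-masked (lowercase) fraction of a FASTA sequence. It is not part of the authors' pipeline. The function names and the toy records are hypothetical, and the feature types that GeMoMa actually writes to HPA_1.1_annotation.gff may differ from the ones used here.

```python
from collections import Counter


def count_feature_types(gff_text: str) -> Counter:
    """Count values of the third GFF column (feature type), skipping comments."""
    counts = Counter()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 9:  # a GFF record has nine tab-separated columns
            counts[cols[2]] += 1
    return counts


def soft_masked_fraction(fasta_text: str) -> float:
    """Fraction of bases written in lowercase, i.e. soft-masked as repeats."""
    masked = total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # header line, not sequence
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Hypothetical toy records, not taken from the dataset.
toy_gff = "\n".join([
    "##gff-version 3",
    "\t".join(["contig_1", "GeMoMa", "gene", "100", "900", ".", "+", ".", "ID=gene_1"]),
    "\t".join(["contig_1", "GeMoMa", "mRNA", "100", "900", ".", "+", ".", "ID=gene_1.t1;Parent=gene_1"]),
    "\t".join(["contig_1", "GeMoMa", "CDS", "100", "400", ".", "+", "0", "Parent=gene_1.t1"]),
    "\t".join(["contig_1", "GeMoMa", "CDS", "600", "900", ".", "+", "0", "Parent=gene_1.t1"]),
])
toy_fasta = ">contig_1\nACGTacgtAC\nGGttNN\n"

toy_counts = count_feature_types(toy_gff)
toy_masked = soft_masked_fraction(toy_fasta)
print(dict(toy_counts))
print(f"soft-masked fraction: {toy_masked:.3f}")
```

For files as large as the real annotation (49.2 MB), reading line by line from an open file handle would be preferable to holding the whole text in memory.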

Files

Files (72.0 MB)

Name	MD5	Size
gemoma.job	4817ead382fedfc1822f9263e4accd9b	1.3 kB
gemoma_HPA_1.1.log	981a7f666f1414471e09a30e65becc74	93.5 kB
HPA_1.1_annotation.gff	c13a8e2e5b54907fb13734891863db00	49.2 MB
HPA_1.1_predicted_proteins.fasta	2577863e32c65e60447d469f1791a77f	22.6 MB
protocol_GeMoMaPipeline.txt	05179acbd3703dfbe24dcf464966b5f9	93.0 kB
README.md	79f9dfe3c26fdf57b5af7ac0923c88b1	1.1 kB
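After downloading, the MD5 checksums listed for each file can be used to confirm that the transfers were complete. The sketch below is one possible way to do this with the Python standard library. The helper names and the `downloads` directory are hypothetical, while the checksums are copied from the file list.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "gemoma.job": "4817ead382fedfc1822f9263e4accd9b",
    "gemoma_HPA_1.1.log": "981a7f666f1414471e09a30e65becc74",
    "HPA_1.1_annotation.gff": "c13a8e2e5b54907fb13734891863db00",
    "HPA_1.1_predicted_proteins.fasta": "2577863e32c65e60447d469f1791a77f",
    "protocol_GeMoMaPipeline.txt": "05179acbd3703dfbe24dcf464966b5f9",
    "README.md": "79f9dfe3c26fdf57b5af7ac0923c88b1",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its checksum matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results


if __name__ == "__main__":
    # "downloads" is a hypothetical directory holding the downloaded files.
    for name, ok in verify_downloads("downloads").items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

A missing file and a checksum mismatch are reported the same way here; separating the two cases would be a straightforward extension.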

Additional details

Is source of: https://www.ncbi.nlm.nih.gov/bioproject/713824 (URL); https://github.com/remygatins/Holacanthus_passer-ONT-Illumina-Genome-Assembly (URL); 10.5281/zenodo.10035364 (DOI)

	All versions	This version
Views	177	177
Downloads	102	102
Data volume	1.4 GB	1.4 GB
