There is a newer version of the record available.

Published July 4, 2024 | Version v1
Dataset Open

Gene and repeat annotation for snowy owl (Bubo scandiacus) and selected species

  • 1. Norwegian University of Life Sciences
  • 2. University of Oslo

Description

Here we provide the gene and repeat annotations for snowy owl (Bubo scandiacus), together with the gene and repeat annotations produced for the species it was compared to. Unfortunately, it is currently not possible to upload repeat annotation tracks to an international nucleotide sequence database such as ENA. Uploading the gene annotation is possible, but some of the cross-references to other databases in the functional annotation are removed in the process. Furthermore, the sequence names in the publicly available genome assemblies on ENA differ from those used in the annotation tracks here, so we also provide the FASTA files for the snowy owl assemblies (bBubSca1.1.hap1.fasta.gz and bBubSca1.1.hap2.fasta.gz). Ideally, all of this would have been available via ENA.
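Because the sequence names matter when pairing an annotation track with an assembly, it can be useful to confirm that every sequence ID used in a GFF3 file is present in the FASTA file it is paired with. The following is a minimal sketch of such a check, not part of the authors' pipeline; the function names are hypothetical, and it assumes standard FASTA headers (ID is the first whitespace-delimited token after ">") and tab-separated GFF3 with the sequence ID in the first column.

```python
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_ids(path):
    """Collect sequence IDs (first token of each '>' header line)."""
    ids = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gff_seqids(path):
    """Collect the sequence IDs used in column 1 of a GFF3 file."""
    seqids = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded FASTA section ends the feature lines
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives and blank lines
            seqids.add(line.split("\t", 1)[0].strip())
    return seqids


def missing_seqids(fasta_path, gff_path):
    """Return GFF3 sequence IDs that have no matching FASTA entry."""
    return sorted(gff_seqids(gff_path) - fasta_ids(fasta_path))
```

Calling `missing_seqids("bBubSca1.1.hap1.fasta.gz", "annotation.gff.gz")` (where `annotation.gff.gz` is a placeholder name) should return an empty list when the annotation and the assembly use the same sequence names.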

We annotated the snowy owl genome assemblies, in addition to those of downy woodpecker (Dryobates pubescens; GCA_014839835.1), Northern Carmine bee-eater (Merops nubicus; GCA_009819595.1), Northern goshawk (Accipiter gentilis; GCA_929443795.2) and barn owl (Tyto alba; GCF_018691265.1), since no genome annotation was publicly available for these species. We used a pre-release version of the EBP-Nor genome annotation pipeline (https://github.com/ebp-nor/GenomeAnnotation). First, the AGAT (https://zenodo.org/record/7255559) scripts agat_sp_keep_longest_isoform.pl and agat_sp_extract_sequences.pl were used on the GRCg7b (GCA_016699485.1) chicken genome assembly and annotation to generate one protein (the longest isoform) per gene. Miniprot (Li, 2023) was used to align these proteins to the curated assemblies. UniProtKB/Swiss-Prot (Consortium et al., 2022) release 2022_03 and the vertebrata part of OrthoDB v11 (Kuznetsov et al., 2022) were also aligned separately to the assemblies. Red (Girgis, 2015) was run via redmask (https://github.com/nextgenusfs/redmask) on the snowy owl assemblies to mask repetitive regions (for the other species we used the soft-masked genome assemblies available at NCBI). GALBA (Brůna et al., 2023; Buchfink et al., 2015; Hoff and Stanke, 2018; Li, 2023; Stanke et al., 2006) was run with the chicken proteins in miniprot mode on the masked assemblies. The funannotate-runEVM.py script from Funannotate was used to run EvidenceModeler (Haas et al., 2008) on the alignments of chicken proteins, UniProtKB/Swiss-Prot proteins and vertebrata proteins, together with the genes predicted by GALBA. The resulting predicted proteins were compared to the protein repeats that Funannotate distributes using DIAMOND blastp, and the predicted genes were filtered based on this comparison using AGAT.
The filtered proteins were compared to UniProtKB/Swiss-Prot release 2022_03 using DIAMOND (Buchfink et al., 2015) blastp to find gene names, and InterProScan was used to discover functional domains. AGAT's agat_sp_manage_functional_annotation.pl was used to attach the gene names and functional annotations to the predicted genes. EMBLmyGFF3 (Norling et al., 2018) was used to combine the FASTA files and GFF3 files into EMBL format for submission to ENA. The resulting files end in gff.gz (those ending in fa.out.gff.gz are repeat annotations), proteins.fa.gz and mrna.fa.gz.
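The file-name suffixes described above can be used to sort downloaded files by content type. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and it assumes that any name ending in out.gff.gz (which covers both fa.out.gff.gz and fasta.out.gff.gz) is a repeat annotation.

```python
def classify(filename):
    """Map a file name from this record to a rough content type.

    The labels are illustrative; only the suffixes come from the
    record description.
    """
    # Check the more specific repeat suffix before the generic gff.gz one.
    if filename.endswith("out.gff.gz"):
        return "repeat annotation"
    if filename.endswith("gff.gz"):
        return "gene annotation"
    if filename.endswith("proteins.fa.gz"):
        return "predicted proteins"
    if filename.endswith("mrna.fa.gz"):
        return "predicted mRNA sequences"
    return "other"
```

The order of the checks matters: a repeat annotation also ends in gff.gz, so it has to be recognised first.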

All species in this study, namely downy woodpecker (Dryobates pubescens; GCA_014839835.1), Northern Carmine bee-eater (Merops nubicus; GCA_009819595.1), Northern goshawk (Accipiter gentilis; GCA_929443795.2), barn owl (Tyto alba; GCF_018691265.1), chicken (Gallus gallus; GCF_016699485.2), zebra finch (Taeniopygia guttata; GCA_003957565.4) and California condor (Gymnogyps californianus; GCF_018139145.2), in addition to hap1 of snowy owl, were repeat-masked with a bird-specific library from https://www.pnas.org/doi/abs/10.1073/pnas.1616702114, provided by Alexander Suh. The resulting files have names such as MerNubi.fa.out.gff.gz, MerNubi.fa.masked.gz and MerNubi.fna.cat.gz.

We have also included the species-specific repeat library generated by running RepeatModeler on hap1 of snowy owl. It is called bBubSca1.1.hap1.repeatlibrary.fa.gz. The files bBubSca1.1.hap1.fasta.masked.gz, bBubSca1.1.hap1.fasta.out.gff.gz and bBubSca1.1.hap1.fasta.cat.gz result from running RepeatMasker on hap1 with that library.

We have included all files from the Genespace analyses, including the OrthoFinder results. They are found in the file genespace.tgz.

Files (8.1 GB)

MD5 checksum                          Size
md5:b9c45adf4688de4d39035491dd85a3d8  6.6 MB
md5:c72acf620976d82695aa12badee4a0a3  9.7 MB
md5:d9b79932077bb44ff60cd964206bfb74  6.1 MB
md5:07cc9f44a0df5203d38781db4415dea1  642.0 MB
md5:b982c7a59cdc1c8b26fab27b81b4a1d7  378.1 MB
md5:915bea48b59e1d1038c1e1a362171077  362.7 MB
md5:14a84145b0dfbf962d21b200338f7c93  11.8 MB
md5:c391e8ae505917011866a529d25b1290  6.3 MB
md5:84f8e30db648e71103e4b7b738baa058  9.2 MB
md5:e19371313d22e373a4e303e9fb3c4b60  5.9 MB
md5:5f1e3192a4fb6ad30562138f30f09871  284.6 kB
md5:e99b4978ec918b1a18641048f76055c6  323.9 MB
md5:6111e8ace714d3f884364091f344846f  5.6 MB
md5:8c2908d479d5ced27957a5aa5ef7f59c  8.3 MB
md5:ada462c9f8a55f5fd3aa7ae4c3be4941  5.2 MB
md5:d56d5e7b425eea2831ed006db32c5685  934.9 MB
md5:793acfd8ebfa346f55521b5d2aa7efa6  368.9 MB
md5:24fa99d515a6e28cd8c049777075f69c  23.0 MB
md5:8ca0008522021199216bea70b6bb2045  565.6 MB
md5:d72cab7cf878dc863056ae23a3c6b586  280.1 MB
md5:2536421ddf3dda164024c0ce0f0e17ef  24.7 MB
md5:183b5a9d01dacfc00f1170d2daa0850b  6.2 MB
md5:ac8c7b94eadd9800c10b4e6037ece887  9.3 MB
md5:9cf13dee2df4087e1b5d9692681ff448  5.8 MB
md5:ec7001fbb476dce3ac9e39e1f309aa35  247.3 MB
md5:21d23ac5544f447d11352af923f2a32b  289.3 MB
md5:e7982325fb17034a1fe1ac7c4695f6d8  13.9 MB
md5:87a68c32971fe5000157561cfa423087  1.2 GB
md5:e07b0fbcb81eaa01fa6cc21ba22e4195  254.3 MB
md5:6a635645b92d40d949a82777ba75beeb  355.7 MB
md5:03fb6bf41c14391923e8682c8702ea5e  12.0 MB
md5:9d6c634b8a7f4ca9e8a4b048ec71c2e3  269.9 MB
md5:0fece140e95051ff234951b7d875e6d1  319.4 MB
md5:a475c9073c5e69cd84769bacb1adfbbb  13.6 MB
md5:b6e723d66b5a4d17249e16dee3b15f18  6.0 MB
md5:922f2d9143c18afc351ee87479e2503a  9.1 MB
md5:a020f57b25d220aa759c305ac211e9cc  5.7 MB
md5:d3e2303380672373a00105600fdcd1c2  222.6 MB
md5:4f004b0fbe5d35a633773200320ea70e  294.3 MB
md5:266f79ea2148170d825ed9b68595d737  11.8 MB
md5:73227dc234333516da03e8e10c733968  229.5 MB
md5:505be1a6423720904a638d701ccb4873  346.0 MB
md5:2a409378460ee51a8df0ee956c8c4d05  13.5 MB
md5:6632c053fdb6ab9863a79cbbca0cfceb  6.4 MB
md5:8d80200ed8f64454a95280b800d6ccc2  9.2 MB
md5:95e1ee9545bed9e8cecc92d5a95739c8  5.7 MB
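
The md5 values listed for the files can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file is read in chunks because several of the files are hundreds of megabytes in size.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("genespace.tgz", "md5:<value listed for that file>")` returns True only if the local copy has the listed digest.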

Additional details

Funding

The Research Council of Norway
Earth Biogenome Project Norway 326819