Published July 11, 2022 | Version v1
Software Open

Transcript- and annotation-guided genome assembly of the European starling

  • 1. University of New South Wales
  • 2. University of Sydney
  • 3. University of Missouri
  • 4. University of Queensland
  • 5. Cornell University
  • 6. United States Department of Agriculture
  • 7. University of Maryland, College Park
  • 8. Newcastle University
  • 9. Carnegie Museum of Natural History
  • 10. Deakin University
  • 11. University of Adelaide
  • 12. Clemson University
  • 13. Ghent University
  • 14. University of Edinburgh

Description

The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second North American genome (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterisation. S. vulgaris vAU combined 10x Genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 1,628 scaffolds (72.5 Mb scaffold N50). Species-specific transcript mapping and gene annotation revealed high structural and functional completeness (94.6% BUSCO completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Rapid, recent advances in sequencing technologies and bioinformatics software have highlighted the need for evidence-based assessment of assembly decisions on a case-by-case basis. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, SAAGA) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counter-intuitive behaviour in traditional BUSCO metrics, and present BUSCOMP, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. Finally, we present a second starling assembly, S. vulgaris vNA, to facilitate comparative analysis and global genomic research on this ecologically important species.
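The scaffold N50 quoted above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows. The function name `n50` and the toy lengths are illustrative assumptions, not part of the deposited software or the assembly data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative sketch only: sort lengths from longest to shortest and
    return the length at which the running total first reaches half of
    the summed length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example with made-up scaffold lengths (not starling data).
toy_scaffold_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_scaffold_lengths)
```

Because N50 depends only on the length distribution, it says nothing about correctness or gene content, which is why the description pairs it with BUSCO completeness.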

Notes

Funding provided by: Australian Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000923
Award Number: LP160100610

Funding provided by: Australian Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000923
Award Number: LP18010072

Funding provided by: Human Frontier Science Program*
Award Number: RGP0030/2015

Funding provided by: Roslin Institute Strategic Grant*
Award Number: BB/P013759/1

Funding provided by: UNSW Scientia Fellowship*

Files

Annotation_saaga_assessment.pdf

Files (2.5 MB)

MD5 checksum    Size
md5:e07a5784136535416d85b05e99923cd6    171.9 kB
md5:440de954a9c411a730d83eb1a6ea3cb1    125.7 kB
md5:8da93ce2fdecccac04798faac296685c    146.4 kB
md5:9f72b98c9b29384f92a90aa944b1ae49    147.6 kB
md5:14da7c81e0b9ba1de9ce4eb55fcfddcb    164.0 kB
md5:749b559b1a5724cc8848b9ec0a95c8cc    167.7 kB
md5:4186d9ba5546da13932c9ff99d0e394b    141.3 kB
md5:3d269b68e41ffbbf67742f5a00a81070    128.5 kB
md5:e90587355c709c0a29106fb8dd5afad3    137.9 kB
md5:09986a441797ad1f24a4a8bdc16c7f29    143.0 kB
md5:edefcebc77a3f957c872daadf75e34c7    149.7 kB
md5:666b3325acd036bb2d00469771a8f896    174.0 kB
md5:7ca94781ea34482de711f952bfc9225f    175.0 kB
md5:b8d3332b60e1807b26a7e661457a42f0    169.5 kB
md5:ccf5c3350e31a88478d72738def2a16a    164.9 kB
md5:3e223e0d888491baf379c025968aca6f    169.6 kB
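The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listed_checksum` are assumptions introduced for illustration, and because this listing does not show which file name belongs to which checksum, the pairing must be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large files are not loaded
    into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'.

    The 'md5:' prefix used in the listing is optional here.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Hypothetical usage: the file name and checksum pairing below is a
# placeholder, not a pairing confirmed by this record.
# ok = matches_listed_checksum("downloaded_file.pdf",
#                              "md5:e07a5784136535416d85b05e99923cd6")
```

A mismatch usually indicates an incomplete or corrupted download, so the simplest remedy is to fetch the file again before investigating further.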

Additional details

Related works

Is cited by
10.1101/2021.04.07.438753 (DOI)
Is source of
10.5061/dryad.02v6wwq5z (DOI)