Published July 11, 2022 | Version v1
Dataset Open

Transcript- and annotation-guided genome assembly of the European starling

  • 1. University of New South Wales
  • 2. University of Sydney
  • 3. University of Missouri
  • 4. University of Queensland
  • 5. Cornell University
  • 6. United States Department of Agriculture
  • 7. University of Maryland, College Park
  • 8. Newcastle University
  • 9. Carnegie Museum of Natural History
  • 10. Deakin University
  • 11. University of Adelaide
  • 12. Clemson University
  • 13. Ghent University
  • 14. University of Edinburgh


The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second North American genome (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterisation. S. vulgaris vAU combined 10x Genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 1,628 scaffolds (72.5 Mb scaffold N50). Species-specific transcript mapping and gene annotation revealed high structural and functional completeness (94.6% BUSCO completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Rapid, recent advances in sequencing technologies and bioinformatics software have highlighted the need for evidence-based assessment of assembly decisions on a case-by-case basis. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, SAAGA) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counter-intuitive behaviour in traditional BUSCO metrics, and present BUSCOMP, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. Finally, we present a second starling assembly, S. vulgaris vNA, to facilitate comparative analysis and global genomic research on this ecologically important species.
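The scaffold N50 quoted above is a contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The sketch below shows that calculation. It is only an illustration of the standard definition, not the tooling used for these assemblies; the function name `n50` and the toy lengths are hypothetical and are not taken from the starling data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest and summed until the
    running total reaches at least half of the total assembly length;
    the length of the scaffold that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not taken from the starling assemblies).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 is half of 300, so this prints 70
```

Because N50 depends only on the length distribution, it says nothing about sequence correctness, which is why it is reported alongside completeness measures such as BUSCO.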


Funding provided by: Australian Research Council
Award Number: LP160100610

Funding provided by: Australian Research Council
Award Number: LP18010072

Funding provided by: Human Sciences Frontier Programme
Award Number: RGP0030/2015

Funding provided by: Roslin Institute Strategic Grant
Award Number: BB/P013759/1

Funding provided by: UNSW Scientia Fellowship



Files (2.2 GB)

Size
2.1 kB
1.1 GB
1.2 GB
449.6 kB
550.6 kB

Additional details

Related works

Is cited by
10.1101/2021.04.07.438753 (DOI)
Is derived from
10.5281/zenodo.6814567 (DOI)
Is source of
10.5281/zenodo.6814569 (DOI)