Published February 2, 2021 | Version v1
Dataset Open

Improved contiguity of the threespine stickleback genome using long-read sequencing

  • 1. University of Georgia

Description

While the cost and time required to assemble a genome have drastically decreased, assembling a highly contiguous genome remains a challenge. This challenge is rapidly being overcome by the integration of long-read sequencing technologies. Here, we use long-read sequencing to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species. Using Pacific Biosciences sequencing, we assembled a highly contiguous genome of a freshwater fish from Paxton Lake. Using contigs from this genome, we were able to fill over 76% of the gaps in the existing reference genome assembly, improving contiguity over five-fold. Our gap-filling approach was highly accurate, as validated by 10X Genomics long-distance linked reads. In addition to closing a majority of gaps, we were able to assemble segments of telomeres and centromeres throughout the genome. This highlights the power of long sequencing reads for assembling highly repetitive, difficult-to-assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomics datasets available for the threespine stickleback fish.

Notes

paxton_lake_benthic.fa.gz is the de novo assembly constructed using PacBio long reads from a Paxton Lake benthic male.

GAculeatus_UGA_version5_UN_merged.fasta is the v. 5 genome, constructed by filling gaps in the v. 4 Hi-C-assembled genome.
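The contiguity and gap-filling figures in the description can be spot-checked on either FASTA file. The following is a minimal sketch, not the authors' pipeline: it assumes that gaps are encoded as runs of N characters (the usual convention for scaffolded assemblies), and the function names (read_fasta, n50, assembly_stats, open_fasta) are hypothetical helpers introduced here for illustration.

```python
import gzip
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length L such that pieces of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(lines: Iterable[str], min_gap: int = 1) -> Dict[str, int]:
    """Count N-runs (assumed to be gaps) and compute contig N50.

    Every run of at least min_gap N characters is counted, including
    runs at the very start or end of a sequence.
    """
    gap_re = re.compile(r"[Nn]{%d,}" % min_gap)
    n_gaps = 0
    contig_lengths: List[int] = []
    for _, seq in read_fasta(lines):
        n_gaps += len(gap_re.findall(seq))
        # Splitting on gaps leaves the ungapped contigs between them.
        contig_lengths.extend(len(c) for c in gap_re.split(seq) if c)
    return {
        "gaps": n_gaps,
        "contigs": len(contig_lengths),
        "contig_n50": n50(contig_lengths),
    }


def open_fasta(path: str):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")
```

To use it, open a downloaded file with open_fasta inside a with statement and pass the handle to assembly_stats. Comparing the "gaps" value for the v. 5 file against the same statistic for an earlier assembly gives a rough view of how many gaps were closed, under the N-run assumption above.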

Files

Files (604.1 MB)

MD5 checksum | Size
md5:177c664a84c44c3155630b795b1bf84d | 472.2 MB
md5:5700d468e1155c664d7b7335ee8e99af | 131.9 MB
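The two MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard hashlib module. Because the listing does not pair each checksum with a file name, the check only tests whether a file's digest matches either recorded value. The names EXPECTED_MD5, md5_of_file, and matches_record are hypothetical and introduced here for illustration.

```python
import hashlib

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "177c664a84c44c3155630b795b1bf84d",
    "5700d468e1155c664d7b7335ee8e99af",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str) -> bool:
    """True if the file's MD5 digest equals one of the recorded checksums."""
    return md5_of_file(path) in EXPECTED_MD5
```

For example, matches_record("paxton_lake_benthic.fa.gz") should return True for an uncorrupted copy of that file. A False result suggests an incomplete or altered download.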

Additional details

Related works