Published November 8, 2021 | Version v1
Other Open

Blueprint for phasing and assembling the genomes of heterozygous polyploids: Application to the octoploid genome of strawberry

  • 1. Agricultural Research Service
  • 2. University of California, Davis
  • 3. Pacific Biosciences (United States)
  • 4. Iowa State University
  • 5. Clemson University
  • 6. Michigan State University

Description

The challenge of allelic diversity for assembling haplotypes is exemplified in polyploid genomes containing homoeologous chromosomes of identical ancestry and significant homologous variation within their ancestral subgenomes. Cultivated strawberry (Fragaria × ananassa) and its progenitors are outbred octoploids in which up to eight homologous and homoeologous alleles are preserved. This introduces a significant risk of haplotype collapse, switching, and chimeric fusions during assembly. Using third-generation HiFi sequences from PacBio, we assembled the genome of the day-neutral octoploid F. × ananassa hybrid 'Royal Royce' from the University of California. Our goal was to produce subgenome- and haplotype-resolved assemblies of all 56 chromosomes, accurately reconstructing the parental haploid chromosome complements. Previous work has demonstrated that partitioning sequences by parental phase supports direct assembly of haplotypes in heterozygous diploid species. We combined the accuracy of HiFi sequence data with pedigree-informed sequencing to partition long-read sequences by phase and reduce the downstream risk of subgenomic chimeras during assembly. We used an octoploid strawberry recombination breakpoint map containing 3.6 M variants to identify and break chimeric junctions and to scaffold the phase-1 and phase-2 octoploid assemblies. The N50 contiguity of the phase-1 and phase-2 assemblies prior to scaffolding and gap-filling was 11 Mb. The final haploid assembly represented seven of 28 chromosomes each as a single contiguous sequence, and averaged fewer than three gaps per pseudomolecule. Additionally, we re-annotated the octoploid genome to produce a custom F. × ananassa repeat library and an improved set of gene models based on IsoSeq transcript data and an expansive RNA-seq expression atlas. Here we present 'FaRR1', a gold-standard reference genome of the F. × ananassa cultivar 'Royal Royce', to assist future genomic research and molecular breeding of allo-octoploid strawberry.
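For readers unfamiliar with the N50 contiguity statistic quoted above, the following is a minimal sketch of how it is conventionally computed: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name and the toy contig lengths are assumptions for illustration only; they are not taken from the FaRR1 assemblies or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (conventional definition)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (arbitrary units), not real FaRR1 contigs.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

With these toy values the total is 20, and the two longest contigs (6 and 5) together cover more than half of it, so the reported N50 is 5.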

Notes

WARNING: THIS DATA SUBMISSION CONTAINS FILES ASSOCIATED WITH THREE SEPARATE GENOME ASSEMBLIES:

- files with the prefix 'farr1.' are associated with the Royal Royce synthetic haploid genome (for most user applications)

- files with the prefix 'farr1_phase1.' are associated with the Royal Royce phase1 (parent haplotype A) genome

- files with the prefix 'farr1_phase2.' are associated with the Royal Royce phase2 (parent haplotype B) genome
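The prefix convention above can be applied programmatically when sorting downloaded files. The following is a minimal sketch, assuming only the three prefixes stated in the notes; the function name and the example file names are hypothetical, and files that carry none of the three prefixes are reported as unclassified.

```python
# Prefixes and labels as stated in the notes; each prefix ends with a dot,
# so 'farr1.' cannot accidentally match 'farr1_phase1.' files.
ASSEMBLY_PREFIXES = {
    "farr1.": "synthetic haploid",
    "farr1_phase1.": "phase1 (parent haplotype A)",
    "farr1_phase2.": "phase2 (parent haplotype B)",
}


def classify_assembly(filename):
    """Return the assembly label for a file name, or None if no prefix matches."""
    for prefix, label in ASSEMBLY_PREFIXES.items():
        if filename.startswith(prefix):
            return label
    return None


# Hypothetical file names, used only to illustrate the convention.
for name in ["farr1.example.fa", "farr1_phase1.example.fa", "other.txt"]:
    print(name, "->", classify_assembly(name))
```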

Files

SuppTable_4.farr1_array_snp_positions.txt

Files (11.8 MB)

Checksum                                Size
md5:5fbb480654dc87568f3dc2aa78689655    10.1 kB
md5:451655c9625586ab001ba52340db0b67    13.8 kB
md5:d09a6a2e6378c17c78a0367dca27779f    10.4 kB
md5:2c85f257fcf49d47dda21e6f0f8ab403    11.7 MB
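A downloaded file can be checked against the MD5 checksums listed under Files. The following is a minimal sketch using Python's standard `hashlib` module; the function names are assumptions for illustration, and the local path passed to them is whatever location the user saved the file to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:2c85f257fcf49d47dda21e6f0f8ab403")` returns `True` only if the file at `local_path` (a hypothetical location) has the checksum listed for the 11.7 MB file.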

Additional details

Related works

Is derived from
10.25338/B8TP7G (DOI)