Published November 12, 2020 | Version v1
Dataset Open

Supplementary dataset to "Draft genome assembly of the biofuel grass crop Miscanthus sacchariflorus"

  • 1. Earlham Institute

Description

Miscanthus sacchariflorus (Maxim.) Hack. is a C4 perennial rhizomatous biofuel grass crop. M. sacchariflorus is among the most widely distributed species within the genus, particularly at cold northern latitudes, and is one of the progenitor species of the main commercial biomass crop, M. × giganteus. We generated a 2.54 Gbp whole-genome assembly of the diploid M. sacchariflorus “Robustus 297” genotype, which represents ~59% of the expected genome size. We then anchored this assembly to the chromosome-scale M. sinensis genome to improve its contiguity. We annotated 86,767 and 69,049 protein-coding genes in the unanchored and anchored assemblies, respectively. Based on homology, core markers and RNA-seq alignment statistics, we estimate that our assemblies include ~85% of the M. sacchariflorus genes. Raw data and further metadata are available under BioProject PRJNA435476.
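As a quick sanity check on the figures in the description, the expected genome size implied by the assembly length and the fraction it represents can be back-calculated. This is only a minimal sketch: the function name is illustrative, and the result is an approximation derived from the rounded values quoted above, not a figure stated in this record.

```python
def implied_genome_size_gbp(assembly_gbp: float, fraction_covered: float) -> float:
    """Back-calculate the expected genome size (Gbp) from the assembly
    length and the fraction of the genome that the assembly represents."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in the interval (0, 1]")
    return assembly_gbp / fraction_covered


# Values quoted in the description: a 2.54 Gbp assembly covering ~59%.
expected = implied_genome_size_gbp(2.54, 0.59)
print(f"Implied expected genome size: ~{expected:.1f} Gbp")
```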

  • Msac_v2.fasta: Unanchored whole-genome assembly (WGA) of M. sacchariflorus in FASTA format.
  • Msac_v3.fasta: The WGA above (Msac_v2.fasta) re-scaffolded against the public M. sinensis reference.
  • Msac_v3.agp: Chromosomal positions in the M. sinensis reference of the previous scaffolds in Msac_v3.fasta.
  • Msac_v2.gff3: Gene annotation of the unanchored WGA in GFF3 format, containing 86,767 coding genes.
  • Msac_v3.gff3: Gene annotation of the anchored WGA in GFF3 format, containing 69,049 coding genes.
  • Msac_v2.func_annot.tsv: Tab-separated table containing the functional annotation of the 86,767 coding genes in Msac_v2.gff3.
  • Msac_v2.repeats_annotation.gff3: Repeat annotation (RepeatMasker) of the unanchored assembly.
  • Msac_v2.masked.fasta.gz: Repeat-masked version (RepeatMasker) of Msac_v2.fasta.
  • all.satsuma.blocks_Msac_v2-vs-Msin.gz: All alignments of scaffolds in Msac_v3.fasta to the M. sinensis reference.
  • Msac_v2.orthology_Msin.tsv: Orthologs between Msac_v2 and M. sinensis.
  • Msac_v3-vs-Msin.tsv: Orthologs between Msac_v3 and M. sinensis.
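After downloading, the gene counts quoted for the GFF3 files can be checked by tallying feature types. The sketch below uses only the Python standard library and assumes that gene models are labelled `gene` in the third GFF3 column; this record does not state the feature labels used, so that is an assumption to verify against the files. The function names and the example lines are made up for illustration.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally the feature type (column 3) of every record in GFF3 text."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an optional embedded sequence section follows; stop here
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # a well-formed GFF3 record has nine columns
            counts[fields[2]] += 1
    return counts


def count_features_in_file(path: str) -> Counter:
    """Open a plain or gzip-compressed GFF3 file and tally its feature types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_feature_types(handle)


# Tiny made-up example (not taken from the dataset).
example = [
    "##gff-version 3\n",
    "scaffold_1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "scaffold_1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n",
    "scaffold_2\texample\tgene\t50\t400\t.\t-\t.\tID=gene2\n",
]
example_counts = count_feature_types(example)
print(example_counts["gene"], example_counts["mRNA"])
```

On a local copy of the data, `count_features_in_file("Msac_v2.gff3")["gene"]` would be expected to match the 86,767 coding genes quoted above, provided the assumption about the feature label holds.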

Files (1.3 GB)

  • md5:b493a76ed8924b4a082980c3911532ae (78.1 MB)
  • md5:e456585fa4e4237be9bdd6da207e2388 (491.8 MB)
  • md5:38c3e140e85770fad9c3568da4d1c648 (15.9 MB)
  • md5:df9e133a543e50996e7bbbf9b2233478 (13.0 MB)
  • md5:feb3f65b33c3ac8f1ebcebd9bbd72530 (252.7 MB)
  • md5:ba7464cf36bef8f2bd9354c753f770d2 (2.5 MB)
  • md5:67ee8454829335c92b894c228f1f5c01 (58.5 MB)
  • md5:5c821a80a2093ee10acab2be1a8d7241 (1.5 MB)
  • md5:785ed643fe645f250895d98e3cce0668 (3.6 MB)
  • md5:f3378c92b42155ebd2d43adaa6be1e2d (407.8 MB)
  • md5:e47be8ab3527580e8cdc241b6f75902b (7.9 MB)
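
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function names are illustrative. This listing does not show which checksum belongs to which file name, so pairing a local file with its checksum is left to the reader.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks so that
    large assemblies do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with a checksum written either as a bare
    hex string or in the 'md5:<hex>' form used in the listing."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:e456585fa4e4237be9bdd6da207e2388")` returns `True` only if the file at the hypothetical `local_path` is byte-for-byte the file that the record lists with that checksum.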

Additional details

Related works

Is derived from
Dataset: PRJNA435476 (bioproject)

Funding

Signatures of Domestication and Adaptation BBS/E/T/000PR9818
UK Research and Innovation
Genetic resources for the dissection of bioenergy traits BBS/E/W/10963A01A
UK Research and Innovation