Supplementary dataset to "Draft genome assembly of the biofuel grass crop Miscanthus sacchariflorus"

De Vega, JJ

  <dc:description>Miscanthus sacchariflorus (Maxim.) Hack. is a C4 perennial rhizomatous grass grown as a biofuel crop. M. sacchariflorus is among the most widely distributed species in the genus, particularly at cold northern latitudes, and is one of the progenitor species of M. × giganteus, the main commercial biomass crop. We generated a 2.54 Gbp whole-genome assembly of the diploid M. sacchariflorus “Robustus 297” genotype, which represented ~59% of the expected genome size. We then anchored this assembly to the chromosome-scale M. sinensis genome to improve its contiguity. We annotated 86,767 and 69,049 protein-coding genes in the unanchored and anchored assemblies, respectively. Based on homology, core gene markers and RNA-seq alignment statistics, we estimate that our assemblies include ~85% of the M. sacchariflorus genes. Raw data and further metadata are available under BioProject PRJNA435476.
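As a rough check on the figures above, the implied expected genome size follows from dividing the assembly length by the fraction of the genome it represents. The result is approximate, since the ~59% figure is itself rounded:

$$\text{expected genome size} \approx \frac{2.54\ \text{Gbp}}{0.59} \approx 4.3\ \text{Gbp}$$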

	Msac_v2.fasta: Unanchored whole-genome assembly (WGA) of M. sacchariflorus in FASTA format.
	Msac_v3.fasta: The Msac_v2 WGA re-scaffolded using the public M. sinensis reference genome.
	Msac_v3.agp: AGP file giving the chromosomal positions, relative to the M. sinensis reference, of the Msac_v2 scaffolds that make up Msac_v3.fasta.
	Msac_v2.gff3: Gene annotation of the unanchored WGA in GFF3 format, containing 86,767 protein-coding genes.
	Msac_v3.gff3: Gene annotation of the anchored WGA in GFF3 format, containing 69,049 protein-coding genes.
	Msac_v2.func_annot.tsv: Tab-separated table with the functional annotation of the 86,767 protein-coding genes in Msac_v2.gff3.
	Msac_v2.repeats_annotation.gff3: Repeat annotation (RepeatMasker) of the unanchored assembly, in GFF3 format.
	Msac_v2.masked.fasta.gz: Repeat-masked version (RepeatMasker) of Msac_v2.fasta, gzip-compressed.
	all.satsuma.blocks_Msac_v2-vs-Msin.gz: All Satsuma alignment blocks of the Msac_v2.fasta scaffolds against the M. sinensis reference.
	Msac_v2.orthology_Msin.tsv: Orthologs between Msac_v2 and M. sinensis.
	Msac_v3-vs-Msin.tsv: Orthologs between Msac_v3 and M. sinensis.
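The files above use standard formats, so they can be inspected with a few lines of code. The following is a minimal Python sketch, not part of the dataset. It assumes that the GFF3 files follow the nine-column GFF3 layout with one `gene` feature per protein-coding gene, and that Msac_v3.agp follows the AGP v2.0 column layout. The helper names (`open_text`, `count_gff3_features`, `parse_agp_components`, `AgpPart`) are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, List, NamedTuple


def open_text(path: str):
    """Open a plain or gzip-compressed text file (e.g. Msac_v2.masked.fasta.gz)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def count_gff3_features(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3), stopping at an embedded ##FASTA section."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


class AgpPart(NamedTuple):
    obj: str          # object (pseudomolecule) name, e.g. in Msac_v3.fasta
    obj_beg: int      # 1-based start of the component on the object
    obj_end: int      # 1-based end of the component on the object
    component: str    # component (scaffold) identifier
    orientation: str  # "+", "-", "?", "0" or "na"


def parse_agp_components(lines: Iterable[str]) -> List[AgpPart]:
    """Return the non-gap rows of an AGP file; gap rows (type N or U) are skipped."""
    parts: List[AgpPart] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the ##agp-version header
        f = line.rstrip("\n").split("\t")
        if f[4] in ("N", "U"):
            continue  # gap row: columns 6-9 describe the gap, not a component
        parts.append(AgpPart(f[0], int(f[1]), int(f[2]), f[5], f[8]))
    return parts
```

For example, `count_gff3_features(open_text("Msac_v2.gff3"))["gene"]` could be compared with the 86,767 genes stated above. If the annotation encodes genes with a different feature type, the key being counted would need adjusting.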
  <dc:subject>Genome assembly</dc:subject>
  <dc:subject>C4 photosynthesis</dc:subject>
  <dc:subject>gene annotation</dc:subject>
