The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads
Creators
- Wang, Zhiwen
- Hobson, Neil
- Galindo, Leonardo
- Zhu, Shilin
- Shi, Daihu
- McDill, Joshua
- Yang, Linfeng
- Hawkins, Simon
- Neutelings, Godfrey
- Datla, Raju
- Lambert, Georgina
- Galbraith, David W.
- Grassa, Christopher J.
- Geraldes, Armando
- Cronk, Quentin C.
- Cullis, Christopher
- Dash, Prasanta K.
- Kumar, Polumetla A.
- Cloutier, Sylvie
- Sharpe, Andrew G.
- Wong, Gane K.-S.
- Wang, Jun
- Deyholos, Michael K.
Description
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, composed exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence, representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together, these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species.
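For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is conventionally computed from a list of contig or scaffold lengths. It is not the authors' pipeline: the function names `n50` and `implied_genome_size_mb` and the toy length list are hypothetical, chosen only for illustration. The second function simply divides the reported 302 Mb by the reported 81% coverage to show the total genome size those two figures imply.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


def implied_genome_size_mb(assembled_mb: float, coverage_fraction: float) -> float:
    """Genome size implied by an assembled length and a coverage fraction."""
    if not 0 < coverage_fraction <= 1:
        raise ValueError("coverage_fraction must be in (0, 1]")
    return assembled_mb / coverage_fraction


if __name__ == "__main__":
    # Toy lengths (hypothetical, not flax data).
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print("toy N50:", n50(toy_lengths))
    # Figures taken from the description: 302 Mb at an estimated 81% coverage.
    print("implied genome size (Mb):", round(implied_genome_size_mb(302, 0.81), 1))
```

Because N50 depends only on the length distribution, a high scaffold N50 says nothing about whether the scaffolds are joined correctly, which is why the comparison with independently sequenced BACs and fosmids is reported separately.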
Files
- article.pdf (620.7 kB, md5:3764195cabf4f73e54eacf7e297a21af)