Published July 15, 2020 | Version v1
Dataset Open

Data from: A chromosomal-scale genome assembly of Tectona grandis reveals the importance of tandem gene duplication and enables discovery of genes in natural product biosynthetic pathways

Description

Background: Teak, a member of the Lamiaceae family, produces one of the most expensive hardwoods in the world. High demand coupled with deforestation has caused a decrease in natural teak forests, and future supplies will be reliant on teak plantations. Hence, selection of teak tree varieties with superior growth performance for clonal propagation is of great importance, and access to high-quality genetic and genomic resources can accelerate the selection process by identifying genes underlying desired traits.

Findings: To facilitate teak research and variety improvement, we generated a highly contiguous, chromosomal-scale genome assembly using high-coverage PacBio long reads coupled with high-throughput chromatin conformation capture. Of the 18 teak chromosomes, we generated 17 near-complete pseudomolecules, with one chromosome present as two chromosome arm scaffolds. Genome annotation yielded 31,168 genes encoding 46,826 gene models, of which 39,930 and 41,155 had Pfam domain and expression evidence, respectively. We identified 14 clusters of tandem-duplicated terpene synthases (TPSs), genes central to the biosynthesis of terpenes, which are involved in plant defense and pollinator attraction. Transcriptome analysis revealed 10 TPSs highly expressed in woody tissues, of which 8 were in tandem, revealing the importance of resolving tandemly duplicated genes and the quality of the assembly and annotation. We also validated the enzymatic activity of four TPSs to demonstrate the function of key TPSs.

Conclusions: In summary, this high-quality chromosomal-scale assembly and functional annotation of the teak genome will facilitate the discovery of candidate genes related to traits critical for sustainable production of teak and for anti-insecticidal natural products.

Notes

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: IOS-1444499

Files

teak_working_gene_fpkm_matrix_con_sorted.txt

Files (992.5 MB)

Checksum, Size
md5:51cf4426e59abae72feba81e8b10d505, 14.9 kB
md5:548c52d272d4b0e9db4fafaf24946f0c, 91.5 MB
md5:380b1fd7e7e805d644d171932cf5e09f, 66.1 MB
md5:9a5f912697591e124a8e88f6f0d0caf5, 22.6 MB
md5:187ea37ee0ddd8fbe739f8ecfafce2f5, 72.7 MB
md5:b58ececc59daa3d8447587ae207b828a, 89.3 MB
md5:70e5e60d57a114f58bec7455e10a7e11, 64.4 MB
md5:0b396e9da7e9c8227137a0153c50f79c, 23.1 MB
md5:3952854d455e607a67072e922ba0ddaa, 70.1 MB
md5:a2ec4f5f12fec42d6bc0b41c1a8371e1, 53.2 MB
md5:365efc5f364b4ff8e7284a5ee7f54b33, 40.4 MB
md5:ee9134486325e360bb50b888f6a61b11, 14.6 MB
md5:c60e3e67d31ef097072ee6ebb9693b28, 39.6 MB
md5:587a8f5d21ac98dfd83bad25ecd1aac1, 341.7 MB
md5:9b76c92c449db81a27fdf043d67eecc7, 3.1 MB
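
The md5 values in the file listing can be used to confirm that a download arrived intact. The following is a minimal sketch in Python using the standard-library hashlib module. The helper names md5_of_file and matches_listing are illustrative, not part of the dataset, and any local path passed to them is an assumption, since the listing above does not pair each checksum with a file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so large
    files (e.g. the 341.7 MB entry) need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a checksum as shown in the listing.

    `listed` may be given with or without the leading "md5:" prefix.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Hypothetical usage (the local path is an assumption):
# ok = matches_listing("downloads/some_file.txt",
#                      "md5:9b76c92c449db81a27fdf043d67eecc7")
```

A result of False from matches_listing indicates either a corrupted or incomplete download, or that the file was checked against the wrong entry in the listing.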

Additional details

Related works