2.1. Volatile terpenoid distributions in C. indicum tissues

To analyze the distribution of volatile terpenoids in different parts of C. indicum, fresh root, stem, leaf, flower bud, and flower tissues were extracted and subjected to volatile metabolite profiling by gas chromatography–mass spectrometry (GC–MS). The terpenoid profiles differed by tissue type (Fig. 1A). Of the 44 terpenoids identified, 17 were monoterpenes and 27 were sesquiterpenes (Fig. S1 and Supplementary Table S1). Abundance clustering of these volatile terpenoids was performed using the heatmap package in OmicShare Tools, an online data analysis platform (https://www.omicshare.com/tools/; Fig. 1B). The total ion chromatograms revealed clear differences in the composition and quantity of terpenoids among the tissues (Fig. 1 and Supplementary Table S1). Most of the terpenoids were observed in all five tissue types, but others were found only in specific tissues; for example, petasitene was detected only in root tissue. Overall, terpenoids were most abundant in the flower bud tissue. Notably, sesquiterpenoids were substantially more abundant than monoterpenoids. In the root, sesquiterpenoids were 4.95 times more abundant than monoterpenoids (Fig. 2A). The relative proportions of individual volatile terpenoids also differed significantly among the five tissue types (Fig. 2 and Fig. S2). For example, β-farnesene accounted for 70% of the total abundance of the 23 volatile terpenoids detected in the root. In the flower bud, the three most abundant metabolites were β-farnesene (25.2%), α-pinene (18.8%), and camphor (10%; Fig. 2B).
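As a minimal sketch of how relative percentages and a sesquiterpenoid-to-monoterpenoid ratio of the kind reported above can be derived from GC–MS peak areas, the following Python snippet may be helpful. The peak areas, the compound grouping, and the function names are illustrative assumptions and are not data or code from this study.

```python
# Hypothetical peak areas (arbitrary units); NOT measurements from this study.
MONOTERPENOIDS = {"alpha-pinene", "camphor"}


def relative_percentages(peak_areas):
    """Return each compound's share of the total peak area, in percent."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def sesqui_to_mono_ratio(peak_areas, monoterpenoids=MONOTERPENOIDS):
    """Ratio of summed sesquiterpenoid area to summed monoterpenoid area.

    Every compound not listed in `monoterpenoids` is treated as a
    sesquiterpenoid, which is a simplification made for this sketch.
    """
    mono = sum(a for n, a in peak_areas.items() if n in monoterpenoids)
    sesqui = sum(a for n, a in peak_areas.items() if n not in monoterpenoids)
    return sesqui / mono


# Illustrative tissue profile (made-up values).
example = {
    "alpha-pinene": 150.0,
    "camphor": 50.0,
    "beta-farnesene": 700.0,
    "petasitene": 100.0,
}

percentages = relative_percentages(example)
ratio = sesqui_to_mono_ratio(example)
```

With these made-up values, β-farnesene makes up 70% of the total peak area and the sesquiterpenoid-to-monoterpenoid ratio is 4.0.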

2.2. Isolation and sequence analysis of four CiTPSs

On the basis of their high reads per kilobase per million mapped reads (RPKM) values in the root, flower bud, or flower transcriptome data, the coding regions of four TPSs, namely CiTPS1, CiTPS2, CiTPS3, and CiTPS4 (unigenes 0021699, 0037767, 0060549, and 0062052, respectively), were cloned and sequenced. They encode proteins of 602, 588, 547, and 559 amino acids, with estimated molecular weights of 69.90, 68.6, 63.82, and 64.44 kDa, respectively. The amino acid sequences were aligned with six TPS sequences from other plants using DNAMAN software (Fig. 3A). The alignment revealed the presence of a DDXXD motif (which is responsible for metal cofactor binding), an NSE/DTE motif (also involved in metal binding), and an RXR motif in all 10 proteins. The RR motif at the N-terminal end of numerous monoterpene synthases contributes crucially to the isomerization of GPP into LPP (Williams et al., 1998). Notably, the N-terminal ends of CiTPS1 and CiTPS2 contain RRX8W, an arginine-rich motif, whereas those of CiTPS3 and CiTPS4 contain an RPX8W motif (Fig. 3A). Comparison of the amino acid sequences revealed that the four TPSs shared an overall sequence identity of 50.25%. In pairwise comparisons, CiTPS1 and CiTPS2 shared the highest identity (51.49%), and CiTPS3 and CiTPS4 shared the lowest (35.87%; Supplementary Table S2).
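The motif patterns named above can be expressed as simple regular expressions in which X stands for any residue. The sketch below is illustrative only: the toy sequence is hypothetical (it is not a CiTPS sequence), the function name is an assumption, and the NSE/DTE and RXR motifs are left out for brevity.

```python
import re

# Regular expressions for three motifs named in the text; "." matches any residue.
MOTIFS = {
    "DDXXD": re.compile(r"DD..D"),
    "RRX8W": re.compile(r"RR.{8}W"),
    "RPX8W": re.compile(r"RP.{8}W"),
}


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for the motifs found."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # Wrapping the pattern in a lookahead also reports overlapping matches.
        overlapping = re.compile(f"(?=({pattern.pattern}))")
        positions = [m.start() for m in overlapping.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical toy sequence built piece by piece so the motif positions are obvious.
toy = "MST" + "RR" + "A" * 8 + "W" + "GGG" + "DDLYD" + "KK"

result = find_motifs(toy)
```

In this toy sequence the RRX8W pattern starts at position 3 and the DDXXD pattern at position 17 (0-based), and no RPX8W pattern is present.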

Phylogenetic analysis indicated that CiTPS1 and CiTPS2 belonged to the TPS-b subfamily together with the monoterpene synthases from other plants, whereas CiTPS3 and CiTPS4 were classified into the TPS-a subfamily together with the sesquiterpene synthases from other plants (Fig. 3B). The four CiTPS sequences were submitted to GenBank under accession numbers MT624788, MT624789, MT624790, and MT624791, respectively. The TPSs from other plants used in the phylogenetic analysis are listed in Supplementary Table S3.

2.3. CiTPS1 and CiTPS2 catalyzed GPP to produce α-pinene

The protein sequence alignments and the phylogenetic analysis suggested that CiTPS1 and CiTPS2 were monoterpene synthases with long plastidial targeting sequences. In general, full-length versions of chloroplast-localized proteins with transit peptides (TPs) (e.g., limonene synthase, bornyl diphosphate synthase, and isopentenyl pyrophosphate isomerase from Mentha spicata, Amomum villosum, and Artemisia annua, respectively) readily form intractable inclusion bodies when expressed as 6×His-tagged fusion proteins (Ma et al., 2017; Wang et al., 2018; Williams et al., 1998). In our previous study, a full-length geraniol synthase from Gardenia jasminoides, including its TP, was successfully expressed as a maltose-binding protein (MBP) fusion using the pMAL-C5X vector (Ye et al., 2019). In the present study, the pMAL-C5X system was therefore used for the recombinant expression of all four CiTPSs (Fig. S3). After heterologous expression in Escherichia coli Rosetta (DE3) cells, the proteins were purified by MBP affinity chromatography on Dextrin Beads 6FF resin. The molecular weights of purified CiTPS1 and CiTPS2, each fused with an MBP tag, were approximately 113.5 and 112.2 kDa, respectively (Fig. S3A, B). Before enzyme activity was tested, the MBP tag was removed by cleavage with Factor Xa protease. Each recombinant protein was then incubated with either GPP or FPP (see Materials and Methods). CiTPS1 converted GPP to α-pinene (Fig. 4A), whereas no enzymatic activity was observed when FPP was used as the substrate. The mass spectrum of the product matched those of the α-pinene standard and of the α-pinene produced by all five tissue types (Fig. S4A). No product was observed when recombinant CiTPS2 was incubated with either GPP or FPP.
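As a quick consistency check, the mass contributed by the fusion tag can be estimated by subtracting the predicted protein mass (Section 2.2) from the approximate mass of the fusion protein. The sketch below uses only the values stated in the text; the function name is an assumption, and the result is an arithmetic illustration rather than a measured tag mass.

```python
# Values taken from the text (kDa).
predicted_kda = {"CiTPS1": 69.90, "CiTPS2": 68.6}   # predicted protein masses
fusion_kda = {"CiTPS1": 113.5, "CiTPS2": 112.2}     # approximate MBP-fusion masses


def tag_contribution(fusion, predicted):
    """Mass attributable to the tag and linker: fusion mass minus protein mass."""
    return {name: round(fusion[name] - predicted[name], 2) for name in fusion}


contribution = tag_contribution(fusion_kda, predicted_kda)
print(contribution)
```

Both differences come to about 43.6 kDa, which is what one would expect if the same tag is fused to each protein.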

TPs of monoterpene synthases often reduce soluble protein expression, hinder catalytic activity, or even inactivate the enzyme. The potential chloroplast TPs of CiTPS1 and CiTPS2 were analyzed using TargetP 1.1 (http://www.cbs.dtu.dk/services/TargetP-1.1/). The first 34 amino acids at the N-terminal end of CiTPS1 were predicted to be the targeting sequence. Removal of these 34 amino acids yielded the truncated construct pET-(-tp)CiTPS1 (Fig. S5). Biochemical analysis indicated that the truncated protein was more catalytically efficient than the proteins expressed from pET-CiTPS1 or pMAL-C5X-CiTPS1 (Fig. S6).

Similarly, the amino acid sequence of CiTPS2 was analyzed using TargetP. The first 44 amino acids at the N-terminal end were predicted to be a possible targeting sequence; removal of these 44 amino acids yielded the truncated construct pET-(-tp)CiTPS2. Biochemical assays showed that the truncated version converted GPP to α-pinene (Fig. 4A).
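The truncation step can be sketched as a simple slice of the protein sequence. In the snippet below, the transit-peptide lengths and full-length sizes come from the text, whereas the function name and the poly-alanine placeholder sequences are assumptions made for illustration. Whether a start methionine was added to the truncated constructs is not stated here, so the sketch does not add one.

```python
def remove_transit_peptide(protein_seq, tp_length):
    """Drop the first `tp_length` residues of a protein sequence."""
    if not 0 <= tp_length < len(protein_seq):
        raise ValueError("tp_length must be non-negative and shorter than the sequence")
    return protein_seq[tp_length:]


# Predicted transit-peptide lengths (this section) and full-length sizes (Section 2.2).
TP_LENGTH = {"CiTPS1": 34, "CiTPS2": 44}
FULL_LENGTH = {"CiTPS1": 602, "CiTPS2": 588}

# Placeholder poly-alanine sequences of the right length; NOT real CiTPS sequences.
truncated_lengths = {
    name: len(remove_transit_peptide("A" * FULL_LENGTH[name], TP_LENGTH[name]))
    for name in TP_LENGTH
}
```

Under these assumptions, the truncated proteins would contain 568 (CiTPS1) and 544 (CiTPS2) residues.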

2.4. CiTPS3 catalyzed FPP to produce three sesquiterpenoids

CiTPS3 was predicted to be a sesquiterpene synthase (Fig. 3B). The coding sequence of CiTPS3 was subcloned into the pMAL-C5X expression vector, and the recombinant protein was expressed and purified. Sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS–PAGE) indicated that the molecular weight of recombinant CiTPS3 fused with the MBP tag was approximately 107.4 kDa (Fig. S3B). CiTPS3 catalyzed the production of three sesquiterpenoids from FPP: petasitene, β-farnesene, and α-bisabolene (Fig. 4B). Notably, petasitene was detected only in the root tissue (Fig. 1B and Supplementary Table S1). To the best of our knowledge, petasitene has previously been found only in the roots and rhizomes of Petasites hybridus and Centaurea stoebe, both of which belong to the Asteraceae family (Gfeller et al., 2019; Nawade et al., 2019; Saritas et al., 2002). Although a petasitene standard was not commercially available, the mass spectrum of the enzymatic product was comparable with that of the metabolite in the root and with the reference data in the NIST 14 database (Fig. S4B). In addition, CiTPS3 produced no detectable compound when GPP was used as the substrate.

2.5. CiTPS4 acted as a bifunctional enzyme, producing four monoterpenoids and three sesquiterpenoids

A pMAL-C5X construct of CiTPS4 (fused with an MBP tag) was generated and expressed in E. coli. The recombinant protein had a mass of approximately 108 kDa (Fig. S3C). Because CiTPS4 was classified into the TPS-a subfamily as a putative sesquiterpene synthase (Fig. 3B), FPP was first used as the substrate. Biochemical analysis indicated that recombinant CiTPS4 converted FPP into petasitene, trans-α-bergamotene, and β-farnesene, with β-farnesene constituting 54.46% of the total product (Fig. 4C). When GPP was used as the substrate, CiTPS4 generated four monoterpenes, namely α-pinene, β-myrcene, trans-β-ocimene, and α-terpineol, with α-terpineol accounting for 81.12% of the total product (Fig. 4C).

2.6. Expression analysis of CiTPSs in different tissues of C. indicum

The expression patterns of the CiTPSs in the five tissue types were analyzed by quantitative reverse transcription polymerase chain reaction (qRT-PCR). CiTPS1 was highly expressed in the root and flower bud tissues, whereas CiTPS2, CiTPS3, and CiTPS4 were highly expressed in the root (Fig. 5). The levels of the terpenoids produced by CiTPS1, 2, 3, and 4 in vitro were partially consistent with the expression patterns of the corresponding genes in the five parts of the plant (Fig. 5E).
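Relative expression from qRT-PCR data is commonly calculated with the 2^-ΔΔCt method. This section does not state which quantification method or reference gene was used, so the sketch below is an assumption for illustration only, and the Ct values and function name are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the 2^-ddCt method relative to a calibrator sample.

    ct_target / ct_ref: Ct of the target and reference gene in the sample.
    ct_target_cal / ct_ref_cal: the same two Ct values in the calibrator.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: a tissue of interest as the sample, another tissue as calibrator.
fold_change = relative_expression(
    ct_target=22.0, ct_ref=18.0, ct_target_cal=25.0, ct_ref_cal=18.0
)
```

With these hypothetical Ct values, the target gene would be expressed 8-fold higher in the sample than in the calibrator.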

3. Discussion

3.1. Analysis of volatile terpenoids

Multiple studies have indicated that, in general, monoterpenoids are more abundant than sesquiterpenoids in the germplasm of C. indicum, regardless of geographical origin (Stoianova-Ivanova et al., 1983; Sun et al., 2015; Wu et al., 2013; Zeng et al., 2020; Zhang et al., 2010; Zhu et al., 2005). However, in the present study, high ratios of sesquiterpenoids to monoterpenoids were noted in all five tissue types. For example, sesquiterpenoids were 4.95 times more abundant than monoterpenoids in the root (Fig. 2A). The sesquiterpenoids and monoterpenoids in the flower followed a similar trend, in line with the high sesquiterpenoid/monoterpenoid ratios reported in previous studies of tetraploid species (Uchio et al., 1981). This suggests that sesquiterpenes dominate both in the tetraploid species distributed in Tokushima, Japan, and in the tetraploid species distributed in Guangzhou, China (this study).

The most abundant monoterpenoid and sesquiterpenoid in the flower tissue were α-pinene and β-farnesene, respectively. Considerable differences in the qualitative and quantitative composition of terpenoids exist between the present study and the literature. For example, Zhu et al. (2005) reported that the predominant monoterpenoid and sesquiterpenoid in fresh C. indicum flowers from Hubei Province, China, were 1,8-cineole and germacrene D, respectively. By contrast, Zeng et al. (2020) found that piperitone and α-cyperone were the most abundant monoterpenoid and sesquiterpenoid, respectively, in the essential oils extracted from dried C. indicum flowers from Yunnan Province, China. The differences in terpenoid composition could be attributed to variations between natural hybrid species and varieties of C. indicum. Other factors include differences in germplasm, geographical origin, harvesting time, and extraction method. We are currently collecting diploid cultivars of C. indicum, and in the future we may isolate and functionally characterize their TPSs.

Our previous studies (e.g., Jiang et al., 2019) have indicated that substantial amounts of flavonoids and terpenoids (e.g., sesquiterpenoids) are found not only in the flower bud and flower but also in the root, stem, and leaf. In ancient China, the root of C. indicum was used to treat centipede bites, eczema, and furunculosis. Together, these observations indicate that all parts of the plant have potential applications in the pharmaceutical, cosmetics, and perfume industries.

3.2. Effects of TPs on enzymes

TargetP indicated the presence of a chloroplast TP at the N-terminal end of the amino acid sequences of CiTPS1 and CiTPS2. Our previous study on C. indicum demonstrated that an MBP fusion of the full-length isopentenyl pyrophosphate isomerase, including its TP, was active (unpublished data). The same result was found for geraniol synthase from G. jasminoides (Ye et al., 2019). In the present study, when the coding sequence of CiTPS1 was fused with an MBP tag, CiTPS1 converted GPP to α-pinene. According to the literature, some full-length monoterpene synthases fused with a 6×His tag are also active. These include (−)-β-phellandrene synthase and terpinolene synthase from Abies grandis (Bohlmann et al., 1999), (−)-4S-LPP from M. spicata (Colby et al., 1993), and γ-terpinene synthase from Thymus caespititius (Mendes et al., 2014). In some circumstances, however, the purified preprotein is kinetically impaired or functionally compromised by the presence of the plastidial TP (Williams et al., 1998). Here, the full-length version of CiTPS1 was less active than the truncated version, pET-(-tp)CiTPS1, in which the first 34 amino acids were absent (Fig. S5). This result is in line with that reported for linalool synthase from Mentha citrata (Crowell et al., 2002). In our study, no product was detected when the full-length version of CiTPS2 was incubated with GPP. The TP may have caused this inactivity, because after its removal the truncated enzyme produced a minor amount of α-pinene.

3.3. Discrepancy between gene expression levels and terpenoid compositions in different tissues of C. indicum

The detection of petasitene only in the root was consistent with the high expression of CiTPS3 and CiTPS4 in root tissue. However, the abundance of enzymatic products in the in vitro assays was not always positively correlated with that of the metabolites extracted from the other plant tissues. Similarly, in a study by Matarese et al. (2014) on Vitis vinifera, the high expression of VvPNaPin1 in the root did not correlate with the scarce amounts of α-pinene found in the same tissue. In the present study, the expression of CiTPS1 and CiTPS2 did not correspond to the accumulation pattern of α-pinene in the root and other tissues. This might be ascribed to the further oxidation of α-pinene to acyclic metabolites by soil bacteria such as Nocardia sp. strain P18.3 (Griffiths et al., 1987). In addition, some discrepancies were observed between the gene expression levels and the terpenoid compositions in different tissues. For example, the accumulation of the four monoterpenoids was not clearly correlated with the expression level of CiTPS4 (Fig. 5C). It is possible that CiTPS4 functions mainly in sesquiterpenoid biosynthesis in planta because of enzyme compartmentalization and the availability of FPP (the substrate). Moreover, the plant regulatory machinery plays a key role in terpene biosynthesis, as reported by Salvagnin et al. (2016), who found almost no correlation between gene expression and sesquiterpene quantity in transgenic lines of Arabidopsis overexpressing (E)-β-caryophyllene synthase from V. vinifera.
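One simple way to quantify how well a gene's expression tracks a metabolite's abundance across tissues is a Pearson correlation coefficient. The sketch below is not an analysis performed in this study. The expression and abundance values are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical values for root, stem, leaf, flower bud, and flower (in that order).
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
abundance_tracking = [2.0, 4.0, 6.0, 8.0, 10.0]     # rises with expression
abundance_opposed = [10.0, 8.0, 6.0, 4.0, 2.0]      # falls as expression rises

r_tracking = pearson_r(expression, abundance_tracking)
r_opposed = pearson_r(expression, abundance_opposed)
```

A coefficient near 1 would indicate that metabolite abundance follows expression across tissues, whereas values near 0 or below would reflect the kind of discrepancy discussed above. With only five tissues, any such coefficient should be interpreted cautiously.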