Published March 31, 2026 | Version v1
Journal article Open

Transcriptome characterisation, SSR marker development and genetic diversity analysis of the endangered species Camellia cucphuongensis Ninh & Rosmann using Illumina sequencing

  • 1. Jiangsu Vocational Institute of Architectural Technology, School of Ecological Engineering, Xuzhou 221000, Jiangsu, China
  • 2. The Olympia Schools, To Huu Street, Hanoi, Vietnam
  • 3. Institute of Biology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
  • 4. Faculty of Forest Resources and Environmental Management, Vietnam National University of Forestry, Xuan Mai, Hanoi, Vietnam
  • 5. Department of Horticulture, Gomal University, Dera Ismail Khan, Pakistan
  • 6. Faculty of Biotechnology, Vietnam National University of Agriculture, Ngo Xuan Quang, Gia Lam, Hanoi, Vietnam
  • 7. College of Forestry Biotechnology, Vietnam National University of Forestry, Xuan Mai, Hanoi, Vietnam
  • 8. Joint Vietnam - Russia Tropical Science and Technology Research Center, 63 Nguyen Van Huyen, Nghia Do, Hanoi, Vietnam

Description

Overharvesting for ornamental and medicinal purposes, combined with ongoing habitat loss and fragmentation in Vietnam, has severely threatened wild populations of Camellia cucphuongensis. Effective conservation and management of this species therefore require robust genomic resources and informative molecular markers to quantify genetic diversity and population structure. In this study, we generated the first transcriptome dataset for C. cucphuongensis and developed expressed sequence tag–simple sequence repeat (EST-SSR) markers using Illumina HiSeq™ 4000 sequencing. A total of 13,600,954 clean reads were obtained (Q20 = 97.55%, Q30 = 93.11%, GC = 44.08%). De novo assembly produced 118,552 unigenes with a mean length of 541.2 bp and an N50 of 683 bp. Functional annotation revealed that 52,107 and 25,640 unigenes had significant matches in the Nr and Swiss-Prot databases, respectively. Additionally, 28,007 unigenes were assigned to Gene Ontology terms, 27,968 to KOG categories and 11,959 to 117 KEGG pathways. Mining for simple sequence repeats identified 9,661 EST-SSR loci. From 60 screened primer pairs, 11 polymorphic EST-SSR markers were validated and applied to 60 individuals from three natural populations. Genetic diversity was moderate (NE = 2.17; PIC = 0.548; HO = 0.46; HE = 0.50), with most variation occurring within individuals (79%) and 11% occurring amongst populations (FST = 0.113; Nm = 1.96). Principal coordinate analysis (PCoA), discriminant analysis of principal components (DAPC), STRUCTURE and neighbour-joining (NJ) analyses all indicated detectable population structuring, with population CP showing clearer differentiation relative to LH and TL. Collectively, these transcriptomic resources and EST-SSR markers provide practical tools for genetic monitoring and can support conservation strategies that emphasise habitat protection and maintenance of connectivity to mitigate genetic erosion in this endangered golden camellia.
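As an illustration of how the summary statistics in the description relate to one another, the following minimal Python sketch computes gene flow (Nm) from FST, and expected heterozygosity (HE) and polymorphism information content (PIC) from allele frequencies. It assumes that Nm was derived with Wright's island-model approximation, Nm = (1 − FST) / (4 FST), which is consistent with the reported values but is not stated in the description. The function names and the example allele frequencies are hypothetical and are not taken from the study.

```python
from typing import Sequence


def gene_flow_from_fst(fst: float) -> float:
    """Estimate Nm with Wright's island-model approximation (assumed here)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """HE at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """PIC at one locus, following the standard Botstein et al. (1980) formula."""
    n = len(freqs)
    homozygous = sum(p * p for p in freqs)
    # Correction term for matings between identical heterozygotes.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygous - cross


if __name__ == "__main__":
    # FST value reported in the description.
    print(round(gene_flow_from_fst(0.113), 2))  # 1.96

    # Hypothetical allele frequencies at a single biallelic locus.
    example_freqs = [0.5, 0.5]
    print(expected_heterozygosity(example_freqs))          # 0.5
    print(polymorphism_information_content(example_freqs))  # 0.375
```

With the reported FST of 0.113, this approximation returns 1.96, matching the Nm given in the description. Per-locus HE and PIC values would then be averaged across the 11 markers to obtain summary figures of the kind reported.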

Files

BDJ_article_186683.pdf

Files (887.0 kB)

md5:992a0a945effbf0a312268c25d847bd6 (669.6 kB)
md5:df6d5c8ac73019ce8f411afadf2632f4 (217.4 kB)
