Published May 31, 2021 | Version v1
Dataset | Open access

Data from: Highly efficient and comprehensive identification of ethyl methanesulfonate-induced mutations in Nicotiana tabacum L. by whole-genome and whole-exome sequencing

  • 1. Leaf Tobacco Research Center, Japan Tobacco Inc.
  • 2. RIKEN Nishina Center for Accelerator-Based Science

Description

Tobacco (Nicotiana tabacum L.) is a complex allotetraploid species with a large (4.5 Gb) genome that carries duplicated gene copies. In this study, we describe the development of a whole-exome sequencing (WES) procedure for tobacco and its application to characterizing ethyl methanesulfonate (EMS)-induced mutations in a test population. A probe set covering 50.3 Mb of protein-coding regions was designed from a reference tobacco genome. The EMS-induced mutations in 19 individual M2 lines were analyzed using our mutation analysis pipeline, which was optimized to minimize false positives and false negatives. In the target regions, the on-target rate of WES was approximately 75%, and 61,146 mutations were detected across the 19 M2 lines. Most of the mutations (98.8%) were single nucleotide variants (SNVs), and 95.6% of these were C/G to T/A transitions. The number of mutations detected by WES in the target coding sequences was 93.5% of the number detected by whole-genome sequencing (WGS). The amount of sequencing data needed for efficient mutation detection was significantly lower for WES (11.2 Gb), only 6.2% of the amount required for WGS (180 Gb). Thus, WES performed almost comparably to WGS while being more cost-effective. The targeted exome sequencing developed here could become a fundamental tool for high-throughput mutation identification and makes genome-wide analysis of tobacco highly efficient.
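The mutation-spectrum and data-volume figures above can be illustrated with a short calculation. The Python sketch below is not the authors' pipeline: it assumes variants are given as (REF, ALT) allele pairs, classifies an SNV as a C/G to T/A transition (C>T on one strand, equivalently G>A on the other), and expresses the WES data requirement as a percentage of the WGS requirement. All function names and the example variant list are hypothetical.

```python
# Illustrative sketch only (not the authors' pipeline). Function names and the
# example (REF, ALT) pairs are hypothetical.
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_snv(ref: str, alt: str) -> bool:
    """True when REF and ALT are single, different nucleotides (A/C/G/T)."""
    ref, alt = ref.upper(), alt.upper()
    return ref in COMPLEMENT and alt in COMPLEMENT and ref != alt


def is_cg_to_ta_transition(ref: str, alt: str) -> bool:
    """True for C>T or G>A, i.e. a C/G to T/A transition typical of EMS."""
    if not is_snv(ref, alt):
        return False
    ref, alt = ref.upper(), alt.upper()
    # Describe the change from the strand where the reference base is C or T,
    # so that G>A is counted as its reverse-strand equivalent C>T.
    if ref in ("G", "A"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return ref == "C" and alt == "T"


def transition_share(variants: List[Tuple[str, str]]) -> float:
    """Share of SNVs (indels excluded) that are C/G to T/A transitions."""
    snvs = [(r, a) for r, a in variants if is_snv(r, a)]
    if not snvs:
        return 0.0
    return sum(is_cg_to_ta_transition(r, a) for r, a in snvs) / len(snvs)


def percent_of(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Hypothetical pairs: two EMS-type SNVs, two other SNVs and one deletion.
    example = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A"), ("AT", "A")]
    print(f"C/G>T/A share of SNVs: {transition_share(example):.2f}")  # 0.50
    # Sequencing data quoted in the description: WES 11.2 Gb vs WGS 180 Gb.
    print(f"WES data as % of WGS: {percent_of(11.2, 180.0):.1f}%")  # 6.2%
```

Complementing both alleles when the reference base is a purine counts the same EMS lesion whichever strand the reference reports, and the deletion is left out of the SNV denominator, mirroring the description's split between all mutations and SNVs.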

Files (850.4 MB)

  • 836.5 MB, md5:d20b657423d6f09a0323f678cd08f977
  • 11.3 MB, md5:c897d8286a4f5245ccb96f6eed276d52
  • 2.6 MB, md5:6e1f6e2114f06bc615bf54ad41183bf5
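Because the listed checksums are MD5 digests, each download can be checked before use. The minimal Python sketch below uses only the standard-library `hashlib` module. The file names were not preserved in this listing, so the dictionary keys are hypothetical placeholders to be replaced with the names of the files actually downloaded.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for this record. The keys are HYPOTHETICAL
# placeholders: replace them with the names of the downloaded files.
EXPECTED_MD5 = {
    "file_836.5MB": "md5:d20b657423d6f09a0323f678cd08f977",
    "file_11.3MB": "md5:c897d8286a4f5245ccb96f6eed276d52",
    "file_2.6MB": "md5:6e1f6e2114f06bc615bf54ad41183bf5",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """True if the file's MD5 digest matches `expected` (with or without an 'md5:' prefix)."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


if __name__ == "__main__":
    for name, checksum in EXPECTED_MD5.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found")
        else:
            print(f"{name}: {'OK' if verify(path, checksum) else 'MISMATCH'}")
```

Reading in fixed-size chunks keeps memory use flat even for the 836.5 MB file; on Linux, `md5sum <file>` prints the same hex digest.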