Dataset Open Access

The genetic basis of cytoplasmic male sterility and fertility restoration in wheat

Small, Ian; Melonek, Joanna; Duarte, Jorge; Martin, Jerome; Murigneux, Alain; Varenne, Pierrick; Comadran, Jordi; Specel, Sebastien; Levadoux, Sylvain; Bernath-Levin, Kalia; Beuf, Laurent; Torney, François; Pichon, Jean-Philippe; Perez, Pascual

Hybrid wheat varieties give higher yields than conventional lines but are difficult to produce due to a lack of effective control of male fertility in breeding lines. One promising system involves the Rf1 and Rf3 genes that restore fertility of wheat plants carrying Triticum timopheevii-type cytoplasmic male sterility (T-CMS). By genetic mapping and comparative sequence analyses we identified Rf1 and Rf3 candidates that could restore normal pollen production in transgenic wheat plants carrying T-CMS. We show that Rf1 and Rf3 bind to the mitochondrial orf279 transcript and induce cleavage, preventing expression of the CMS trait. The identification of restorer genes in wheat is an important step towards the development of hybrid wheat varieties based on a CMS-Rf system. The characterisation of their mode of action brings new insights into the molecular basis of CMS and fertility restoration in plants.

This dataset includes transcript count and coverage data from two RNA-seq experiments examining gene expression in various male-sterile or male-fertile wheat lines studied in the course of this research.

For dataset 1, the files included here are:

  • experimental_design.xlsx — lists the samples and genotypes
  • references — folder of fasta files containing reference transcripts for the respective genotypes (input to Salmon)
  • quants — folder of quant.sf files containing nuclear/cytosolic transcript counts (output from Salmon)
  • mt_quants — folder of quant.sf files containing mitochondrial transcript counts (output from Salmon)
  • rnaseq.ipynb — Jupyter notebook (Python code) to reproduce Fig. 2b and Fig. S2 from the paper using the quants files (requires Python packages pandas, numpy, matplotlib, seaborn, sklearn and diffexpr (
  • mt.ipynb — Jupyter notebook (Python code) to reproduce Figs. 2c and 2d from the paper using the mt_quants files
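
For readers who want to inspect the quant.sf files outside the notebooks, the following is a minimal sketch using only the Python standard library. It relies on the standard Salmon quant.sf column layout (Name, Length, EffectiveLength, TPM, NumReads). The function names read_quant_sf and load_quants are hypothetical, the one-sub-folder-per-sample layout is an assumption that may not match how the quants folder is actually organised, and the example table is made up for illustration rather than taken from this dataset.

```python
import csv
import io
from pathlib import Path


def read_quant_sf(handle):
    """Parse one Salmon quant.sf table into {transcript name: (TPM, NumReads)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: (float(row["TPM"]), float(row["NumReads"]))
        for row in reader
    }


def load_quants(folder):
    """Collect every quant.sf under `folder`, keyed by its parent directory name.

    Assumption: one sub-folder per sample, each holding a single quant.sf file.
    Adjust the glob pattern if the files are laid out differently.
    """
    tables = {}
    for path in sorted(Path(folder).glob("*/quant.sf")):
        with open(path, newline="") as handle:
            tables[path.parent.name] = read_quant_sf(handle)
    return tables


# Made-up table in the quant.sf column layout (not data from this dataset).
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx_a\t1200\t1050.0\t12.5\t340.0\n"
    "tx_b\t800\t650.0\t0.0\t0.0\n"
)
demo = read_quant_sf(example)
```

Calling load_quants("quants") or load_quants("mt_quants") would then return one dictionary per sample, provided the folder layout matches the assumption above.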

For dataset 2, the files included here are:

  • references — folder with a fasta file of reference transcripts (input to Salmon)
  • RNASeq_quants.xlsx — table of read counts extracted from Salmon output
  • mt_cov — folder of strand-specific read coverage files (generated by genomeCoverageBed from the bedtools2 package)
  • Transgene_TPM.ipynb — Jupyter notebook (Python code) to reproduce Fig. 3c from the paper using the quants files
  • mt_coverage.ipynb — Jupyter notebook (Python code) to reproduce Figures 5 and S5 from the paper using the mt_cov files
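
The coverage files can also be summarised without the notebook. The sketch below assumes a three-column, tab-separated per-base layout (sequence name, 1-based position, depth), which is what genomeCoverageBed writes with its -d option. The record does not state which options were used, so check the files in mt_cov before relying on this. The function names, the sequence name mt_contig and the example values are all hypothetical.

```python
import csv
import io


def read_per_base_coverage(handle):
    """Parse per-base coverage lines into {sequence: {position: depth}}.

    Assumption: tab-separated columns are sequence name, 1-based position, depth.
    """
    coverage = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        name, position, depth = row[0], int(row[1]), float(row[2])
        coverage.setdefault(name, {})[position] = depth
    return coverage


def mean_depth(per_base, start, end):
    """Mean depth over the inclusive 1-based interval [start, end].

    Positions absent from the table are counted as depth 0.
    """
    total = sum(per_base.get(pos, 0.0) for pos in range(start, end + 1))
    return total / (end - start + 1)


# Made-up coverage lines (not data from this dataset).
example = io.StringIO(
    "mt_contig\t1\t10\n"
    "mt_contig\t2\t20\n"
    "mt_contig\t3\t0\n"
    "mt_contig\t4\t30\n"
)
cov = read_per_base_coverage(example)
window_mean = mean_depth(cov["mt_contig"], 1, 4)
```

Because the files are strand-specific, the same functions would be applied to the forward-strand and reverse-strand files separately before comparing them.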

The source data underlying Figs 2a, 3b-f, 4c-e, 6c, 7b and Supplementary Figs S3b, S4b-e, S6a and S7b-d are provided as a Source Data zip file.

Funding provided by: Australian Research Council
Crossref Funder Registry ID:
Award Number: CE140100008

Files (604.5 MB): three files of 386.0 MB, 100.1 MB and 118.4 MB