Published August 5, 2019 | Version 1.0
Dataset Open

BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq

  • 1. Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
  • 2. Biomathematics and Statistics Scotland, Aberdeen, AB25 2ZD, UK
  • 3. Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
  • 4. Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee DD2 5DA, UK
  • 5. Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee DD2 5DA, UK; Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Gogerddan, Aberystwyth, Ceredigion SY23 3EB, UK
  • 6. Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee DD2 5DA, UK; MRC Protein Phosphorylation and Ubiquitylation Unit, Sir James Black Centre, School of Life Sciences, University of Dundee, Dundee DD1 5EH, UK
  • 7. Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK; Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee DD2 5DA, UK

Description

Background
The time required to analyse RNA-seq data varies considerably, because computational assembly, quantification of gene expression and splicing analysis are discrete, time-consuming steps. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but they require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants.

Results
A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts – BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments; transcripts were assembled and aligned to the barley cv. Morex reference genome (Mascher et al., 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al., 2011) were used to determine transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. A second version, BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI), was also made to overcome inaccurate quantification due to variation in the 5’ and 3’ UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data from five barley organs/tissues. This analysis identified 20,972 significantly differentially expressed genes, 2,791 differentially alternatively spliced genes and 2,768 transcripts with differential transcript usage.

Conclusion
A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, are less fragmented and have improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.

Notes

The zip file contains two pairs of FASTA + GFF files -- one for the padded and one for the unpadded version of the RTD. For details on padding please refer to the preprint publication at https://www.biorxiv.org/content/10.1101/638106v3. The FASTA files contain the transcript sequences. The GFF files contain exon coordinates of the transcripts on the 2017 Morex barley reference sequence (https://www.nature.com/articles/nature22043).
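As a rough illustration of how the two file types relate, the sketch below counts transcript records in a FASTA file and groups exon coordinates by transcript from a GFF file. It assumes GFF3-style attributes (`key=value` pairs with a `Parent` key on exon lines); the attribute layout actually used in the BaRTv1.0 files is not stated here, so the `key` argument may need adjusting. The function names, the transcript ID `tx1.1` and the sequence name `chr1H` are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def count_fasta_records(lines):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def exons_by_transcript(gff_lines, key="Parent"):
    """Group (seqid, start, end, strand) exon tuples by transcript ID.

    Assumes 9 tab-separated columns and GFF3-style 'key=value' attributes.
    The attribute naming the transcript ('Parent') is an assumption.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon features
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        if key in attrs:
            exons[attrs[key]].append(
                (cols[0], int(cols[3]), int(cols[4]), cols[6])
            )
    return dict(exons)


# Tiny made-up inputs (not taken from the dataset) to show the call pattern.
example_fasta = [">tx1.1\n", "ACGT\n", ">tx1.2\n", "GGCC\n"]
example_gff = [
    "##gff-version 3\n",
    "chr1H\tsrc\tmRNA\t100\t400\t.\t+\t.\tID=tx1.1\n",
    "chr1H\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1.1\n",
    "chr1H\tsrc\texon\t300\t400\t.\t+\t.\tID=e2;Parent=tx1.1\n",
]
n_transcripts = count_fasta_records(example_fasta)
exon_map = exons_by_transcript(example_gff)
```

Both functions accept any iterable of text lines, so they can be fed an open file handle once the archive has been extracted. The member names inside the zip are not listed here; `zipfile.ZipFile("BaRT_v1_0.zip").namelist()` from the Python standard library will show them.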

Files

BaRT_v1_0.zip

Size: 197.5 MB
md5:cde197c4877e02c37f7bb8329460dd6d
