Published September 2, 2019 | Version 0.9.6
Software Open

RLeave: an in silico cross-validation protocol for transcript differential expression analysis

  • 1. Matheus
  • 2. Juliana Doblas

Description

Note: the reference article is available at https://arxiv.org/abs/2012.05421.

Although massively parallel sequencing usually yields a large number of differentially expressed transcripts, the variability in the expression of each transcript can make it hard to choose the most relevant ones for in silico or in vitro validation. The RLeave algorithm further normalizes this variability by combining two analyses: a conventional differential expression analysis (the upper branch) and a leave-one-out analysis (the lower branch). Combined with an in silico validation procedure (a decision tree), this strategy highlights the transcripts most worth validating. The script was originally written to analyze microRNA, but it can be used for any transcript type by changing the transcript type name (microRNA, mRNA, or non-coding RNA) in the script file.
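The leave-one-out idea can be illustrated with a minimal Python sketch. It is not the RLeave script and does not reproduce the RLeave score defined in the reference article. It assumes a toy differential expression call (an absolute log2 fold change of group means above a cutoff) and reports, for each transcript, the fraction of leave-one-out rounds in which that call still holds. All function names, sample labels, counts, and the cutoff are hypothetical.

```python
from math import log2


def log2_fold_change(counts, groups, case, control, pseudo=1.0):
    """log2 ratio of mean counts (case over control), with a pseudocount."""
    case_vals = [c for c, g in zip(counts, groups) if g == case]
    ctrl_vals = [c for c, g in zip(counts, groups) if g == control]
    mean_case = sum(case_vals) / len(case_vals)
    mean_ctrl = sum(ctrl_vals) / len(ctrl_vals)
    return log2((mean_case + pseudo) / (mean_ctrl + pseudo))


def leave_one_out_support(table, groups, case, control, min_abs_lfc=1.0):
    """Fraction of leave-one-out rounds in which each transcript still
    passes the fold-change cutoff (a stand-in for a real DE test).
    Each group needs at least two samples."""
    n = len(groups)
    support = {}
    for name, counts in table.items():
        hits = 0
        for left_out in range(n):
            # Drop one sample and repeat the toy analysis on the rest.
            kept_counts = [c for i, c in enumerate(counts) if i != left_out]
            kept_groups = [g for i, g in enumerate(groups) if i != left_out]
            lfc = log2_fold_change(kept_counts, kept_groups, case, control)
            if abs(lfc) >= min_abs_lfc:
                hits += 1
        support[name] = hits / n
    return support


# Hypothetical data: one consistently changed transcript and one whose
# apparent change is driven by a single sample.
groups = ["ctrl", "ctrl", "ctrl", "case", "case", "case"]
table = {
    "tx_stable": [10, 12, 11, 40, 44, 42],
    "tx_outlier": [10, 10, 10, 10, 10, 70],
}
support = leave_one_out_support(table, groups, "case", "ctrl")
```

In this toy data, the transcript whose change depends on one sample loses support in the round that omits that sample, while the consistently changed transcript keeps full support.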

How to use: i) name the raw counts file (counts.rda) and the sample identification file (groups.txt), which identifies the samples as they appear in the raw counts file, defines the sample names used in the subsequent steps, and specifies which groups are to be compared; and ii) set the stringency of the statistical parameters (Pvalue, logFoldChange, and false discovery rate, FDR) according to your data and preferences. Using FDR values to select the relevant transcripts is optional. If you do use FDR, choose the cutoff carefully to avoid inconsistencies in the differential expression analysis. We recommend starting with 'fdr' set to NULL to get an overview of how the statistical values are distributed across the samples (RNA_seqFinalUn file), and only then choosing an FDR cutoff. At the end, one output file is generated for each analysis and one for the global analysis.
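The effect of an optional FDR cutoff can be sketched in Python as follows. This is an illustration only, not the RLeave script: it assumes the FDR is a Benjamini-Hochberg adjusted p-value, which this record does not state, and it uses `None` where the R script uses NULL. The function names, transcript names, and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_transcripts(results, max_pvalue, min_abs_lfc, max_fdr=None):
    """Names passing the p-value and fold-change cutoffs; the FDR cutoff
    is applied only when max_fdr is not None."""
    names = list(results)
    fdr = benjamini_hochberg([results[n][0] for n in names])
    selected = []
    for name, q in zip(names, fdr):
        pvalue, lfc = results[name]
        if pvalue > max_pvalue or abs(lfc) < min_abs_lfc:
            continue
        if max_fdr is not None and q > max_fdr:
            continue
        selected.append(name)
    return sorted(selected)


# Hypothetical results: transcript name -> (p-value, log2 fold change)
results = {
    "tx_a": (0.001, 2.5),
    "tx_b": (0.02, -1.8),
    "tx_c": (0.04, 1.2),
    "tx_d": (0.30, 3.0),
    "tx_e": (0.03, 0.4),
}
overview = select_transcripts(results, max_pvalue=0.05, min_abs_lfc=1.0)
strict = select_transcripts(results, max_pvalue=0.05, min_abs_lfc=1.0,
                            max_fdr=0.01)
```

Running the selection without an FDR cutoff first shows which transcripts pass the other two thresholds. A strict cutoff applied afterwards narrows that list, which is why the cutoff is best chosen after inspecting the distribution.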

Final RLeave results may be exported as: i) the RNA_seqFinalMax file, which contains the highest RLeave indexes; ii) the RNA_seqFinalTotal file, which contains the results of the upper branch sampling; iii) the RNA_seqFinalUn file, which contains all annotated differentially expressed transcripts in any given analysis; iv) the PresenceTranscripts file, which lists the RLeave score of each studied transcript; v) the RLeave figure, which shows how the RLeave scores are distributed; and vi) the Pvalue_express figure, which shows how the P-values are distributed and the direction of regulation (up or down) of each transcript. The final choice of "best candidates" should focus on the transcripts with the highest RLeave scores.

Files

groups.txt

Files (650.9 kB)

Name Size
md5:8b6b4f3b80e811d85fe073885b0394ae 628.0 kB
md5:deea4bf5de1032fb8b2621038efb8dc5 723 Bytes
md5:d3c806ea56779b5ca8cb5f54556b3994 22.2 kB

Additional details

Related works

Is referenced by
Report: arXiv:2012.05421 (arXiv)