RLeave: an in silico cross-validation protocol for transcript differential expression analysis
Description
Note: The reference article is available at https://arxiv.org/abs/2012.05421.
Although massively parallel sequencing usually yields a large number of differentially expressed transcripts, the expression variability of each transcript can make it difficult to choose the most relevant ones for in silico or in vitro validation. The RLeave algorithm further normalizes transcript expression variability by combining two mathematical approaches: a conventional analysis (upper branch analysis) and a leave-one-out approach (lower branch analysis). This combined strategy, together with an in silico validation procedure (Decision Tree), highlights the most relevant transcripts to validate. The script was originally written to analyze microRNA, but it can be used for any transcript type (microRNA, mRNA, or non-coding RNA) by changing the transcript type name in the script file.
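The leave-one-out idea behind the lower branch can be illustrated with a small, generic sketch. This is not the RLeave script and does not compute the RLeave index, which is defined in the reference article. It assumes a simple log2 fold-change cutoff as a stand-in for the real differential expression test, and the transcript names, sample values, and function names are all hypothetical. For each transcript it counts how many leave-one-out runs still pass the cutoff, so a transcript driven by a single outlier sample gets less support than a consistently regulated one.

```python
import math
from statistics import mean


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids log(0)."""
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


def leave_one_out_support(counts, groups, min_abs_lfc=1.0):
    """Count, per transcript, the leave-one-out runs that pass the cutoff.

    counts: dict mapping transcript name -> list of per-sample values
    groups: list of "case"/"control" labels in the same sample order
    Each group needs at least two samples so no group is emptied.
    """
    n_samples = len(groups)
    support = {}
    for transcript, values in counts.items():
        passed = 0
        for left_out in range(n_samples):
            # Rebuild both groups without the sample that is left out.
            case = [v for i, v in enumerate(values)
                    if i != left_out and groups[i] == "case"]
            control = [v for i, v in enumerate(values)
                       if i != left_out and groups[i] == "control"]
            if abs(log2_fold_change(case, control)) >= min_abs_lfc:
                passed += 1
        support[transcript] = passed
    return support


# Hypothetical toy data: three case and three control samples.
groups = ["case", "case", "case", "control", "control", "control"]
counts = {
    "miR-A": [100, 120, 110, 20, 25, 22],  # consistently higher in cases
    "miR-B": [10, 12, 400, 11, 9, 10],     # driven by one outlier sample
    "miR-C": [50, 52, 49, 51, 50, 48],     # no difference between groups
}
support = leave_one_out_support(counts, groups)
print(support)
```

In this toy data, miR-A passes in all six runs, miR-B fails in the run that drops its outlier sample, and miR-C never passes.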
How to use: i) name the input files for the raw counts (counts.rda) and the sample identification (groups.txt); in groups.txt, identify the samples as they appear in the raw counts file, define the sample names used in the subsequent steps, and indicate which groups are to be compared; and ii) set the stringency values of the statistical parameters (Pvalue, logFoldChange, and False Discovery Rate, FDR) according to your data and preferences. For instance, the user may or may not apply an FDR cutoff when selecting the relevant transcripts. If FDR is used, choose its value carefully to avoid inconsistencies in the differential expression analysis. It is recommended to start with a NULL value for 'fdr' to get an overview of the statistical values across the sample distribution (RNA_seqFinalUn file), and only then define the FDR cutoff. At the end, one output file is generated for each analysis and one for the global analysis.
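The effect of the three stringency parameters, including an optional FDR cutoff, can be sketched as follows. This is an illustration under stated assumptions and not the RLeave code: it assumes the FDR is a Benjamini-Hochberg adjusted p-value, uses `None` where the script uses NULL, and the function names, field names, and result rows are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_transcripts(results, pvalue=0.05, logfc=1.0, fdr=None):
    """Keep transcripts passing the p-value and |logFC| cutoffs.

    The FDR cutoff is applied only when fdr is not None.
    """
    adjusted = benjamini_hochberg([r["pvalue"] for r in results])
    selected = []
    for row, q in zip(results, adjusted):
        if row["pvalue"] > pvalue or abs(row["logFC"]) < logfc:
            continue
        if fdr is not None and q > fdr:
            continue
        selected.append(row["name"])
    return selected


# Hypothetical differential expression results.
results = [
    {"name": "t1", "pvalue": 0.001, "logFC": 2.5},
    {"name": "t2", "pvalue": 0.02, "logFC": -1.8},
    {"name": "t3", "pvalue": 0.04, "logFC": 1.2},
    {"name": "t4", "pvalue": 0.30, "logFC": 3.0},
    {"name": "t5", "pvalue": 0.01, "logFC": 0.4},
]
no_fdr = select_transcripts(results)               # overview without FDR
with_fdr = select_transcripts(results, fdr=0.04)   # stricter second pass
print(no_fdr, with_fdr)
```

Running the selection first without an FDR cutoff and then with one mirrors the recommended workflow: the first pass shows how the values are distributed, and the second pass applies the cutoff chosen from that overview.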
Final RLeave results may be exported as: i) the RNA_seqFinalMax file, which contains the highest RLeave indexes; ii) the RNA_seqFinalTotal file, which contains the results of the upper branch sampling; iii) the RNA_seqFinalUn file, which contains all annotated differentially expressed transcripts in any given analysis; iv) the PresenceTranscripts file, which lists the RLeave score of each studied transcript; v) the RLeave figure, which shows how the RLeave scores are distributed; and vi) the Pvalue_express figure, which shows how the Pvalues are distributed and the direction of regulation (up- or down-regulation) of each transcript. The final choice of "best candidates" should focus on the transcripts with the highest RLeave scores.
Files
Total size: 650.9 kB
Checksum (MD5) | Size |
---|---|
8b6b4f3b80e811d85fe073885b0394ae | 628.0 kB |
deea4bf5de1032fb8b2621038efb8dc5 | 723 Bytes |
d3c806ea56779b5ca8cb5f54556b3994 | 22.2 kB |
Additional details
Related works
- Is referenced by
- Report: arXiv:2012.05421 (arXiv)