Perl-for-Bioinformatics: lncRNApipe: A pipeline to identify putative novel lncRNAs from deep sequencing data
Description
lncRNApipe is a pipeline to extract putative novel lncRNAs ab initio, given annotation data and a list of transcripts in GTF format assembled from deep sequencing data (e.g., RNA-Seq).
This pipeline script binds together the functionality of the following tools / scripts: cuffcompare, categorize_ncRNAs.pl, get_unique_features.pl, fetch_seq_from_ucsc.pl, RNAfold, Infernal and Coding Potential Calculator (CPC.sh). Transcriptome construction tools such as Cufflinks produce a set of assembled transcripts in GTF format. lncRNApipe uses this data, in addition to known gene annotation, to extract putative lncRNAs constructed by the ab initio assemblers. The pipeline relies on the FPKM / RPKM values generated by these assemblers to assess the confidence of the constructed de novo transcripts and validates them against the known reference gene and non-coding RNA information to identify putative novel lncRNAs.
The quality of the predicted novel lncRNAs depends heavily on how up to date the known gene and / or ncRNA annotation file(s) supplied to the pipeline are.
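The FPKM-based confidence screening described above can be pictured with a small shell sketch. It is not taken from lncRNApipe itself: the function name `filter_by_fpkm` and the cutoff passed to it are hypothetical, and it assumes Cufflinks-style GTF attributes of the form `FPKM "12.34";` on transcript records.

```shell
# Hypothetical helper (not part of lncRNApipe): keep transcript records
# whose FPKM attribute is at or above a cutoff.
# Assumes Cufflinks-style GTF attributes such as: FPKM "12.3456";
filter_by_fpkm() {
    # $1 = GTF file, $2 = minimum FPKM (illustrative value, not a pipeline default)
    awk -F'\t' -v min="$2" '
        $3 == "transcript" && match($9, /FPKM "[^"]+"/) {
            # Skip the 6 characters of FPKM " and drop the closing quote
            fpkm = substr($9, RSTART + 6, RLENGTH - 7)
            if (fpkm + 0 >= min + 0) print
        }
    ' "$1"
}

# Usage (file name is a placeholder): filter_by_fpkm transcripts.gtf 1
```

Records without an FPKM attribute are dropped by this sketch, so it only suits assembler output that carries expression values.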
IO::Routine
The scripts use the custom IO::Routine Perl module.
If you are installing the lncRNApipe pipeline, the IO::Routine module is installed automatically.
The Bio::SeqIO module must also be installed and available.
See the NGS-Utils directory for the list of scripts.
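The module requirements above can be checked before running the pipeline. The following is a minimal sketch, assuming `perl` is on your PATH; the function names `have_perl_module` and `check_modules` are hypothetical and not part of lncRNApipe.

```shell
# Succeed if the named Perl module can be loaded by the perl found on PATH.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Print one "found" / "missing" line per module name given as an argument.
check_modules() {
    for mod in "$@"; do
        if have_perl_module "$mod"; then
            echo "$mod: found"
        else
            echo "$mod: missing"
        fi
    done
}

# Usage: check_modules IO::Routine Bio::SeqIO XML::Parser
```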
Install lncRNApipe and all its dependencies (Mac and Linux):
    cd /to/your/preferred/install/path
    curl -O https://raw.githubusercontent.com/biocoder/Perl-for-Bioinformatics/master/NGS-Utils/lncRNApipe
    perl lncRNApipe -setup
Documentation:
    perl lncRNApipe -h
or
    perldoc lncRNApipe
or, to get help documentation for individual modules, do:
    perl lncRNApipe -h cuff
    perl lncRNApipe -h cat
    perl lncRNApipe -h get
    perl lncRNApipe -h fetch
    perl lncRNApipe -h cpc
    perl lncRNApipe -h rna
    perl lncRNApipe -h inf
Known issues:
- If pipeline setup fails because of the XML::Parser module, you need to install the XML parser C libraries.
  On Ubuntu / Debian based Linux distributions, as root user, do:
    apt-get install libexpat1 libexpat1-dev
  On RedHat / Fedora / CentOS based Linux distributions, as root user, do:
    yum install expat expat-devel
- RNAfold: RNAfold is slow and does not work for sequences over 10,000 bp in length. I am working on including an alternative secondary structure prediction program to replace RNAfold. Meanwhile, you may skip the RNAfold module by not passing the --rnafold option to lncRNApipe.
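To decide whether the RNAfold length limit mentioned above affects your data, you could list the FASTA records that exceed it. This is only a sketch: the function name `list_long_seqs` is hypothetical, and it assumes a plain FASTA file with `>` header lines.

```shell
# Print the ID and length (tab-separated) of each FASTA record longer than
# a limit. The limit defaults to 10000 bp, the figure quoted above.
list_long_seqs() {
    # $1 = FASTA file, $2 = optional length limit
    awk -v limit="${2:-10000}" '
        function report() { if (id != "" && len > limit) print id "\t" len }
        /^>/ { report(); id = substr($1, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { report() }
    ' "$1"
}

# Usage (file name is a placeholder): list_long_seqs putative_lncRNAs.fa
```

If the function prints nothing, no sequence exceeds the limit. Otherwise, leaving out the --rnafold option, as suggested above, is one way to proceed.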
Caveats:
- The pipeline script relies heavily on standard Linux core utilities and has only been tested in the BASH shell.
- Please use absolute (full) path names. For example, instead of lncRNApipe -run ./lncRNApipe_output ..., use lncRNApipe -run /data/lncRNApipe_output ...
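One way to follow the absolute-path advice above is to resolve a relative directory in the shell before handing it to the pipeline. This is a minimal sketch: `abs_dir` is a hypothetical helper, not part of lncRNApipe, and it assumes the directory already exists.

```shell
# Resolve a possibly relative directory path to an absolute one.
abs_dir() {
    # Runs in a subshell so the caller's working directory is unchanged.
    # Returns a non-zero status and prints nothing if the directory is missing.
    (cd "$1" 2>/dev/null && pwd)
}

# Usage: lncRNApipe -run "$(abs_dir ./lncRNApipe_output)" ...
```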
Konganti, Kranti (2015). lncRNApipe: A pipeline to identify putative novel lncRNAs from deep sequencing data. https://github.com/biocoder/Perl-for-Bioinformatics/releases
Cheers,
BioCoder
Files
Perl-for-Bioinformatics-v0.7.0.zip (91.1 MB)
md5:0d8b26a4ec6d62e6155dda8c383f5865
Additional details
Related works
- Is supplement to
- https://github.com/biocoder/Perl-for-Bioinformatics/tree/v0.7.0 (URL)