myattlin/de-novo-assembly: de-novo-assembly of transcripts from RNA-Seq data
Description
Automated de novo assembly of selected genes using BBMap and Trinity. The process is automated with Python scripts, which are executed in Windows Subsystem for Linux from a shell script that accepts multiple SRA IDs for high-throughput assembly. Instructions for running these scripts are included in "run_get_coverage.sh". A summary of the process follows.
The SRA file is downloaded with the fastq-dump program from SRA-Toolkit (https://hpc.nih.gov/apps/sratoolkit.html). The reads that align to the sequence of interest are selected with BBMap (by Bushnell B., https://sourceforge.net/projects/bbmap/) in 'vslow' and 'local' modes with "maxindel" set to 100. Next, the paired reads in the fastq file exported by BBMap are separated into two fastq files with the bbsplitpairs script from BBMap. These files are then assembled de novo by Trinity (https://github.com/trinityrnaseq/trinityrnaseq) three separate times: (1) with --KMER_SIZE 32; (2) with the stringent setting, which consists of "--min_kmer_cov 4 --min_glue 4 --min_iso_ratio 0.2 --glue_factor 0.2 --jaccard_clip"; and (3) with both --KMER_SIZE 32 and the stringent setting.
If there are more than 10,000 reads in each fastq file, the first 5000 reads, extracted with seqtk (https://github.com/lh3/seqtk), are assembled in two more Trinity runs using --KMER_SIZE 32, one with and one without the stringent setting. For assemblies that cover at least 90% of the reference sequence with an alignment score greater than 350, the read coverage of starting bases is then obtained using BBMap with "perfectmode" and "startcov=t".
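The Trinity run plan described above can be sketched in Python as follows. This is a minimal illustration, not the code from the repository: the function names, run labels, output directory names, and the default "--max_memory" value are all hypothetical, and the sketch assumes that "more than 10,000 reads in each fastq file" means both files must exceed the threshold. It only builds the command lines and does not execute Trinity.

```python
# Hypothetical sketch of the Trinity run plan; names and labels are assumptions.

# Flags quoted in the description above.
KMER32 = ["--KMER_SIZE", "32"]
STRINGENT = [
    "--min_kmer_cov", "4",
    "--min_glue", "4",
    "--min_iso_ratio", "0.2",
    "--glue_factor", "0.2",
    "--jaccard_clip",
]


def count_fastq_reads(path):
    """Count reads in an uncompressed fastq file with 4 lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def plan_trinity_runs(n_left, n_right, threshold=10_000):
    """Return (label, extra_flags) pairs for the Trinity runs to perform.

    Three runs always use the full read set. Two more runs on the first
    5000 reads are added only when both files exceed the threshold
    (an assumed reading of "each fastq file").
    """
    runs = [
        ("full_kmer32", list(KMER32)),
        ("full_stringent", list(STRINGENT)),
        ("full_kmer32_stringent", KMER32 + STRINGENT),
    ]
    if n_left > threshold and n_right > threshold:
        runs.append(("first5000_kmer32", list(KMER32)))
        runs.append(("first5000_kmer32_stringent", KMER32 + STRINGENT))
    return runs


def trinity_command(left, right, label, extra_flags, max_memory="8G"):
    """Build a Trinity argument list for paired-end fastq input."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,  # assumed value, adjust to the machine
        "--output", f"trinity_{label}",  # hypothetical directory name
    ] + list(extra_flags)


if __name__ == "__main__":
    # Example with made-up read counts and file names.
    for label, flags in plan_trinity_runs(12_500, 12_500):
        print(" ".join(trinity_command("reads_1.fq", "reads_2.fq", label, flags)))
```

For the two subsampled runs, the caller would pass the fastq files holding the first 5000 reads instead of the full files. Returning argument lists rather than running the commands keeps the plan easy to inspect before handing each list to a process launcher.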
Files
Name | Size |
---|---|
myattlin/de-novo-assembly-v5.3.zip (md5:1893794d01c91617a7938a8d9ce7b485) | 11.2 kB |
Additional details
Related works
- Is supplement to
- https://github.com/myattlin/de-novo-assembly/tree/v5.3 (URL)