Published August 12, 2025 | Version v1.4.14
Software Open

github.com/NDeeSeee/altanalyze2_snaf/star_2pass_alignment

  • 1. Cincinnati Children's Hospital Medical Center

Description

STAR 2-Pass RNA-seq Alignment

Containerized STAR 2-pass RNA-seq alignment for modern bioinformatics workflows. This implementation provides a portable, reproducible solution that works across local, cloud, and HPC environments.

Files Overview

Core Components

  • star_alignment.sh - Main alignment script (containerized)
  • Dockerfile - Multi-stage Docker build for STAR 2.4.0h
  • star_alignment.wdl - WDL task definition for workflow systems
  • docker_build.sh - Docker build and validation script

Why a Container-Only Approach?

Modern & Portable:

  • Works everywhere Docker is available
  • Multi-platform support (AMD64 and ARM64)
  • Compatible with HPC via Singularity/Shifter
  • Cloud-native for Terra, Cromwell, Nextflow workflows

Reproducible:

  • Consistent environment across all platforms
  • Version-controlled dependencies
  • No environment-specific configurations

Simplified Maintenance:

  • Single script to maintain and update
  • Standard containerization practices
  • Pre-built images available on Docker Hub

Usage

1. Pull or Build Container

# Pull from Docker Hub (recommended)
docker pull ndeeseee/star-aligner:latest

# Or build locally
./docker_build.sh

2. Run Alignment

Local Docker

docker run --rm \
  -v /path/to/data:/data \
  ndeeseee/star-aligner:latest \
  /data/input/sample.1.fastq.gz \
  /data/reference/star_index \
  /data/reference/genome.fa \
  /data/output

HPC with Singularity

# Convert Docker to Singularity
singularity build star_aligner.sif docker://ndeeseee/star-aligner:latest

# Run on HPC
singularity exec \
  --bind /scratch:/data \
  star_aligner.sif \
  star_alignment.sh \
  /data/input/sample.1.fastq.gz \
  /data/reference/star_index \
  /data/reference/genome.fa \
  /data/output

Cloud Workflows

Use star_alignment.wdl with:

  • Terra/FireCloud - Upload WDL and run workflows
  • Cromwell - Local or cloud execution
  • Nextflow - Adapt WDL to Nextflow DSL

3. Example WDL Input

{
  "StarAlignmentWorkflow.fastq_r1": "gs://bucket/sample.1.fastq.gz",
  "StarAlignmentWorkflow.fastq_r2": "gs://bucket/sample.2.fastq.gz",
  "StarAlignmentWorkflow.star_genome_dir": "gs://bucket/star_index/",
  "StarAlignmentWorkflow.reference_genome": "gs://bucket/genome.fa",
  "StarAlignmentWorkflow.sample_name": "sample_001",
  "StarAlignmentWorkflow.cpu_cores": 16,
  "StarAlignmentWorkflow.memory_gb": 128
}
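As a rough sketch of local Cromwell execution, assuming the JSON above is saved as inputs.json next to star_alignment.wdl: the function below validates the workflow with Womtool and then runs it. The function name run_star_wdl and the WOMTOOL_JAR and CROMWELL_JAR variables are placeholders introduced here, not part of this repository.

```shell
# Validate the workflow and its inputs, then run it locally with Cromwell.
# WOMTOOL_JAR and CROMWELL_JAR are hypothetical variables pointing at the jars.
run_star_wdl() {
  # $1 = inputs JSON (defaults to inputs.json)
  inputs=${1:-inputs.json}
  java -jar "${WOMTOOL_JAR:-womtool.jar}" validate --inputs "$inputs" star_alignment.wdl &&
  java -jar "${CROMWELL_JAR:-cromwell.jar}" run star_alignment.wdl --inputs "$inputs"
}

# Example:
#   run_star_wdl inputs.json
```

Running the validation first catches misspelled input keys before any alignment starts.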

Requirements

Input Files

  • R1/R2 FASTQ files - Paired-end RNA-seq data (.fastq.gz)
  • STAR genome index - Pre-built index directory
  • Reference genome - FASTA file (.fa or .fasta)
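A pre-flight check of these inputs could look like the sketch below. It assumes the .1.fastq.gz / .2.fastq.gz naming used in the examples and derives the R2 path from the R1 path. Whether star_alignment.sh locates the mate file this way is an assumption, and the helper names mate_path and check_inputs are introduced here for illustration.

```shell
# Derive the R2 path from an R1 path that ends in .1.fastq.gz.
mate_path() {
  printf '%s\n' "${1%.1.fastq.gz}.2.fastq.gz"
}

# Return success only if both mates, the index directory, and the FASTA exist.
check_inputs() {
  # $1 = R1 FASTQ, $2 = STAR index directory, $3 = reference FASTA
  r2=$(mate_path "$1")
  status=0
  [ -f "$1" ]  || { echo "missing R1 FASTQ: $1" >&2; status=1; }
  [ -f "$r2" ] || { echo "missing R2 FASTQ: $r2" >&2; status=1; }
  [ -d "$2" ]  || { echo "missing STAR index directory: $2" >&2; status=1; }
  [ -f "$3" ]  || { echo "missing reference FASTA: $3" >&2; status=1; }
  return "$status"
}

# Example:
#   check_inputs input/sample.1.fastq.gz reference/star_index reference/genome.fa
```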

System Requirements

  • Docker (local/cloud) or Singularity (HPC)
  • Memory: 64GB+ recommended
  • CPU: 8+ cores recommended
  • Disk: 3x input file size + index size
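The disk rule of thumb above can be turned into a quick check. As a worked example with made-up sizes, 20 GB of FASTQ input and a 30 GB index give 3 x 20 + 30 = 90 GB. The helper names below are hypothetical, and the sketch relies only on du and df.

```shell
# Estimate scratch space (in KB) from the rule of thumb:
# 3 x total input FASTQ size + index size.
required_disk_kb() {
  # $1 = total FASTQ size in KB, $2 = STAR index size in KB
  echo $(( 3 * $1 + $2 ))
}

# Size of a file or directory in KB, as reported by du.
size_kb() {
  du -sk "$1" | awk '{print $1}'
}

# Available space (KB) on the filesystem that holds a path.
available_kb() {
  df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Return success if the output location has enough free space.
has_enough_disk() {
  # $1 = R1 FASTQ, $2 = R2 FASTQ, $3 = index directory, $4 = output directory
  fastq_kb=$(( $(size_kb "$1") + $(size_kb "$2") ))
  need_kb=$(required_disk_kb "$fastq_kb" "$(size_kb "$3")")
  [ "$(available_kb "$4")" -ge "$need_kb" ]
}

# Example:
#   has_enough_disk in/s.1.fastq.gz in/s.2.fastq.gz reference/star_index output \
#     || echo "not enough disk space" >&2
```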

Output

  • {sample}.bam - Coordinate-sorted aligned reads
  • {sample}_Log.final.out - Alignment statistics and metrics
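To pull a single metric out of the statistics file, a small awk helper is usually enough. STAR writes Log.final.out as "name | value" lines, so the sketch below splits on the pipe character and trims the padding. The function name star_stat is introduced here for illustration.

```shell
# Print the value of one metric from a STAR Log.final.out file.
# Usage: star_stat <Log.final.out> "<metric name>"
star_stat() {
  awk -F'|' -v key="$2" '
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim padding around the metric name
      if (name == key) {
        value = $2
        gsub(/^[ \t]+|[ \t]+$/, "", value)  # trim the tab or spaces around the value
        print value
        exit
      }
    }
  ' "$1"
}

# Example:
#   star_stat output/sample_001_Log.final.out "Uniquely mapped reads %"
```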

STAR 2-Pass Strategy

  1. Pass 1: Initial alignment discovers novel splice junctions
  2. Pass 2: Re-alignment using sample-specific splice junctions

This approach improves the alignment of reads that span splice junctions, because pass 2 can use the junctions discovered in pass 1. It is particularly useful for detecting novel isoforms and splice variants in RNA-seq data.
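The two passes can be sketched with plain STAR commands. This is the manual procedure, in which the index is regenerated with the pass-1 junctions before re-aligning. It is not taken from star_alignment.sh. The function name, directory layout, and --sjdbOverhang value are assumptions, and flag availability should be checked against the manual for the STAR version in use. Newer STAR releases can also perform both passes in one invocation with --twopassMode Basic.

```shell
# Manual two-pass alignment sketch. Paths and sjdbOverhang are illustrative.
star_two_pass() {
  # $1 = R1 FASTQ, $2 = R2 FASTQ, $3 = STAR index, $4 = genome FASTA,
  # $5 = output directory, $6 = sample name, $7 = threads (default 8)
  r1=$1; r2=$2; index=$3; fasta=$4; out=$5; sample=$6; threads=${7:-8}
  mkdir -p "$out/pass1" "$out/index_pass2" || return 1

  # Pass 1: align against the original index to discover junctions (SJ.out.tab).
  STAR --runThreadN "$threads" --genomeDir "$index" \
       --readFilesIn "$r1" "$r2" --readFilesCommand zcat \
       --outFileNamePrefix "$out/pass1/" || return 1

  # Rebuild the index so that it includes the junctions found in pass 1.
  STAR --runMode genomeGenerate --runThreadN "$threads" \
       --genomeDir "$out/index_pass2" --genomeFastaFiles "$fasta" \
       --sjdbFileChrStartEnd "$out/pass1/SJ.out.tab" --sjdbOverhang 100 \
    || return 1

  # Pass 2: re-align against the junction-aware index and write a sorted BAM.
  STAR --runThreadN "$threads" --genomeDir "$out/index_pass2" \
       --readFilesIn "$r1" "$r2" --readFilesCommand zcat \
       --outSAMtype BAM SortedByCoordinate \
       --outFileNamePrefix "$out/${sample}_"
}

# Example:
#   star_two_pass s.1.fastq.gz s.2.fastq.gz star_index genome.fa output sample_001 16
```

With this prefix STAR names the alignment file {sample}_Aligned.sortedByCoord.out.bam, so a rename step would be needed to match the {sample}.bam output listed above.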

Advanced Usage

Batch Processing

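# Process every R1 file under samples/ (paths are relative to the mounted directory)
for sample in samples/*.1.fastq.gz; do
  [ -e "$sample" ] || continue  # skip the unexpanded pattern when nothing matches
  docker run --rm \
    -v "$(pwd)":/data \
    ndeeseee/star-aligner:latest \
    "/data/${sample}" \
    /data/reference/star_index \
    /data/reference/genome.fa \
    /data/output
done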

Resource Customization

The container automatically detects available CPU cores. For memory-intensive datasets, ensure adequate RAM allocation in your Docker/Singularity settings.
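For Docker, RAM and CPU limits can be set per run with the standard --memory and --cpus options, as in the sketch below. The wrapper name run_with_limits and the example values are assumptions, and whether the script's core detection honors a --cpus limit is not stated in this README.

```shell
# Run the alignment container with explicit CPU and memory limits.
run_with_limits() {
  # $1 = CPU limit, $2 = memory limit (e.g. 64g); remaining args go to the container
  cpus=$1; mem=$2; shift 2
  docker run --rm \
    --cpus "$cpus" \
    --memory "$mem" \
    -v "$(pwd)":/data \
    ndeeseee/star-aligner:latest \
    "$@"
}

# Example:
#   run_with_limits 16 128g \
#     /data/input/sample.1.fastq.gz \
#     /data/reference/star_index \
#     /data/reference/genome.fa \
#     /data/output
```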

Files

github.com-NDeeSeee-altanalyze2_snaf-star_2pass_alignment_v1.4.14.zip

Files (3.3 kB)