SWANS: Benchmarked Dataset Outputs
Description
Pipeline Overview
SWANS (Single-entity Workflow ANalysiS) is a comprehensive framework for pre-processing and post-annotation analysis of single-cell and single-nucleus RNA-sequencing data. The pipeline operates in two phases:
**Preliminary Analysis Phase**: Compares different clustering arrangements using multiple analysis methods, resolutions, and parameters to find the optimal configuration. Results are visualized in an interactive Shiny app to assist in choosing the best analysis schema.
**Post-annotation Analysis Phase**: Once the optimal configuration is chosen, performs differential gene expression (DGE) analysis based on experimental conditions, gene set enrichment analysis (GSEA), and, optionally, trajectory analysis.
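The preliminary phase's search space grows multiplicatively with each parameter being compared. The sketch below enumerates candidate clustering arrangements as a grid. The method names, resolution values, and the `n_pcs` parameter are hypothetical placeholders, not SWANS's actual configuration keys or defaults.

```python
from itertools import product

# Hypothetical values for illustration only; the real SWANS configuration
# options are defined in the SWANS repository.
methods = ["method_a", "method_b"]
resolutions = [0.2, 0.4, 0.6, 0.8]
n_pcs = [20, 30]


def clustering_grid(methods, resolutions, n_pcs):
    """Return every (method, resolution, n_pcs) combination to compare."""
    return [
        {"method": m, "resolution": r, "n_pcs": p}
        for m, r, p in product(methods, resolutions, n_pcs)
    ]


grid = clustering_grid(methods, resolutions, n_pcs)
print(len(grid))  # 2 methods x 4 resolutions x 2 PC settings
for cfg in grid[:3]:
    print(cfg)
```

Even this small grid yields 16 arrangements. This is why reviewing the results side by side in an interactive app is more practical than inspecting each run separately.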
SWANS uses Snakemake as its workflow manager, Cell Ranger for alignment and quantification, Seurat (v5.1.0) for single-cell analysis, and additional R packages for quality control and downstream analysis. The pipeline generates comprehensive reports with figures, interactive tables, quality control metrics, and benchmarking information.
Preprint Citation
SWANS: A highly configurable analysis pipeline for single-cell and single-nuclei RNA-sequencing data.
Katherine Beigel, Eric Wafula, Dana V Mitchell, Steven J Pastor, Michelle Gong, Robert Heuckeroth, Julio C Ricarte-Filho, Aime T. Franco, Erin R Reichenberger
bioRxiv (2025). doi: https://doi.org/10.1101/2025.05.14.654073
Dataset Purpose & Overview
These benchmark outputs enable researchers to assess SWANS's computational efficiency across datasets of different sizes and complexities, to compare analytical approaches, or to use the detailed resource usage metrics as performance baselines for similar single-cell/nucleus RNA-seq workflows.
This repository contains the complete SWANS pipeline output for eight publicly available datasets used for benchmarking purposes. Each dataset was processed through both preliminary and post-annotation analysis phases, generating comprehensive reports, interactive visualizations, quality control metrics, differential gene expression results, gene set enrichment analyses, and benchmarking data.
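If the benchmarking data follow Snakemake's standard `benchmark:` output, they can be condensed into simple baselines. That format is a tab-separated file with one row per repeat, where `s` is wall-clock seconds and `max_rss` is peak resident memory in MB. The following is a sketch under that assumption only. The file-name pattern and the sample row are hypothetical and are not measurements from these datasets.

```python
import csv
import io
from pathlib import Path


def _floats(rows, column):
    """Numeric values of `column`, skipping missing cells and placeholders like 'NA'."""
    values = []
    for r in rows:
        try:
            values.append(float(r[column]))
        except (KeyError, TypeError, ValueError):
            continue  # column absent or value not numeric
    return values


def summarize_benchmark(text):
    """Summarize one Snakemake-style benchmark TSV (one row per repeat)."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    seconds = _floats(rows, "s")
    rss = _floats(rows, "max_rss")
    return {
        "repeats": len(rows),
        "mean_seconds": sum(seconds) / len(seconds) if seconds else None,
        "peak_rss_mb": max(rss) if rss else None,
    }


def summarize_dir(root, pattern="*.benchmark.txt"):
    """Summarize every benchmark file under `root`; the default pattern is hypothetical."""
    root = Path(root)
    return {
        str(p.relative_to(root)): summarize_benchmark(p.read_text())
        for p in sorted(root.rglob(pattern))
    }


# Made-up row in the Snakemake benchmark column layout (not a real SWANS result).
SAMPLE = (
    "s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\tcpu_time\n"
    "120.5\t0:02:00\t2048.3\t4096.0\t2000.1\t2010.2\t10.0\t5.0\t95.0\t110.0\n"
)
print(summarize_benchmark(SAMPLE))
```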
**Reproducibility and Analysis Tracking**: To ensure complete reproducibility and facilitate future reference, all analysis parameters and configurations used to generate these results have been preserved within the output directories. Configuration files, sample lists (samples.sample_list), user-supplied gene files, and annotation files are automatically copied into the report and final_analysis folders. This approach ensures that the exact methodology used for each analysis is documented and accessible alongside the results, eliminating uncertainty about how the data were processed when revisiting results months or years later.
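Because the sample list is copied into both the report and final_analysis folders, checking that those copies are identical is one quick consistency test when revisiting a dataset. The sketch below hashes every `samples.sample_list` found inside folders with those names. It assumes only the folder and file names given above. The recursive search, the `dataset_x` directory, and the file contents in the demo are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# Folder names from the dataset description; their depth in the tree is unknown,
# so the search below is recursive.
PRESERVED_DIRS = ("report", "final_analysis")


def sha256(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_preserved(root, filename="samples.sample_list"):
    """Map each copy of `filename` inside report/ or final_analysis/ to its digest."""
    root = Path(root)
    hits = {}
    for p in sorted(root.rglob(filename)):
        rel = p.relative_to(root)
        if any(part in PRESERVED_DIRS for part in rel.parts[:-1]):
            hits[str(rel)] = sha256(p)
    return hits


def copies_match(root, filename="samples.sample_list"):
    """True if at least one copy exists and all copies have identical contents."""
    return len(set(find_preserved(root, filename).values())) == 1


def demo():
    """Build a throwaway tree (hypothetical layout and contents) and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        for sub in PRESERVED_DIRS:
            d = Path(tmp, "dataset_x", sub)
            d.mkdir(parents=True)
            (d / "samples.sample_list").write_text("sample1\nsample2\n")
        return len(find_preserved(tmp)), copies_match(tmp)


print(demo())
```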
Datasets
### Dataset 1: AML (10X Genomics)
- **Source**: 10X Genomics
- **NCBI Project**: Not applicable
- **Organism**: Human
- **Tissue**: Bone marrow
- **Tissue State**: Single cell
- **Project Overview**: Human acute myeloid leukemia (AML) bone marrow samples including healthy controls and pre/post-transplant samples
- **Sample Count**: 6 samples
- **Total Cell Count**: 16,452 cells
### Dataset 2: PRJNA1010957 - SLE Study
- **Source**: NCBI (GSE242001)
- **NCBI Project**: PRJNA1010957
- **Organism**: Mouse
- **Tissue**: Colon
- **Tissue State**: Single nuclei, frozen
- **Project Overview**: Transcriptional profiling of peripheral blood mononuclear cells (PBMCs) from healthy individuals and patients with systemic lupus erythematosus (SLE)
- **Sample Count**: 2 samples
- **Total Cell Count**: 29,245 cells
### Dataset 3: PRJNA1095970 - COVID-19/Influenza Study
- **Source**: NCBI (GSE263228)
- **NCBI Project**: PRJNA1095970
- **Organism**: Mouse
- **Tissue**: Bone marrow
- **Tissue State**: Single cell, fresh
- **Project Overview**: Single-cell RNA-seq analysis of human lung tissues reveals distinct molecular signatures of COVID-19 and influenza virus infections
- **Sample Count**: 4 samples
- **Total Cell Count**: 41,531 cells
### Dataset 4: E-MTAB-13067 - Human Immune Cells
- **Source**: ArrayExpress
- **ArrayExpress ID**: E-MTAB-13067
- **Organism**: Human
- **Tissue**: PBMC
- **Tissue State**: Single nuclei, frozen
- **Project Overview**: Single-cell RNA-seq of human immune cells
- **Sample Count**: 2 samples
- **Total Cell Count**: 23,078 cells
### Dataset 5: E18 Mouse Brain (10X Genomics)
- **Source**: 10X Genomics
- **NCBI Project**: Not applicable
- **Organism**: Mouse
- **Tissue**: Brain
- **Tissue State**: Single cell
- **Project Overview**: Brain cells from embryonic day 18 (E18) mice
- **Sample Count**: 4 samples
- **Total Cell Count**: 36,510 cells
### Dataset 6: PRJNA1006693 - Glioblastoma Study
- **Source**: NCBI (GSE241184)
- **NCBI Project**: PRJNA1006693
- **Organism**: Human
- **Tissue**: Thyroid
- **Tissue State**: Single cell
- **Project Overview**: Transcriptional profiling of human glioblastoma multiforme tumor samples before and after treatment
- **Sample Count**: 3 samples
- **Total Cell Count**: 28,509 cells
### Dataset 7: PRJNA1185392 - Pancreatic Cancer Study
- **Source**: NCBI (GSE281736)
- **NCBI Project**: PRJNA1185392
- **Organism**: Human
- **Tissue**: Thyroid
- **Tissue State**: Single cell, fresh
- **Project Overview**: Single-cell RNA-seq analysis of immune cell types in the context of pancreatic cancer
- **Sample Count**: 12 samples
- **Total Cell Count**: 67,054 cells
### Dataset 8: PRJNA790856 - Macrophage Response Study
- **Source**: NCBI (GSE191288)
- **NCBI Project**: PRJNA790856
- **Organism**: Human
- **Tissue**: Thyroid
- **Tissue State**: Single cell, fresh
- **Project Overview**: Transcriptome-wide analysis of human macrophages in response to various stimuli
- **Sample Count**: 7 samples
- **Total Cell Count**: 32,082 cells
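When choosing a dataset as a performance baseline, normalizing by sample count can be more informative than total size. The sketch below uses only the sample and cell counts listed above to compute overall totals and mean cells per sample for each dataset.

```python
# (sample count, total cell count) copied from the dataset list above.
DATASETS = {
    "AML (10X Genomics)": (6, 16_452),
    "PRJNA1010957": (2, 29_245),
    "PRJNA1095970": (4, 41_531),
    "E-MTAB-13067": (2, 23_078),
    "E18 Mouse Brain (10X Genomics)": (4, 36_510),
    "PRJNA1006693": (3, 28_509),
    "PRJNA1185392": (12, 67_054),
    "PRJNA790856": (7, 32_082),
}


def cells_per_sample(datasets):
    """Mean cells per sample for each dataset, largest first."""
    ratios = {name: cells / n for name, (n, cells) in datasets.items()}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


total_samples = sum(n for n, _ in DATASETS.values())
total_cells = sum(c for _, c in DATASETS.values())
print(total_samples, total_cells)  # 40 274461
for name, ratio in cells_per_sample(DATASETS).items():
    print(f"{name}: {ratio:,.0f} cells/sample")
```

Across the eight datasets this gives 40 samples and 274,461 cells. PRJNA1185392 is the largest dataset in total, while PRJNA1010957 has the most cells per sample.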
Additional details
Identifiers
- DOI: 10.1101/2025.05.14.654073
- Other: https://github.com/FrancoResearchLab/SWANS
Related works
- Is referenced by (Other): 10.1101/2025.05.14.654073 (DOI)
Funding
- United States Department of Defense: W81XWH2210655
- Lustgarten Foundation: The Suzi and Scott Lustgarten Endowment
Dates
- Updated: 2025-04/2025-07 (Datasets were processed through the SWANS pipeline from April to July 2025.)
Software
- Repository URL: https://github.com/FrancoResearchLab/SWANS
- Programming language: Python, R, Snakemake
- Development Status: Active