Published May 22, 2026 | Version 2
Other Restricted

SWANS: Benchmarked Dataset Outputs

  • 1. Children's Hospital of Philadelphia

Contributors

Project leader:

Project member:

  • 1. Children's Hospital of Philadelphia

Description

Pipeline Overview

SWANS (Single-entity Workflow ANalysiS) is a comprehensive framework for pre-processing and post-annotation analysis of single-cell and single-nucleus RNA-sequencing data. The pipeline operates in two phases:

**Preliminary Analysis Phase**: Compares clustering arrangements produced with multiple analysis methods, resolutions, and parameters to identify the optimal configuration. Results are visualized in an interactive Shiny app to help users choose the best analysis schema.

**Post-annotation Analysis Phase**: Once the optimal configuration is chosen, this phase performs differential gene expression (DGE) analysis based on experimental conditions, gene set enrichment analysis (GSEA), and, optionally, trajectory analysis.

SWANS uses Snakemake as its workflow manager, Cell Ranger for alignment and quantification, Seurat (v5.1.0) for single-cell analysis, and additional R packages for quality control and downstream analysis. The pipeline generates comprehensive reports with figures, interactive tables, quality control metrics, and benchmarking information.

Preprint Citation

SWANS: A highly configurable analysis pipeline for single-cell and single-nuclei RNA-sequencing data.
Katherine Beigel, Eric Wafula, Dana V Mitchell, Steven J Pastor, Michelle Gong, Robert Heuckeroth, Julio C Ricarte-Filho, Aime T. Franco, Erin R Reichenberger
bioRxiv (2025). doi: https://doi.org/10.1101/2025.05.14.654073

Dataset Purpose & Overview

These benchmark outputs enable researchers to assess the computational efficiency of SWANS across datasets of different sizes and complexities, to compare analytical approaches, or to use the detailed resource usage metrics as performance baselines for similar single-cell and single-nucleus RNA-seq workflows.

This repository contains the complete SWANS pipeline output for eight publicly available datasets used for benchmarking purposes. Each dataset was processed through both preliminary and post-annotation analysis phases, generating comprehensive reports, interactive visualizations, quality control metrics, differential gene expression results, gene set enrichment analyses, and benchmarking data.
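If the per-rule benchmarking files follow the tab-separated format written by Snakemake's `benchmark:` directive (an assumption; this record does not describe the file layout), the resource usage metrics can be tabulated with a short Python sketch. It reads the Snakemake columns `s` (wall-clock seconds) and `max_rss` (peak resident set size, in MB), averages repeated runs of the same rule, and reports the summed rule runtime and the largest peak memory. The directory path, the `*.tsv` pattern, and the function names are hypothetical placeholders.

```python
import csv
import io
from pathlib import Path


def parse_benchmark(text: str) -> list[dict[str, float]]:
    """Parse one Snakemake-style benchmark TSV (assumed format) into numeric rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        row = {}
        for key in ("s", "max_rss"):
            try:
                row[key] = float(record[key])
            except (KeyError, TypeError, ValueError):
                continue  # missing column or non-numeric value such as "NA" or "-"
        rows.append(row)
    return rows


def summarize(benchmark_dir: str, pattern: str = "*.tsv") -> dict[str, float]:
    """Summed per-rule runtime (s) and peak max_rss (MB) across benchmark files."""
    total_s = 0.0
    peak_rss = 0.0
    for path in sorted(Path(benchmark_dir).rglob(pattern)):
        rows = parse_benchmark(path.read_text())
        secs = [r["s"] for r in rows if "s" in r]
        if secs:
            # Average repeated runs of the same rule instead of double counting.
            total_s += sum(secs) / len(secs)
        rss = [r["max_rss"] for r in rows if "max_rss" in r]
        if rss:
            peak_rss = max(peak_rss, max(rss))
    return {"total_seconds": total_s, "peak_max_rss_mb": peak_rss}
```

The summed runtime adds up the time spent in every rule, so it overstates elapsed time when Snakemake runs rules in parallel.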

**Reproducibility and Analysis Tracking**: To ensure complete reproducibility and facilitate future reference, all analysis parameters and configurations used to generate these results have been preserved within the output directories. Configuration files, sample lists (samples.sample_list), user-supplied gene files, and annotation files are automatically copied into the report and final_analysis folders. This approach ensures that the exact methodology used for each analysis is documented and accessible alongside the results, eliminating uncertainty about how the data were processed when revisiting results months or years later.
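Because the same sample list is copied into both the report and final_analysis folders, a quick consistency check can confirm that the copies agree. The Python sketch below assumes that a single dataset's output directory contains `report` and `final_analysis` subfolders somewhere beneath it; the function names are hypothetical, and other configuration files could be checked the same way once their file names are known.

```python
from pathlib import Path

# Folder names come from the record description; the layout above them is assumed.
TRACKED_FOLDERS = ("report", "final_analysis")


def find_sample_lists(output_root: str) -> dict[str, list[Path]]:
    """Map each tracked folder name to the samples.sample_list copies beneath it."""
    root = Path(output_root)
    found: dict[str, list[Path]] = {name: [] for name in TRACKED_FOLDERS}
    for path in sorted(root.rglob("samples.sample_list")):
        parts = path.relative_to(root).parts
        for name in TRACKED_FOLDERS:
            if name in parts:
                found[name].append(path)
    return found


def sample_lists_match(output_root: str) -> bool:
    """True if every copy of samples.sample_list in one dataset's output is identical."""
    copies = [p for paths in find_sample_lists(output_root).values() for p in paths]
    return len({p.read_bytes() for p in copies}) <= 1
```

Run the check on one dataset's output directory at a time, since sample lists legitimately differ between datasets.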

Datasets

### Dataset 1: AML (10X Genomics)
- **Source**: 10X Genomics
- **NCBI Project**: Not applicable
- **Organism**: Human
- **Tissue**: Bone marrow
- **Tissue State**: Single cell
- **Project Overview**: Human acute myeloid leukemia (AML) bone marrow samples including healthy controls and pre/post-transplant samples
- **Sample Count**: 6 samples
- **Total Cell Count**: 16,452 cells

### Dataset 2: PRJNA1010957 - SLE Study
- **Source**: NCBI (GSE242001)
- **NCBI Project**: PRJNA1010957
- **Organism**: Mouse
- **Tissue**: Colon
- **Tissue State**: Single nuclei, frozen
- **Project Overview**: Transcriptional profiling of peripheral blood mononuclear cells (PBMCs) from healthy individuals and patients with systemic lupus erythematosus (SLE)
- **Sample Count**: 2 samples
- **Total Cell Count**: 29,245 cells

### Dataset 3: PRJNA1095970 - COVID-19/Influenza Study
- **Source**: NCBI (GSE263228)
- **NCBI Project**: PRJNA1095970
- **Organism**: Mouse
- **Tissue**: Bone marrow
- **Tissue State**: Single cell, fresh
- **Project Overview**: Single-cell RNA-seq analysis of human lung tissues reveals distinct molecular signatures of COVID-19 and influenza virus infections
- **Sample Count**: 4 samples
- **Total Cell Count**: 41,531 cells

### Dataset 4: E-MTAB-13067 - Human Immune Cells
- **Source**: ArrayExpress
- **ArrayExpress ID**: E-MTAB-13067
- **Organism**: Human
- **Tissue**: PBMC
- **Tissue State**: Single nuclei, frozen
- **Project Overview**: Single-cell RNA-seq of human immune cells
- **Sample Count**: 2 samples
- **Total Cell Count**: 23,078 cells

### Dataset 5: E18 Mouse Brain (10X Genomics)
- **Source**: 10X Genomics
- **NCBI Project**: Not applicable
- **Organism**: Mouse
- **Tissue**: Brain
- **Tissue State**: Single cell
- **Project Overview**: Brain cells from embryonic day 18 (E18) mice
- **Sample Count**: 4 samples
- **Total Cell Count**: 36,510 cells

### Dataset 6: PRJNA1006693 - Glioblastoma Study
- **Source**: NCBI (GSE241184)
- **NCBI Project**: PRJNA1006693
- **Organism**: Human
- **Tissue**: Thyroid
- **Tissue State**: Single cell
- **Project Overview**: Transcriptional profiling of human glioblastoma multiforme tumor samples before and after treatment
- **Sample Count**: 3 samples
- **Total Cell Count**: 28,509 cells

### Dataset 7: PRJNA1185392 - Pancreatic Cancer Study
- **Source**: NCBI (GSE281736)
- **NCBI Project**: PRJNA1185392
- **Organism**: Human
- **Tissue**: Thyroid
- **Tissue State**: Single cell, fresh
- **Project Overview**: Single-cell RNA-seq analysis of immune cell types in the context of pancreatic cancer
- **Sample Count**: 12 samples
- **Total Cell Count**: 67,054 cells

### Dataset 8: PRJNA790856 - Macrophage Response Study
- **Source**: NCBI (GSE191288)
- **NCBI Project**: PRJNA790856
- **Organism**: Human
- **Tissue**: Thyroid
- **Tissue State**: Single cell, fresh
- **Project Overview**: Transcriptome-wide analysis of human macrophages in response to various stimuli
- **Sample Count**: 7 samples
- **Total Cell Count**: 32,082 cells
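When comparing runtimes across the eight datasets, it can help to normalize by dataset size. The Python sketch below copies the sample and cell counts from the list above, computes the mean number of cells per sample for each dataset, and totals the collection. The `seconds_per_thousand_cells` helper is a hypothetical normalization chosen for illustration, not a metric reported by SWANS.

```python
# (name, sample_count, total_cells) copied from the dataset list above.
DATASETS = [
    ("AML (10X Genomics)", 6, 16_452),
    ("PRJNA1010957", 2, 29_245),
    ("PRJNA1095970", 4, 41_531),
    ("E-MTAB-13067", 2, 23_078),
    ("E18 Mouse Brain (10X Genomics)", 4, 36_510),
    ("PRJNA1006693", 3, 28_509),
    ("PRJNA1185392", 12, 67_054),
    ("PRJNA790856", 7, 32_082),
]


def cells_per_sample(datasets) -> dict[str, float]:
    """Mean number of cells per sample for each dataset."""
    return {name: cells / samples for name, samples, cells in datasets}


def seconds_per_thousand_cells(runtime_s: float, cells: int) -> float:
    """Hypothetical normalization: runtime divided by thousands of cells."""
    return runtime_s / (cells / 1000)


total_samples = sum(samples for _, samples, _ in DATASETS)
total_cells = sum(cells for _, _, cells in DATASETS)
```

Across the eight datasets this gives 40 samples and 274,461 cells; per-sample density varies widely, from 2,742 cells per sample for AML to 14,622.5 for PRJNA1010957.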

Files

Restricted

The record is publicly accessible, but files are restricted.

Additional details

Identifiers

DOI
10.1101/2025.05.14.654073
Other
https://github.com/FrancoResearchLab/SWANS

Related works

Is referenced by
Other: 10.1101/2025.05.14.654073 (DOI)

Funding

United States Department of Defense
W81XWH2210655
Lustgarten Foundation
The Suzi and Scott Lustgarten Endowment

Dates

Updated
2025-04/2025-07
Datasets were processed through the SWANS pipeline from April to July 2025.

Software

Repository URL
https://github.com/FrancoResearchLab/SWANS
Programming language
Python, R, Snakemake
Development Status
Active