Published April 12, 2026 | Version v1
Dataset Open

Transcriptomic Endotypes of Sepsis Identified by Consensus Clustering of Whole-Blood Gene Expression

Description

# Sepsis Immune Endotypes Transcriptomic Dataset

## Overview

This dataset contains the raw and processed data files supporting the manuscript **"Transcriptomic Endotypes of Sepsis Identified by Consensus Clustering of Whole-Blood Gene Expression"**.

Using consensus clustering on 5 publicly available whole-blood transcriptomic datasets (1,002 sepsis patients), we identified 3 sepsis immune endotypes:
- **C1 Immune Activation** (306 patients, 30.5%): mortality 26.1%
- **C2 Interferon Response** (447 patients, 44.6%): mortality 15.4%
- **C3 Erythroid Dysregulation** (249 patients, 24.9%): mortality 25.3%
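
The reported counts can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in this README (endotype sizes above, per-cohort sample counts from the `discovery/` table); the variable names are illustrative.

```r
# Endotype sizes reported in the overview.
endotype_n <- c(C1 = 306, C2 = 447, C3 = 249)

# Sample counts per discovery cohort, taken from the discovery/ table.
cohort_n <- c(GSE65682 = 479, GSE185263 = 345, GSE63042 = 106,
              GSE95233 = 51, GSE137340 = 21)

total <- sum(endotype_n)
share_pct <- round(100 * endotype_n / total, 1)

print(total)          # 1002
print(sum(cohort_n))  # 1002
print(share_pct)
```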

## Directory Structure

```
sepsis_immune_zenodo/
├── discovery/               # Discovery cohorts (5 datasets)
├── validation/              # Validation cohorts (4 datasets)
├── clustering/              # Clustering results
├── differential_expression/ # Differential expression & enrichment
├── phenotype/               # Clinical phenotype data
├── README.md
├── CITATION.cff
└── LICENSE
```

## Data Dictionary

All data files are in R RDS format. Load with `readRDS()`.
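
To read a whole subdirectory at once, a small helper along these lines may be convenient. This is a minimal sketch: `load_rds_dir` is a hypothetical helper name, and the demonstration writes a toy file to a temporary directory so that it runs without the real data.

```r
# Read every .rds file in a directory into a named list
# (names = file names without the extension).
load_rds_dir <- function(dir) {
  paths <- list.files(dir, pattern = "\\.rds$", full.names = TRUE,
                      ignore.case = TRUE)
  objs <- lapply(paths, readRDS)
  names(objs) <- tools::file_path_sans_ext(basename(paths))
  objs
}

# Toy demonstration in a fresh temporary directory.
tmp <- tempfile("rds_demo_")
dir.create(tmp)
saveRDS(matrix(1:6, nrow = 2), file.path(tmp, "toy_processed.rds"))

demo <- load_rds_dir(tmp)
str(demo)
```

With the archive unpacked, `load_rds_dir("discovery")` would be expected to return one list element per file in that folder.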

### discovery/ — Discovery Cohorts

Raw and processed gene expression matrices (rows = genes, columns = samples). Processed matrices are restricted to the 9-platform universal gene set, i.e. the 2,456 genes retained after intersecting all 9 platforms.

| File | Dimensions | Platform | Size |
|------|-----------|----------|------|
| GSE65682_raw.rds | 11,518 x 479 | GPL13667 (Affymetrix HG-U219) | 41.6 MB |
| GSE65682_processed.rds | 2,456 x 479 | — | 7.4 MB |
| GSE185263_raw.rds | 35,302 x 345 | GPL16791 (Illumina HiSeq 2500) | 44.9 MB |
| GSE185263_processed.rds | 2,456 x 345 | — | 5.5 MB |
| GSE63042_raw.rds | 14,077 x 106 | GPL9115 (Illumina Genome Analyzer II) | 3.5 MB |
| GSE63042_processed.rds | 2,456 x 106 | — | 852 KB |
| GSE95233_raw.rds | 22,880 x 51 | GPL570 (Affymetrix HG-U133 Plus 2) | 8.8 MB |
| GSE95233_processed.rds | 2,456 x 51 | — | 817 KB |
| GSE137340_raw.rds | 30,308 x 21 | GPL10558 (Illumina HumanHT-12 V4) | 4.8 MB |
| GSE137340_processed.rds | 2,456 x 21 | — | 354 KB |
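
Because every processed matrix covers the same gene set, the cohorts can be stacked column-wise before batch correction. The following is a rough sketch on toy matrices, assuming row names are gene identifiers; `shared_genes` and `combine_cohorts` are hypothetical helper names, not functions shipped with the dataset.

```r
# Genes present in every matrix of a list of gene x sample matrices.
shared_genes <- function(mats) Reduce(intersect, lapply(mats, rownames))

# Put all matrices in the same gene order, then bind samples column-wise.
combine_cohorts <- function(mats) {
  genes <- shared_genes(mats)
  aligned <- lapply(mats, function(m) m[genes, , drop = FALSE])
  do.call(cbind, unname(aligned))
}

# Toy cohorts with partially overlapping genes.
a <- matrix(1:6, nrow = 3, dimnames = list(c("A", "B", "C"), c("a1", "a2")))
b <- matrix(7:12, nrow = 3, dimnames = list(c("C", "B", "D"), c("b1", "b2")))

combined <- combine_cohorts(list(cohortA = a, cohortB = b))
# One batch label per sample, as a batch-correction tool would need.
batch <- rep(c("cohortA", "cohortB"), times = c(ncol(a), ncol(b)))

print(combined)
print(batch)
```

Applied to the five `*_processed.rds` files, the combined matrix should have the 2,456 x 1,002 shape listed for `clustering/combined_expression_batch_corrected.rds`, provided the row names match across files.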

### validation/ — Validation Cohorts

Full expression matrices (rows = genes, columns = samples) used for external validation of the LASSO classifier.

| File | Dimensions | Size |
|------|-----------|------|
| GSE54514_validation_expr.rds | 16,101 x 35 | 4.1 MB |
| GSE26378_validation_expr.rds | 22,880 x 103 | 7.4 MB |
| GSE26440_validation_expr.rds | 22,880 x 130 | 9.7 MB |
| GSE32707_validation_expr.rds | 31,326 x 144 | 15.2 MB |
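
The validation matrices carry many more genes than the 2,456-gene processed set, so they usually need to be subset and reordered before a model trained on the discovery data is applied. A minimal sketch, assuming row names are gene identifiers (`align_to_genes` is a hypothetical helper; genes absent from the validation matrix are filled with `NA`):

```r
# Reorder an expression matrix to a reference gene list.
# Genes missing from `expr` become rows of NA so the shape stays predictable.
align_to_genes <- function(expr, genes) {
  out <- matrix(NA_real_, nrow = length(genes), ncol = ncol(expr),
                dimnames = list(genes, colnames(expr)))
  present <- intersect(genes, rownames(expr))
  out[present, ] <- expr[present, , drop = FALSE]
  out
}

# Toy validation matrix and reference gene list.
val <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
              dimnames = list(c("GENE_A", "GENE_B", "GENE_X"), c("v1", "v2")))
ref_genes <- c("GENE_B", "GENE_A", "GENE_C")

aligned <- align_to_genes(val, ref_genes)
print(aligned)
```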

### clustering/ — Clustering Results

| File | Type | Dimensions | Description |
|------|------|-----------|-------------|
| combined_expression_batch_corrected.rds | matrix | 2,456 x 1,002 | ComBat batch-corrected combined expression matrix |
| expression_selected_genes.rds | matrix | 122 x 1,002 | Feature gene expression matrix after MAD filtering |
| clustering_results.rds | list (7) | — | Full ConsensusClusterPlus output |
| cluster_assignments_all_k.rds | list (9) | — | Cluster assignments for k = 2 through k = 10 |
| endotype_centroids.rds | matrix | 2,456 x 3 | Centroid expression profiles for 3 endotypes |
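
One way the centroid matrix could be used is nearest-centroid labelling of new samples by Pearson correlation. This is a sketch only: it is not the LASSO classifier described in the manuscript, `assign_nearest_centroid` is a hypothetical helper, and it assumes both matrices have gene row names and the centroid matrix has column names.

```r
# Label each sample with the centroid it correlates with most strongly.
assign_nearest_centroid <- function(expr, centroids) {
  genes <- intersect(rownames(expr), rownames(centroids))
  # cor() works column-wise: result is samples (rows) x centroids (columns).
  r <- cor(expr[genes, , drop = FALSE], centroids[genes, , drop = FALSE])
  labels <- colnames(r)[max.col(r, ties.method = "first")]
  setNames(labels, rownames(r))
}

# Toy centroids with opposite profiles and two toy samples.
genes <- paste0("g", 1:4)
centroids <- cbind(C1 = c(1, 2, 3, 4), C2 = c(4, 3, 2, 1))
rownames(centroids) <- genes
expr <- cbind(s1 = c(2, 4, 6, 9), s2 = c(9, 5, 3, 1))
rownames(expr) <- genes

labels <- assign_nearest_centroid(expr, centroids)
print(labels)
```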

### differential_expression/ — Differential Expression

| File | Type | Description |
|------|------|-------------|
| differential_expression_results.rds | list (3) | limma DE results for each endotype vs. rest |
| enrichment_results.rds | list (3) | GO/KEGG functional enrichment results |
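
The internal layout of each list element is not documented here. The sketch below assumes each element is a data frame in the layout produced by limma's `topTable()` (columns `logFC` and `adj.P.Val`, gene identifiers as row names). `top_de_genes` is a hypothetical helper and the thresholds are illustrative, not the authors' cut-offs.

```r
# Keep genes passing effect-size and FDR thresholds, largest |logFC| first.
top_de_genes <- function(tab, lfc = 1, fdr = 0.05) {
  keep <- abs(tab$logFC) >= lfc & tab$adj.P.Val < fdr
  hits <- tab[keep, , drop = FALSE]
  hits[order(-abs(hits$logFC)), , drop = FALSE]
}

# Toy table standing in for one endotype-vs-rest comparison.
toy <- data.frame(logFC = c(-1.5, 2.1, 0.3, 1.2),
                  adj.P.Val = c(0.01, 0.001, 0.0001, 0.2),
                  row.names = c("G1", "G2", "G3", "G4"))

hits <- top_de_genes(toy)
print(hits)
```

If the stored objects have a different structure, `str(de_results[[1]])` shows which columns are actually available.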

### phenotype/ — Clinical Phenotype

| File | Dimensions | Description |
|------|-----------|-------------|
| GSE65682_phenotype.rds | 802 x 13 | Clinical metadata for the GSE65682 cohort (includes survival data) |
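
The phenotype table has more rows (802) than GSE65682 has samples in the expression matrices (479), so rows need to be matched to samples before any outcome comparison. A minimal sketch on toy data: the column names `sample_id` and `died` and the helper `match_phenotype` are assumptions for illustration, so check `colnames()` of the real table first.

```r
# Return phenotype rows in the same order as the given sample IDs.
match_phenotype <- function(pheno, sample_ids, id_col = "sample_id") {
  idx <- match(sample_ids, pheno[[id_col]])
  if (anyNA(idx)) warning(sum(is.na(idx)), " sample(s) have no phenotype row")
  pheno[idx, , drop = FALSE]
}

# Toy data: four phenotype rows, two of which have a cluster assignment.
pheno <- data.frame(sample_id = c("s1", "s2", "s3", "s4"),
                    died = c(TRUE, FALSE, FALSE, TRUE),  # hypothetical outcome
                    stringsAsFactors = FALSE)
clusters <- c(s3 = 2L, s1 = 1L)                          # hypothetical labels

matched <- match_phenotype(pheno, names(clusters))
print(table(cluster = clusters, died = matched$died))
```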

## Usage Examples

```r
# Load batch-corrected expression matrix
expr <- readRDS("clustering/combined_expression_batch_corrected.rds")
dim(expr)  # 2456 x 1002

# Load cluster assignments (k = 3)
assignments <- readRDS("clustering/cluster_assignments_all_k.rds")
clusters_k3 <- assignments[["3"]]

# Load differential expression results
de_results <- readRDS("differential_expression/differential_expression_results.rds")
names(de_results)  # DE results per endotype

# Load validation data
val_expr <- readRDS("validation/GSE54514_validation_expr.rds")
```

## Data Processing Pipeline

1. Download raw data from GEO -> `discovery/*_raw.rds`
2. Gene symbol mapping and 9-platform intersection (2,456 genes) -> `discovery/*_processed.rds`
3. ComBat batch correction -> `clustering/combined_expression_batch_corrected.rds`
4. MAD feature selection (MAD >= 1.0) -> `clustering/expression_selected_genes.rds` (122 genes)
5. PAM + Pearson distance consensus clustering -> `clustering/clustering_results.rds`
6. limma differential expression -> `differential_expression/`
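
Steps 4 and 5 can be illustrated on simulated data. This is a simplified sketch, not the authors' pipeline: it runs a single PAM partition on Pearson distance rather than the resampled consensus procedure of ConsensusClusterPlus, it skips batch correction, and it uses `stats::mad()` with its default 1.4826 scaling, which may differ from the scaling the authors used.

```r
library(cluster)  # pam(); a recommended package bundled with standard R installs

set.seed(1)
# Toy gene x sample matrix: 50 genes, 30 samples, low-variance noise.
expr <- matrix(rnorm(50 * 30, sd = 0.2), nrow = 50,
               dimnames = list(paste0("g", 1:50), paste0("s", 1:30)))
# Genes 1-5 are high in samples 1-15 and low in 16-30; genes 6-10 the reverse.
expr[1:5, 1:15]   <- expr[1:5, 1:15] + 3
expr[1:5, 16:30]  <- expr[1:5, 16:30] - 3
expr[6:10, 1:15]  <- expr[6:10, 1:15] - 3
expr[6:10, 16:30] <- expr[6:10, 16:30] + 3

# Step 4 analogue: keep genes whose median absolute deviation is >= 1.0.
gene_mad <- apply(expr, 1, mad)
selected <- expr[gene_mad >= 1.0, , drop = FALSE]

# Step 5 analogue (single run): PAM on Pearson distance between samples.
d <- as.dist(1 - cor(selected))
fit <- pam(d, k = 2, diss = TRUE)

print(dim(selected))
print(table(fit$clustering))
```

Pearson distance compares the shape of each sample's profile across the selected genes, which is why the toy signal is built from genes moving in opposite directions.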

 

## License

This dataset is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.

Files (175.0 MB)

| MD5 checksum | Size |
|------|------|
| d050a0c389aa3b5a5f18a24607787e89 | 17.1 kB |
| e2eda3a159fd4f484b7c261bcda92720 | 17.9 kB |
| e4985aff45502a25772b26b7a34ba775 | 18.6 MB |
| eeb765dd86572f4758a99c79546ad449 | 409.5 kB |
| cbdb79132117efd7221b5d7af955d50b | 66.4 kB |
| 3b8f7cfa4808f014f8858a05d909fff4 | 5.1 kB |
| 295b98c3a9a23460a147681f83fa9bd4 | 914.5 kB |
| 921133f1f2b557c89e82a138ebf330c5 | 354.4 kB |
| 7a987666e1b057b93acdadc4034211aa | 4.8 MB |
| 739fa8d40f379e2ffae91d7d51c5ebf3 | 5.5 MB |
| 3bccaea4662518e1b8004ab6fe957477 | 44.9 MB |
| 9679bec9b55f2de3d868dbeccbf4aa75 | 7.4 MB |
| 5a281a79a7248af05d00a4aed66ea144 | 9.7 MB |
| deeb575b2742cb2ed48c49ddc0ba3cca | 15.2 MB |
| 76bec48b802770a476ecdba39975469c | 4.1 MB |
| 86ebe6479edc986b9770ea4e40ecfe32 | 852.0 kB |
| 2ad27ed920819f798da4ac44f92ad7c8 | 3.5 MB |
| 8b0b1651d8d892ce7c7f50e631cb6e07 | 7.6 kB |
| d28de6841c49a1dd47f88bedc20d2a5f | 7.4 MB |
| 490a24b8f5b4c8b1f3f64db57f4d1e84 | 41.6 MB |
| feb479772f90077c0f158392e08778e0 | 817.1 kB |
| 4f7275b9d212370b9d4c54afe827df0a | 8.8 MB |
| 911082df882bc69c611d5ae6919a0790 | 5.2 kB |