Transcriptomic Endotypes of Sepsis Identified by Consensus Clustering of Whole-Blood Gene Expression
# Sepsis Immune Endotypes Transcriptomic Dataset
## Overview
This dataset contains processed data files from the manuscript **"Transcriptomic Endotypes of Sepsis Identified by Consensus Clustering of Whole-Blood Gene Expression"**.
Using consensus clustering on 5 publicly available whole-blood transcriptomic datasets (1,002 sepsis patients), we identified 3 sepsis immune endotypes:
- **C1 Immune Activation** (306 patients, 30.5%): mortality 26.1%
- **C2 Interferon Response** (447 patients, 44.6%): mortality 15.4%
- **C3 Erythroid Dysregulation** (249 patients, 24.9%): mortality 25.3%
## Directory Structure
```
sepsis_immune_zenodo/
├── discovery/ # Discovery cohorts (5 datasets)
├── validation/ # Validation cohorts (4 datasets)
├── clustering/ # Clustering results
├── differential_expression/ # Differential expression & enrichment
├── phenotype/ # Clinical phenotype data
├── README.md
├── CITATION.cff
└── LICENSE
```
## Data Dictionary
All files are in R RDS format. Load with `readRDS()`.
### discovery/ — Discovery Cohorts
Raw and processed gene expression matrices (rows = genes, columns = samples). Each raw file has a processed counterpart restricted to the 9-platform universal gene set, i.e. the 2,456 genes retained after the 9-platform intersection.
| File | Dimensions | Platform | Size |
|------|-----------|----------|------|
| GSE65682_raw.rds | 11,518 x 479 | GPL13667 (Affymetrix HG-U219) | 41.6 MB |
| GSE65682_processed.rds | 2,456 x 479 | — | 7.4 MB |
| GSE185263_raw.rds | 35,302 x 345 | GPL16791 (Illumina HiSeq 2500) | 44.9 MB |
| GSE185263_processed.rds | 2,456 x 345 | — | 5.5 MB |
| GSE63042_raw.rds | 14,077 x 106 | GPL9115 (Illumina Genome Analyzer II) | 3.5 MB |
| GSE63042_processed.rds | 2,456 x 106 | — | 852 KB |
| GSE95233_raw.rds | 22,880 x 51 | GPL570 (Affymetrix HG-U133 Plus 2) | 8.8 MB |
| GSE95233_processed.rds | 2,456 x 51 | — | 817 KB |
| GSE137340_raw.rds | 30,308 x 21 | GPL10558 (Illumina HumanHT-12 V4) | 4.8 MB |
| GSE137340_processed.rds | 2,456 x 21 | — | 354 KB |
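
Because every processed matrix covers the same 2,456 genes, the cohorts can be stacked column-wise once their rows are aligned. The following is a minimal sketch, assuming the row names hold gene identifiers (this README does not confirm that). `combine_cohorts` is a hypothetical helper, and small synthetic matrices stand in for the `.rds` files. Plain stacking is not batch correction, so the result is not equivalent to `combined_expression_batch_corrected.rds`.

```r
# Hypothetical helper: align matrices on their shared row names, then bind columns.
combine_cohorts <- function(mats) {
  shared <- Reduce(intersect, lapply(mats, rownames))
  aligned <- lapply(mats, function(m) m[shared, , drop = FALSE])
  do.call(cbind, aligned)
}

# Synthetic stand-ins. With the real files one might instead use:
#   files <- list.files("discovery", pattern = "_processed\\.rds$", full.names = TRUE)
#   mats  <- lapply(files, readRDS)
set.seed(1)
genes <- paste0("GENE", 1:6)
a <- matrix(rnorm(18), nrow = 6, dimnames = list(genes, paste0("A", 1:3)))
b <- matrix(rnorm(12), nrow = 6, dimnames = list(rev(genes), paste0("B", 1:2)))

combined <- combine_cohorts(list(a, b))
dim(combined)  # genes x total samples
```

Aligning by name rather than by position guards against cohorts that store the same genes in a different row order.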
### validation/ — Validation Cohorts
Full expression matrices (rows = genes, columns = samples) used for external validation of the LASSO classifier.
| File | Dimensions | Size |
|------|-----------|------|
| GSE54514_validation_expr.rds | 16,101 x 35 | 4.1 MB |
| GSE26378_validation_expr.rds | 22,880 x 103 | 7.4 MB |
| GSE26440_validation_expr.rds | 22,880 x 130 | 9.7 MB |
| GSE32707_validation_expr.rds | 31,326 x 144 | 15.2 MB |
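
The validation matrices keep each platform's full gene list, so they usually need to be restricted to a reference gene set before being compared with the discovery data. The sketch below assumes row names are gene identifiers. `subset_to_genes` is a hypothetical helper, and the data are synthetic.

```r
# Hypothetical helper: keep only the reference genes, in reference order,
# and report any reference genes that are absent from the matrix.
subset_to_genes <- function(expr, ref_genes) {
  present <- ref_genes %in% rownames(expr)
  list(
    expr    = expr[ref_genes[present], , drop = FALSE],
    missing = ref_genes[!present]
  )
}

# Synthetic validation matrix: 5 genes x 4 samples.
set.seed(2)
val <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("G", 1:5), paste0("S", 1:4)))

res <- subset_to_genes(val, c("G4", "G2", "G9"))
rownames(res$expr)  # reference genes found in the matrix
res$missing         # reference genes not found
```

With the real files, `ref_genes` could be taken from `rownames()` of any `*_processed.rds` matrix, again assuming those row names are gene identifiers.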
### clustering/ — Clustering Results
| File | Type | Dimensions | Description |
|------|------|-----------|-------------|
| combined_expression_batch_corrected.rds | matrix | 2,456 x 1,002 | ComBat batch-corrected combined expression matrix |
| expression_selected_genes.rds | matrix | 122 x 1,002 | Feature gene expression matrix after MAD filtering |
| clustering_results.rds | list (7) | — | Full ConsensusClusterPlus output |
| cluster_assignments_all_k.rds | list (9) | — | Cluster assignments for k = 2 through k = 10 |
| endotype_centroids.rds | matrix | 2,456 x 3 | Centroid expression profiles for 3 endotypes |
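
One possible use of the centroid matrix is to label new samples by their most similar centroid. The sketch below uses Pearson correlation for this. It is an illustration only and is not the classifier described in the manuscript, which this README says is LASSO-based. `assign_nearest_centroid` is a hypothetical helper, the centroid column names `C1` to `C3` are assumed, and all data are synthetic.

```r
# Hypothetical helper: assign each sample (column of expr) to the centroid
# (column of centroids) with the highest Pearson correlation over shared genes.
assign_nearest_centroid <- function(expr, centroids) {
  shared <- intersect(rownames(expr), rownames(centroids))
  r <- cor(expr[shared, , drop = FALSE], centroids[shared, , drop = FALSE])
  setNames(colnames(centroids)[max.col(r, ties.method = "first")], colnames(expr))
}

# Synthetic centroids (50 genes x 3 endotypes) and four noisy samples
# generated from centroids C2, C3, C1 and C2, in that order.
set.seed(3)
genes <- paste0("G", 1:50)
centroids <- matrix(rnorm(150), nrow = 50,
                    dimnames = list(genes, c("C1", "C2", "C3")))
expr <- centroids[, c("C2", "C3", "C1", "C2")] +
  matrix(rnorm(200, sd = 0.1), nrow = 50)
colnames(expr) <- paste0("S", 1:4)

calls <- assign_nearest_centroid(expr, centroids)
calls
```

Correlation ignores per-sample shifts in overall intensity, which is one reason to prefer it over Euclidean distance when samples come from different platforms.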
### differential_expression/ — Differential Expression
| File | Type | Description |
|------|------|-------------|
| differential_expression_results.rds | list (3) | limma DE results for each endotype vs. rest |
| enrichment_results.rds | list (3) | GO/KEGG functional enrichment results |
### phenotype/ — Clinical Phenotype
| File | Dimensions | Description |
|------|-----------|-------------|
| GSE65682_phenotype.rds | 802 x 13 | Clinical metadata for the GSE65682 cohort (includes survival data) |
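
The phenotype table has 802 rows while the GSE65682 expression matrices have 479 columns, so phenotype rows have to be matched to clustered samples before outcomes can be summarised. A minimal sketch with synthetic data follows. The column names `sample_id` and `died`, and the named-vector layout of `clusters`, are hypothetical; inspect the real objects with `str()` first.

```r
# Synthetic phenotype table; the real column names may differ.
pheno <- data.frame(
  sample_id = paste0("S", 1:8),
  died      = c(1, 0, 0, 1, NA, 0, 1, 0)
)

# Synthetic cluster calls; sample S5 was not clustered.
clusters <- c(S1 = 1, S2 = 1, S3 = 2, S4 = 2, S6 = 3, S7 = 3, S8 = 3)

# Match phenotype rows to clustered samples, then compute mortality per cluster.
idx <- match(names(clusters), pheno$sample_id)
merged <- data.frame(cluster = unname(clusters), died = pheno$died[idx])
mortality <- aggregate(died ~ cluster, data = merged, FUN = mean)
mortality
```

The formula interface of `aggregate()` drops rows with missing outcomes by default, so each rate is computed over samples with a known outcome only.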
## Usage Examples
```r
# Load batch-corrected expression matrix
expr <- readRDS("clustering/combined_expression_batch_corrected.rds")
dim(expr) # 2456 x 1002
# Load cluster assignments (k = 3)
assignments <- readRDS("clustering/cluster_assignments_all_k.rds")
clusters_k3 <- assignments[["3"]]
# Load differential expression results
de_results <- readRDS("differential_expression/differential_expression_results.rds")
names(de_results) # DE results per endotype
# Load validation data
val_expr <- readRDS("validation/GSE54514_validation_expr.rds")
```
## Data Processing Pipeline
1. Download raw data from GEO -> `discovery/*_raw.rds`
2. Gene symbol mapping and 9-platform intersection (2,456 genes) -> `discovery/*_processed.rds`
3. ComBat batch correction -> `clustering/combined_expression_batch_corrected.rds`
4. MAD feature selection (MAD >= 1.0) -> `clustering/expression_selected_genes.rds` (122 genes)
5. PAM + Pearson distance consensus clustering -> `clustering/clustering_results.rds`
6. limma differential expression -> `differential_expression/`
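
Step 4 and the distance used in step 5 can be sketched in base R as shown below. This is an illustration under stated assumptions, not the authors' code: `select_by_mad` and `pearson_dist` are hypothetical helpers, the data are a small deterministic toy matrix, and it is assumed that MAD is computed with R's default `mad()` scaling constant (1.4826), which this README does not specify. The consensus clustering itself (ConsensusClusterPlus, per the table above) is not reproduced here.

```r
# Hypothetical helper: keep genes (rows) whose median absolute deviation
# across samples meets the threshold.
select_by_mad <- function(expr, threshold = 1.0) {
  gene_mad <- apply(expr, 1, mad)
  expr[gene_mad >= threshold, , drop = FALSE]
}

# Hypothetical helper: Pearson distance between samples (columns),
# defined as 1 - Pearson correlation.
pearson_dist <- function(expr) as.dist(1 - cor(expr))

# Toy matrix: G1, G2 and G5 vary strongly; G3 and G4 are nearly or exactly constant.
expr <- rbind(
  G1 = c(0, 2, 4, 6, 8, 10),
  G2 = c(10, 7, 1, 4, 9, 0),
  G3 = c(5.0, 5.1, 4.9, 5.0, 5.2, 4.8),
  G4 = c(3, 3, 3, 3, 3, 3),
  G5 = c(1, 8, 3, 9, 2, 7)
)
colnames(expr) <- paste0("S", 1:6)

selected <- select_by_mad(expr, threshold = 1.0)
rownames(selected)  # genes passing the MAD filter

d <- pearson_dist(selected)  # sample-by-sample distance object
```

A `dist` object like `d` is the kind of input that partitioning methods such as PAM accept, which is how the distance connects to the clustering step.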
## License
This dataset is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.