There is a newer version of the record available.

Published May 6, 2025 | Version 1.2

Suco - PBMC

  • 1. Heidelberg University

Description

Suco (Single cell universal classification omnibus) is a large standardized reference dataset for cell type classification in single cell RNA sequencing data. Suco seeks to tackle the lack of standardized datasets for classification tasks in the single cell genomics field. In other fields of artificial intelligence, like computer vision, standardized datasets such as MNIST or ImageNet have transformed the development of powerful new machine learning methods.

Suco features manual, uniformly standardized, hierarchical cell type annotations across independently analyzed datasets. Because the datasets are independent, machine learning classifiers can be tested on statistically independent data and labels.
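One way to use the independence between datasets is a leave-one-dataset-out evaluation, where a classifier is trained on all datasets but one and tested on the held-out one. The following is a minimal sketch of the fold construction only; the function name and the placeholder dataset IDs are hypothetical and not part of Suco.

```python
from typing import Iterable, Iterator, List, Tuple


def leave_one_dataset_out(dataset_ids: Iterable[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training dataset IDs, held-out dataset ID), one fold per dataset."""
    unique_ids = sorted(set(dataset_ids))  # drop duplicates, fix the fold order
    for held_out in unique_ids:
        training = [d for d in unique_ids if d != held_out]
        yield training, held_out


# Placeholder IDs for illustration; real IDs follow the DATASET_ID file naming.
for training, held_out in leave_one_dataset_out(["A", "B", "C"]):
    print(f"train on {training}, test on {held_out}")
```

No cell from the held-out dataset appears in the training folds, so the test labels come from an annotation that was produced independently of the training data.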

Here, we present the peripheral blood mononuclear cell (PBMC) dataset within Suco, which includes >1200 independent manual cell type cluster labels from 12 datasets totalling >500 individuals and >5 million cells.

______________________________________________________________________________________________________


Structure of the dataset:

.zip compressed folder containing datasets from individual studies, each of which has been reprocessed, clustered and annotated independently by two different expert raters (human immunology)

  • Filenames: DATASET_ID.h5ad
    • The dataset ID has the following format (each field is followed by a '-X-' separator):
      • Tissue/cell type: here PBMC
      • Disease context
      • Publication year
      • First author (optional: followed by _BATCHNAME)
      • DOI ('/' in the DOI is replaced by '_' for compatibility with file systems)
    • the .h5ad files have the following structure
      • load using the scanpy Python package: adata = sc.read(FILE_PATH)
      • cell barcode (adata.obs_names)
        • Study ID + '-X-' + internal barcode
      • adata.obs['sample_id']
        • the sample ID is the patient ID + '-X-' separator + internal sample ID
          • e.g. TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-10.1038_s41591-021-01323-8-X-2-X-Pre
            • with TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-10.1038_s41591-021-01323-8-X-2 being the patient ID
            • -X- is the separator
            • Pre is the internal sample ID
      • adata.obs['patient_id']
        • dataset id followed by an '-X-' separator and the internal patient id
        • e.g. TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-10.1038_s41591-021-01323-8-X-35
        • -X- is the separator
        • 35 is the internal patient ID
      • adata.obs['cluster_final']
        • final clustering used for the cell type annotation
        • granularity can differ between subsets, e.g. clustering of myeloid cells can originate from the myeloid subset, clustering of TNK cells from the TNK subset, and clustering of epithelial cells from the all leukocyte subset
        • should be preceded by the prefix used for subtyping, e.g. 'TNK' for TNK cells, followed by a '_' separator and the cluster number
        • e.g.
          • cluster 0 in TNK would be 'TNK_0'
          • cluster 1 in M would be 'M_1'
      • adata.obs['cluster_all']
        • coarse clustering in the format 'all_CLUSTERNUMBER'
      • adata.obs['annotation']
        • most granular annotation, based on adata.obs['cluster_final']
      • adata.obs['annotation_all']
        • annotation based on adata.obs['cluster_all']
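The identifier conventions above can be unpacked with plain string operations. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that internal patient and sample IDs never contain the '-X-' separator, that the DOI is the last field of the dataset ID, and that cluster labels always end in '_' plus an integer.

```python
from typing import Dict, Tuple

SEP = "-X-"  # field separator used throughout the Suco identifiers


def split_last(identifier: str, sep: str = SEP) -> Tuple[str, str]:
    """Split off the last separator-delimited field of an identifier."""
    head, found, tail = identifier.rpartition(sep)
    if not found:
        raise ValueError(f"separator {sep!r} not found in {identifier!r}")
    return head, tail


def parse_sample_id(sample_id: str) -> Dict[str, str]:
    """Break a sample ID into dataset ID, patient ID and internal IDs."""
    patient_id, internal_sample = split_last(sample_id)
    dataset_id, internal_patient = split_last(patient_id)
    return {
        "dataset_id": dataset_id,
        "patient_id": patient_id,
        "internal_patient": internal_patient,
        "internal_sample": internal_sample,
    }


def doi_from_dataset_id(dataset_id: str) -> str:
    """Recover the DOI, assuming it is the last field of the dataset ID.

    Only the first '_' is turned back into '/', because the DOI suffix
    may itself contain underscores.
    """
    _, doi_field = split_last(dataset_id)
    return doi_field.replace("_", "/", 1)


def parse_cluster_label(label: str) -> Tuple[str, int]:
    """Split a label such as 'TNK_0' or 'all_3' into prefix and cluster number."""
    prefix, found, number = label.rpartition("_")
    if not found:
        raise ValueError(f"no '_' separator in cluster label {label!r}")
    return prefix, int(number)


example = "TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-10.1038_s41591-021-01323-8-X-2-X-Pre"
parsed = parse_sample_id(example)
print(parsed["internal_patient"], parsed["internal_sample"])
print(doi_from_dataset_id(parsed["dataset_id"]))
print(parse_cluster_label("TNK_0"))
```

With an AnnData object loaded as described above, these helpers could be applied column-wise, for example adata.obs['cluster_final'].map(parse_cluster_label).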

______________________________________________________________________________________________________

This dataset contains completely individually reprocessed and reannotated data from the following studies. For privacy reasons, barcodes have been modified. If you use this dataset, you agree to acknowledge the original studies' authors and comply with their licenses:

 

  1. Zhang, Y. et al. Single-cell analyses reveal key immune cell subsets associated with response to PD-L1 blockade in triple-negative breast cancer. Cancer Cell 39, 1578-1593.e1578 (2021). https://doi.org/10.1016/j.ccell.2021.09.010
  2. Keenan, B. P. et al. Circulating monocytes associated with anti-PD-1 resistance in human biliary cancer induce T cell paralysis. Cell Reports 40 (2022). https://doi.org/10.1016/j.celrep.2022.111384
  3. Che, L.-H. et al. A single-cell atlas of liver metastases of colorectal cancer reveals reprogramming of the tumor microenvironment in response to preoperative chemotherapy. Cell Discovery 7, 80 (2021). https://doi.org/10.1038/s41421-021-00312-y
  4. Wang, F. et al. Single-cell and spatial transcriptome analysis reveals the cellular heterogeneity of liver metastatic colorectal cancer. Science Advances 9, eadf5464 (2023). https://doi.org/10.1126/sciadv.adf5464
  5. Liu, C. et al. Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell 184, 1836-1857.e1822 (2021). https://doi.org/10.1016/j.cell.2021.02.018
  6. Ren, X. et al. COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas. Cell 184, 1895-1913.e1819 (2021). https://doi.org/10.1016/j.cell.2021.01.053
  7. Hao, Y. et al. Integrated analysis of multimodal single-cell data. bioRxiv, 2020.10.12.335331 (2020). https://doi.org/10.1101/2020.10.12.335331
  8. Terekhova, M. et al. Single-cell atlas of healthy human blood unveils age-related loss of NKG2C+GZMB-CD8+ memory T cells and accumulation of type 2 memory T cells. Immunity 56, 2836-2854.e2839 (2023). https://doi.org/10.1016/j.immuni.2023.10.013
  9. Oelen, R. et al. Single-cell RNA-sequencing of peripheral blood mononuclear cells reveals widespread, context-specific gene expression regulation upon pathogenic exposure. Nature Communications 13, 3267 (2022). https://doi.org/10.1038/s41467-022-30893-5
  10. Steele, N. G. et al. Multimodal mapping of the tumor and peripheral blood immune landscape in human pancreatic cancer. Nature Cancer 1, 1097-1112 (2020). https://doi.org/10.1038/s43018-020-00121-4

Files

Suco_1.2.zip

Files (23.4 GB)

Name Size
Suco_1.2.zip 23.4 GB
md5:f053f729db7cc8a8d9bd96de8acc7700
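Given the size of the archive, it may be worth checking the download against the listed md5 checksum before unpacking. The following is a minimal sketch using Python's standard hashlib module; the function name is hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# Checksum as listed for Suco_1.2.zip in the Files section.
EXPECTED_MD5 = "f053f729db7cc8a8d9bd96de8acc7700"


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the 23.4 GB archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the archive was saved as Suco_1.2.zip in the working directory, md5_of_file("Suco_1.2.zip") == EXPECTED_MD5 should evaluate to True for an intact download.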

Additional details

Dates

Created
2024-09-06