Suco - PBMC
Description
Suco (Single cell universal classification omnibus) is a large standardized reference dataset for cell type classification in single cell RNA sequencing data. Suco seeks to tackle the lack of standardized datasets for classification tasks in the single cell genomics field. In other fields of artificial intelligence, like computer vision, standardized datasets such as MNIST or ImageNet have transformed the development of powerful new machine learning methods.
Suco features manual, uniform, standardized, hierarchical cell type annotations in independently analyzed datasets. This collection of independent datasets ensures that machine learning classifiers can be tested on statistically independent data and labels.
Here, we present the peripheral blood mononuclear cell (PBMC) dataset within Suco, which includes >1200 manually labeled cell type clusters from 12 independent datasets, totalling >500 individuals and >5 million cells.
______________________________________________________________________________________________________
Updates compared to v1.2
- Included the cell type hierarchy as a nested dictionary in a JSON file. This hierarchy can be used to query the data, e.g. with our Cytopus package (wallet-maker/cytopus).
- Removed residual low-quality cells and doublets
- Increased compression to allow for faster download
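A nested-dictionary hierarchy can be queried with a short recursive lookup. The following is a minimal sketch, assuming each key is a cell type name that maps to a dictionary of its child types (an empty dictionary for a leaf). The hierarchy excerpt and the helper names `find_subtree` and `descendants` are hypothetical illustrations; they are neither the contents of the released JSON file nor part of the Cytopus API.

```python
import json

# Hypothetical excerpt for illustration only; the real hierarchy would be
# loaded from the JSON file shipped with the dataset.
hierarchy = json.loads("""
{
  "leukocyte": {
    "TNK": {"T": {"CD4-T": {}, "CD8-T": {}}, "NK": {}},
    "M": {"monocyte": {}, "DC": {}},
    "B": {}
  }
}
""")


def find_subtree(tree, name):
    """Return the child dictionary stored under `name`, or None if absent."""
    for key, children in tree.items():
        if key == name:
            return children
        found = find_subtree(children, name)
        if found is not None:
            return found
    return None


def descendants(tree, name):
    """List every cell type nested below `name` (excluding `name` itself)."""
    subtree = find_subtree(tree, name)
    if subtree is None:
        raise KeyError(name)
    result = []
    stack = [subtree]
    while stack:
        node = stack.pop()
        for key, children in node.items():
            result.append(key)
            stack.append(children)
    return sorted(result)


print(descendants(hierarchy, "TNK"))
```

If the annotation labels in the data match the hierarchy keys (an assumption to verify against the actual files), the returned list could be used to select all cells below a given node, for example by testing membership of each cell's annotation in that list.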
______________________________________________________________________________________________________
Structure of the dataset:
.zip compressed folder containing datasets from individual studies which have been reprocessed, clustered and annotated independently by two different expert raters (human immunology)
- Filenames: DATASET_ID.h5ad
- The dataset ID has the following format (each field followed by a '-X-' separator):
  - Tissue/cell type: here PBMC
  - Disease context
  - Publication year
  - First author (optional: followed by _BATCHNAME)
  - DOI ('/' in the DOI is replaced by '_' for compatibility with file systems)
- The .h5ad files have the following structure (load using the scanpy Python package: adata = sc.read(FILE_PATH)):
  - cell barcode (adata.obs_names)
    - Study-ID + '-X-' + internal barcode
  - adata.obs['sample_id']
    - the sample ID is the patient ID + '-X-' separator + internal sample ID
    - e.g. TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-10.1038_s41591-021-01323-8-X-2-X-Pre
      - with TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-10.1038_s41591-021-01323-8-X-2 being the patient ID
      - -X- being the separator
      - Pre being the internal sample ID
  - adata.obs['patient_id']
    - dataset ID followed by a '-X-' separator and the internal patient ID
    - e.g. TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-10.1038_s41591-021-01323-8-X-35
      - -X- is the separator
      - 35 is the internal patient ID
  - adata.obs['cluster_final']
    - final clustering used for the cell type annotation
    - granularity can differ between subsets, e.g. the clustering of myeloid cells can originate from the myeloid subset, that of TNK cells from the TNK subset, and that of epithelial cells from the all-leukocyte subset
    - should be preceded by the prefix used for subtyping, e.g. 'TNK' for TNK cells, followed by a '_' separator and the cluster number
    - e.g.
      - cluster 0 in TNK would be 'TNK_0'
      - cluster 1 in M would be 'M_1'
  - adata.obs['cluster_all']
    - contains the coarse clustering in the format 'all_CLUSTERNUMBER'
  - adata.obs['annotation']
    - most granular annotation, based on adata.obs['cluster_final']
  - adata.obs['annotation_all']
    - annotation based on adata.obs['cluster_all']
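The identifier conventions described above can be unpacked with plain string operations. The sketch below splits from the right on the '-X-' separator, so it does not depend on the number of fields in the dataset ID; it assumes that internal patient and sample IDs never contain '-X-' themselves. The helper names are hypothetical, and the DOI helper restores only the first '_' to '/', which is an assumption that holds for DOIs with a single slash.

```python
SEP = "-X-"


def split_sample_id(sample_id):
    """Return (dataset_id, patient_id, internal_sample_id) for a sample ID."""
    patient_id, internal_sample = sample_id.rsplit(SEP, 1)
    dataset_id, _internal_patient = patient_id.rsplit(SEP, 1)
    return dataset_id, patient_id, internal_sample


def doi_from_dataset_id(dataset_id):
    """Recover the DOI from the last field, restoring the first '_' to '/'."""
    doi_field = dataset_id.rsplit(SEP, 1)[-1]
    return doi_field.replace("_", "/", 1)


def split_cluster_label(label):
    """Split e.g. 'TNK_0' into the subset prefix and the cluster number."""
    prefix, number = label.rsplit("_", 1)
    return prefix, int(number)


# Example sample ID taken from the format description above.
sample = (
    "TIL-X-BRCA-X-scRNAseq-X-Bassez-X-2021-X-"
    "10.1038_s41591-021-01323-8-X-2-X-Pre"
)
dataset_id, patient_id, internal_sample = split_sample_id(sample)
print(dataset_id)
print(patient_id)
print(internal_sample)
print(doi_from_dataset_id(dataset_id))
print(split_cluster_label("TNK_0"))
```

The same helpers could be applied to whole columns after loading a file, for instance by mapping `split_sample_id` over the values of adata.obs['sample_id'].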
______________________________________________________________________________________________________
This dataset contains completely individually reprocessed and reannotated data from the following studies. For privacy reasons, barcodes have been modified. If you use this dataset, you agree to acknowledge the original studies' authors and comply with their licenses:
- Zhang, Y. et al. Single-cell analyses reveal key immune cell subsets associated with response to PD-L1 blockade in triple-negative breast cancer. Cancer Cell 39, 1578-1593.e1578 (2021). https://doi.org:10.1016/j.ccell.2021.09.010
- Keenan, B. P. et al. Circulating monocytes associated with anti-PD-1 resistance in human biliary cancer induce T cell paralysis. Cell Reports 40 (2022). https://doi.org:10.1016/j.celrep.2022.111384
- Che, L.-H. et al. A single-cell atlas of liver metastases of colorectal cancer reveals reprogramming of the tumor microenvironment in response to preoperative chemotherapy. Cell Discovery 7, 80 (2021). https://doi.org:10.1038/s41421-021-00312-y
- Wang, F. et al. Single-cell and spatial transcriptome analysis reveals the cellular heterogeneity of liver metastatic colorectal cancer. Science Advances 9, eadf5464 (2023). https://doi.org:10.1126/sciadv.adf5464
- Liu, C. et al. Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell 184, 1836-1857.e1822 (2021). https://doi.org:10.1016/j.cell.2021.02.018
- Ren, X. et al. COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas. Cell 184, 1895-1913.e1819 (2021). https://doi.org:10.1016/j.cell.2021.01.053
- Hao, Y. et al. Integrated analysis of multimodal single-cell data. bioRxiv, 2020.2010.2012.335331 (2020). https://doi.org:10.1101/2020.10.12.335331
- Terekhova, M. et al. Single-cell atlas of healthy human blood unveils age-related loss of NKG2C+GZMB-CD8+ memory T cells and accumulation of type 2 memory T cells. Immunity 56, 2836-2854.e2839 (2023). https://doi.org:10.1016/j.immuni.2023.10.013
- Oelen, R. et al. Single-cell RNA-sequencing of peripheral blood mononuclear cells reveals widespread, context-specific gene expression regulation upon pathogenic exposure. Nature Communications 13, 3267 (2022). https://doi.org:10.1038/s41467-022-30893-5
- Steele, N. G. et al. Multimodal mapping of the tumor and peripheral blood immune landscape in human pancreatic cancer. Nature Cancer 1, 1097-1112 (2020). https://doi.org:10.1038/s43018-020-00121-4
Files
- Size: 10.3 GB
- md5: 68a63b50a361a52e86c4ae082076e27c
Additional details
Dates
- Created: 2024-09-06