Published June 7, 2024 | Version v1
Dataset Open

Datasets for evaluating SCEMENT: Scalable and Memory Efficient Integration of Large-scale Single Cell RNA-sequencing Data

Description

This resource contains the pre-processed A. thaliana root, H. sapiens aortic valve, PBMC COVID atlas, and public 10x datasets used in the paper "SCEMENT: Scalable and Memory Efficient Integration of Large-scale Single Cell RNA-sequencing Data". The raw datasets available at the links below were pre-processed for quality control with respect to both cells and genes.

A. thaliana datasets are sourced from the following locations at the Single Cell Expression Atlas and the Gene Expression Omnibus (GEO):

  1. E-GEOD-121619: https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-121619/results
  2. E-GEOD-152766: https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-152766/results
  3. E-GEOD-158761: https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-158761/results
  4. E-GEOD-123013: https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-123013/results

H. sapiens datasets are obtained from the NCBI BioProject database: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA562645/

  1. GSE152766: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152766
  2. GSE158761: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158761

All COVID atlas datasets are from http://covid19.cancer-pku.cn. The archive covid_atlas_data1.zip contains the h5ad files, and covid_atlas_data2.zip contains the Seurat rds files.
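
Since the two COVID atlas archives are split by file format, it can be useful to confirm what a downloaded archive holds before extracting it. The following is a minimal Python sketch using only the standard library; the helper name `list_members_by_suffix` is hypothetical, and the internal directory layout of the archives is not described here, so the function makes no assumption about it beyond file extensions.

```python
import zipfile
from pathlib import PurePosixPath


def list_members_by_suffix(archive_path, suffix):
    """Return the sorted names of archive members with the given extension.

    The comparison is case-insensitive, so ".h5ad" also matches ".H5AD".
    """
    suffix = suffix.lower()
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if PurePosixPath(name).suffix.lower() == suffix
        )


# Hypothetical usage, assuming the archives sit in the working directory:
# h5ad_files = list_members_by_suffix("covid_atlas_data1.zip", ".h5ad")
# rds_files = list_members_by_suffix("covid_atlas_data2.zip", ".rds")
```

Listing members first avoids unpacking several gigabytes only to find that the expected format is in the other archive.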

PBMC datasets are from the following public sources:

Dataset Name | Chemistry Version | Web Link
10k Human PBMCs, 3' v3.1, Chromium X | v3.1 | https://www.10xgenomics.com/datasets/10k-human-pbmcs-3-ht-v3-1-chromium-x-3-1-high
20k Human PBMCs, 3' HT v3.1, Chromium X | v3.1 | https://www.10xgenomics.com/datasets/20-k-human-pbm-cs-3-ht-v-3-1-chromium-x-3-1-high-6-1-0
10k Human PBMCs, 3' v3.1, Chromium Controller | v3.1 | https://www.10xgenomics.com/datasets/10k-human-pbmcs-3-v3-1-chromium-controller-3-1-high
Healthy PBMC Chromium Connect (channel 1) | v3.1 | https://www.10xgenomics.com/datasets/peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-chromium-connect-channel-1-3-1-standard-3-1-0
Healthy PBMC Chromium Connect (channel 5) | v3.1 | https://www.10xgenomics.com/datasets/peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-chromium-connect-channel-5-3-1-standard-3-1-0
10k PBMCs from a Healthy Donor (v3 chemistry) | v3.0 | https://www.10xgenomics.com/datasets/10-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0
1k PBMCs from a Healthy Donor (v2 chemistry) | v2.0 | https://www.10xgenomics.com/datasets/1-k-pbm-cs-from-a-healthy-donor-v-2-chemistry-3-standard-3-0-0
1k PBMCs from a Healthy Donor (v3 chemistry) | v3.0 | https://www.10xgenomics.com/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0
Fresh 68k PBMCs (Donor A) | v1.0 | https://www.10xgenomics.com/datasets/fresh-68-k-pbm-cs-donor-a-1-standard-1-1-0
Frozen PBMCs (Donor A) | v1.0 | https://www.10xgenomics.com/datasets/frozen-pbm-cs-donor-a-1-standard-1-1-0
Frozen PBMCs (Donor B) | v1.0 | https://www.10xgenomics.com/datasets/frozen-pbm-cs-donor-b-1-standard-1-1-0
Frozen PBMCs (Donor C) | v1.0 | https://www.10xgenomics.com/datasets/frozen-pbm-cs-donor-c-1-standard-1-1-0
PBMCs from a Healthy Donor: Whole Transcriptome Analysis | v3.1 | https://www.10xgenomics.com/datasets/pbm-cs-from-a-healthy-donor-whole-transcriptome-analysis-3-1-standard-4-0-0
PBMC 600K | v1 | https://www.ebi.ac.uk/gxa/sc/experiments/E-HCAD-4/downloads
GSM4560071 | v2.0 | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4560071
GSM4560074 | v2.0 | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4560074
GSM4560070 | v2.0 | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4560070

References for the datasets:

  1. H. sapiens dataset: Kang Xu, Shangbo Xie, Yuming Huang, Tingwen Zhou, Ming Liu, Peng Zhu, Chunli Wang, Jiawei Shi, Fei Li, Frank W. Sellke and Nianguo Dong (2020) Cell-Type Transcriptome Atlas of Human Aortic Valves Reveal Cell Heterogeneity and Endothelial to Mesenchymal Transition Involved in Calcific Aortic Valve Disease.
  2. E-GEOD-152766: Shahan R, Hsu C, Nolan TM, Cole BJ, Taylor IW et al. (2020) A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants.
  3. E-GEOD-121619: Jean-Baptiste K, McFaline-Figueroa JL, Alexandre CM, Dorrity MW, Saunders L et al. (2019) Dynamics of Gene Expression in Single Root Cells of Arabidopsis thaliana.
  4. E-GEOD-123013: Ryu KH, Huang L, Kang HM, Schiefelbein J. (2019) Single-Cell RNA Sequencing Resolves Molecular Relationships Among Individual Plant Cells.
  5. E-GEOD-158761: Gala HP, Lanctot A, Jean-Baptiste K, Guiziou S, Chu JC et al. (2020) A single cell view of the transcriptome during lateral root initiation in Arabidopsis thaliana.
  6. COVID atlas: Xianwen Ren, Wen Wen, Xiaoying Fan et al. (2021) COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas.
  7. PBMC data were downloaded from the respective links listed above.

Files

aortic_valve_datasets.zip

Files (21.7 GB)

MD5 checksum | Size
md5:de580ff20b86e500eee4ac5db9ce5e61 | 901.2 MB
md5:603292c25d2424c9d7203262442f033a | 3.0 GB
md5:67f4d08091c123fbfccfdff09772ae41 | 5.1 GB
md5:c7a886c681f67636fb5185f34af48f16 | 10.6 GB
md5:19ecd11793f062ef2fb581e9ad889606 | 2.0 GB
md5:17c01656132de9f22fe0cf819e15a783 | 56.8 MB
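
The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, and because the listing does not show which checksum belongs to which file, the file and checksum pairing in the usage comment is a placeholder assumption rather than a confirmed match.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Return True if the file's MD5 digest equals the expected value.

    The expected value may carry the 'md5:' prefix used in the file listing.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Hypothetical usage; which checksum belongs to which archive is an assumption:
# ok = verify_download("aortic_valve_datasets.zip",
#                      "md5:de580ff20b86e500eee4ac5db9ce5e61")
```

A mismatch usually indicates an interrupted or corrupted transfer, in which case re-downloading the file is the simplest remedy.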

Additional details

Funding

U.S. National Science Foundation
A scalable integrated multi-modal single cell analysis framework for gene regulatory and cell-cell interaction networks (award 2233887)

Software