Human Retina Cell Atlas reference model
Description
This dataset hosts the files needed to reproduce the Human Retina Cell Atlas (HRCA) reference model using scArches. The HRCA data can be accessed through several interactive browsers, including the HCA Data Portal, CELLxGENE, the UCSC Cell Browser, and the Broad Single Cell Portal. Please use these browsers for atlas exploration and visualization. For more information on the HRCA, please refer to the HRCA paper (Li et al., bioRxiv 2023) and the GitHub repository at https://github.com/RCHENLAB/HRCA_reproducibility. This dataset is used in the tutorial for the HRCA reference model at https://github.com/RCHENLAB/HRCA_reproducibility/tree/main/scArches.
Data description:
1. HRCA_snRNA_allcells_rawcounts.h5ad
This file contains the cell-by-gene count matrix for over 3.1 million single nuclei and more than 36,000 gene features of the HRCA. Gene features are represented by gene symbols. Please refer to the interactive browsers for atlas exploration, where gene features are mapped to Ensembl IDs. In the cell metadata, "sampleid" indicates sample batches of cells, and "celltype" specifies 123 retina cell types.
2. model.pt
This file is the reference model trained with scArches on 10,000 highly variable features selected from the full count matrix. It can be used directly for cell type annotation of new retina samples.
3. HRCA_snRNA_allcells_rawcounts_latent.h5ad
This file contains the embeddings of all 3.1 million reference single nuclei generated by the trained reference model using scArches. These embeddings can be used to compare with the embeddings of query data for exploration.
4. HRCA_reference_model_gene_id_and_symbol.csv
This file contains the mapping of Ensembl IDs to gene symbols for the 10,000 features used in the reference model. This mapping can be used to convert the gene features in a query .h5ad file from gene IDs to gene symbols, allowing cell type labels to be predicted using the trained reference model, which uses gene symbols as gene features.
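The ID-to-symbol conversion described for this file can be sketched in plain Python. This is a minimal illustration, not the tutorial's code: the column names `gene_id` and `gene_symbol` are assumptions (check the actual CSV header), the two-row CSV is a toy stand-in for the real 10,000-row file, and `ENSG99999999999` is a made-up ID used only to show a feature that the model does not cover.

```python
import csv
import io


def load_id_to_symbol(handle, id_col="gene_id", symbol_col="gene_symbol"):
    """Read Ensembl ID -> gene symbol pairs from an open CSV handle.

    The default column names are assumptions, not taken from the real file.
    """
    reader = csv.DictReader(handle)
    return {row[id_col]: row[symbol_col] for row in reader}


def to_symbols(feature_ids, id_to_symbol):
    """Keep only the features present in the mapping and return their symbols."""
    kept = [g for g in feature_ids if g in id_to_symbol]
    return kept, [id_to_symbol[g] for g in kept]


# Toy stand-in for HRCA_reference_model_gene_id_and_symbol.csv.
toy_csv = io.StringIO(
    "gene_id,gene_symbol\n"
    "ENSG00000163914,RHO\n"
    "ENSG00000000003,TSPAN6\n"
)
id_to_symbol = load_id_to_symbol(toy_csv)

# Gene features of a hypothetical query dataset, indexed by Ensembl ID.
query_ids = ["ENSG00000163914", "ENSG99999999999", "ENSG00000000003"]
kept_ids, symbols = to_symbols(query_ids, id_to_symbol)
print(kept_ids)
print(symbols)
```

With a real query .h5ad file, the same mapping would be applied to the gene index of the count matrix, and features absent from the mapping would be dropped before prediction.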
5. query.h5ad
This file contains a cell-by-gene count matrix for a query dataset, designed to support reproducibility in the HRCA reference model tutorial. The "majorclass" column includes pre-annotated major cell classes. Additional details on the tutorial are available at https://github.com/RCHENLAB/HRCA_reproducibility/tree/main/scArches.
6. query_latent.h5ad
This file contains the embeddings of the query data obtained from the trained reference model. These embeddings can be compared with the reference data embeddings for exploration and visualization.
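One simple way to compare query embeddings with reference embeddings is a k-nearest-neighbor majority vote in the latent space. The sketch below is an illustration of that idea only, not the method used by scArches or the HRCA tutorial. The two-dimensional vectors and the labels are toy values, and how the embeddings are stored inside the .h5ad files is not assumed here. A brute-force search like this would not scale to 3.1 million reference nuclei, where an approximate nearest-neighbor index would be the usual choice.

```python
import math
from collections import Counter


def knn_label(query_vec, ref_vecs, ref_labels, k=3):
    """Label one query embedding by majority vote among its k nearest
    reference embeddings (Euclidean distance, brute force)."""
    order = sorted(
        range(len(ref_vecs)),
        key=lambda i: math.dist(query_vec, ref_vecs[i]),
    )
    votes = Counter(ref_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy reference embeddings with known labels (illustrative values only).
ref_vecs = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),   # first cluster
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),   # second cluster
]
ref_labels = ["Rod", "Rod", "Rod", "Cone", "Cone", "Cone"]

# Toy query embeddings to be labelled.
query_vecs = [(0.1, 0.1), (5.1, 5.0)]
predicted = [knn_label(q, ref_vecs, ref_labels) for q in query_vecs]
print(predicted)
```

For the tutorial query data, predictions of this kind could be checked against the pre-annotated "majorclass" column.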
Files (20.9 GB)
| MD5 checksum | Size |
|---|---|
| 4e4988b652057f3d2b5f46e13695c833 | 235.5 kB |
| 22096e0461577dc9af01ccd6191a96cb | 19.5 GB |
| eef2c60b0b38ef15d3f30d107780a174 | 1.3 GB |
| 745c4612397aa151b6e4971de4034804 | 55.6 MB |
| 11fc25834bea0045c88182e3a4e62156 | 32.3 MB |
| bfaafc39b5840edf18e0fcea5b6d891e | 4.5 MB |
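The MD5 checksums listed with the files can be used to confirm that a download completed intact, which is worth doing for the multi-gigabyte files. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the demonstration runs on a small temporary file rather than on one of the files in this record.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    listed = listed.strip().lower()
    if listed.startswith("md5:"):
        listed = listed[4:]
    return md5_of_file(path) == listed


# Demonstration on a temporary file containing the bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = matches_listed_md5(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
demo_bad = matches_listed_md5(demo_path, "md5:00000000000000000000000000000000")
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a downloaded file, the call would pass the local path together with the checksum string shown next to that file.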
Additional details
Software
- Repository URL: https://github.com/RCHENLAB/HRCA_reproducibility