Published October 30, 2024 | Version v1
Dataset Open

Human Retina Cell Atlas reference model

  • 1. University of California, Irvine

Description

This dataset hosts the files needed to reproduce the Human Retina Cell Atlas (HRCA) reference model with scArches. The HRCA data can be accessed through several interactive browsers, including the HCA Data Portal, CELLxGENE, the UCSC Cell Browser, and the Broad Single Cell Portal. Please use these browsers for atlas exploration and visualization. For more information on HRCA, see the HRCA paper (Li et al., bioRxiv 2023) and the GitHub repository at https://github.com/RCHENLAB/HRCA_reproducibility. The tutorial for the HRCA reference model, at https://github.com/RCHENLAB/HRCA_reproducibility/tree/main/scArches, uses this dataset.

Data description:

1. HRCA_snRNA_allcells_rawcounts.h5ad

This file contains the cell-by-gene count matrix of the HRCA, covering over 3.1 million single nuclei and more than 36,000 gene features. Gene features are represented by gene symbols. Please refer to the interactive browsers for atlas exploration, where gene features are mapped to Ensembl IDs. In the cell metadata, "sampleid" identifies the sample batch of each cell, and "celltype" gives its label among 123 retina cell types.

2. model.pt

This file is the reference model trained with scArches on 10,000 highly variable features selected from the full count matrix. It can be used directly to annotate cell types in new retina samples.

3. HRCA_snRNA_allcells_rawcounts_latent.h5ad

This file contains the embeddings of all 3.1 million reference single nuclei, generated by the reference model trained with scArches. These embeddings can be compared with the embeddings of query data for exploration.

4. HRCA_reference_model_gene_id_and_symbol.csv

This file contains the mapping of Ensembl IDs to gene symbols for the 10,000 features used in the reference model. This mapping can be used to convert the gene features in a query .h5ad file from gene IDs to gene symbols, allowing cell type labels to be predicted using the trained reference model, which uses gene symbols as gene features.
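
As a rough illustration of that conversion, the sketch below reads an ID-to-symbol table with Python's standard csv module and renames a list of query gene IDs. The column names gene_id and gene_symbol, the helper names, and the toy rows are assumptions made for illustration only; check the actual header of HRCA_reference_model_gene_id_and_symbol.csv before adapting it.

```python
import csv
import io


def load_id_to_symbol(handle, id_col="gene_id", symbol_col="gene_symbol"):
    """Read an ID-to-symbol table from an open CSV handle into a dict.

    The default column names are assumptions, not the confirmed file header.
    """
    reader = csv.DictReader(handle)
    return {row[id_col]: row[symbol_col] for row in reader}


def convert_gene_ids(gene_ids, id_to_symbol):
    """Split query features into (id, symbol) pairs and a list of unmapped IDs."""
    mapped, unmapped = [], []
    for gene_id in gene_ids:
        symbol = id_to_symbol.get(gene_id)
        if symbol is None:
            unmapped.append(gene_id)
        else:
            mapped.append((gene_id, symbol))
    return mapped, unmapped


# Toy stand-in for the mapping file; the IDs and symbols are placeholders.
toy_csv = io.StringIO("gene_id,gene_symbol\nID_0001,SYM_A\nID_0002,SYM_B\n")
id_to_symbol = load_id_to_symbol(toy_csv)

# Query features keep their original order; unknown IDs are reported separately.
mapped, unmapped = convert_gene_ids(["ID_0002", "ID_9999", "ID_0001"], id_to_symbol)
print(mapped)
print(unmapped)
```

IDs that are absent from the table are not among the 10,000 model features, so reporting them separately makes it easy to see how much of a query overlaps with the model.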

5. query.h5ad

This file contains a cell-by-gene count matrix for a query dataset, designed to support reproducibility in the HRCA reference model tutorial. The "majorclass" column includes pre-annotated major cell classes. Additional details on the tutorial are available at https://github.com/RCHENLAB/HRCA_reproducibility/tree/main/scArches.

6. query_latent.h5ad

This file contains the embeddings of the query data against the trained reference model. These embeddings can be compared with the reference data embeddings for exploration and visualization.
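
One simple way to make such a comparison is a nearest-neighbour vote in the latent space. The sketch below is a toy version in plain Python with made-up two-dimensional coordinates and placeholder labels. It is not the label-transfer procedure used in the HRCA tutorial, and the function name knn_vote is hypothetical. Real embeddings would first have to be read from the reference and query latent .h5ad files.

```python
import math
from collections import Counter


def knn_vote(query_vec, ref_vecs, ref_labels, k=3):
    """Return the majority label among the k reference points nearest to query_vec."""
    # Rank reference indices by Euclidean distance to the query embedding.
    order = sorted(
        range(len(ref_vecs)),
        key=lambda i: math.dist(query_vec, ref_vecs[i]),
    )
    votes = Counter(ref_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy reference embeddings and labels (placeholders, not HRCA values).
ref_vecs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
ref_labels = ["class_A", "class_A", "class_A", "class_B", "class_B", "class_B"]

# Toy query embeddings, one near each reference cluster.
query_vecs = [(0.1, 0.1), (5.1, 5.0)]
predicted = [knn_vote(q, ref_vecs, ref_labels) for q in query_vecs]
print(predicted)
```

A brute-force search like this would not scale to 3.1 million reference nuclei; an approximate nearest-neighbour index would be the usual substitute at that size.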

Files (20.9 GB)

Size       MD5 checksum
235.5 kB   md5:4e4988b652057f3d2b5f46e13695c833
19.5 GB    md5:22096e0461577dc9af01ccd6191a96cb
1.3 GB     md5:eef2c60b0b38ef15d3f30d107780a174
55.6 MB    md5:745c4612397aa151b6e4971de4034804
32.3 MB    md5:11fc25834bea0045c88182e3a4e62156
4.5 MB     md5:bfaafc39b5840edf18e0fcea5b6d891e
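
Because the largest file is 19.5 GB, it may be worth checking each download against its listed MD5 checksum. The following is a minimal sketch using Python's hashlib. The helper names and the small demo file are illustrative assumptions; the real path and checksum would come from the listing above.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:<32 hex digits>'."""
    listed = listed.strip().lower()
    expected = listed[4:] if listed.startswith("md5:") else listed
    return md5_of_file(path) == expected


# Demo on a throwaway file, so the sketch runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    demo_digest = md5_of_file(demo_path)
    demo_ok = matches_listing(demo_path, "md5:" + demo_digest)
    demo_bad = matches_listing(demo_path, "md5:" + "0" * 32)

print(demo_digest, demo_ok, demo_bad)
```

For a real download, matches_listing would be called with the local file path and the checksum string copied from the file's entry.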

Additional details