utility: Collection of Tumor-Infiltrating Lymphocyte Single-Cell Experiments with TCR

Nicholas Borcherding (Washington University)

doi:10.5281/zenodo.17977149

Published January 9, 2026 | Version v1.0.1

Dataset Embargoed

uTILity is a comprehensive, harmonized collection of publicly available single-cell RNA sequencing data from tumor-infiltrating lymphocytes (TILs) with paired T cell receptor (TCR) sequencing. The resource aggregates data from 28 published studies spanning 13 tissue types and 420 unique patients: over 2.6 million cells in total, of which 1.8 million have associated TCR information.

Data Processing

All datasets were uniformly processed using the following pipeline:

Quality Control: Cells were excluded if more than 10% of their counts came from mitochondrial genes or if their number of detected features was more than 2.5 standard deviations from the mean. Doublets were identified using scDblFinder.
Annotation: Automated cell type annotation was performed using:
- SingleR with Human Primary Cell Atlas (HPCA) and Monaco reference datasets
- Azimuth with the PBMC reference (providing L1, L2, and L3 annotations)
TCR Integration: T cell receptor data was processed using scRepertoire, with clonotypes assigned based on CDR3 amino acid sequences and gene usage.
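The quality-control rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's own code: it assumes per-cell feature counts and mitochondrial percentages are already computed, takes the mean and population standard deviation over the cells passed in, and does not reproduce doublet calling, which the pipeline performed with scDblFinder. The function name `qc_mask` is hypothetical.

```python
from statistics import mean, pstdev


def qc_mask(n_features, percent_mt, max_mt=10.0, n_sd=2.5):
    """Return True for each cell that passes both QC thresholds.

    Hypothetical helper. n_features: detected genes per cell.
    percent_mt: percentage of each cell's counts from mitochondrial genes.
    Assumption: population SD computed over the cells given here.
    """
    if len(n_features) != len(percent_mt):
        raise ValueError("inputs must have one value per cell")
    mu = mean(n_features)
    sd = pstdev(n_features)
    keep = []
    for nf, mt in zip(n_features, percent_mt):
        # A cell is a feature outlier if it lies more than n_sd SDs from the mean.
        outlier = sd > 0 and abs(nf - mu) > n_sd * sd
        # Exclude cells above the mitochondrial cutoff or flagged as outliers.
        keep.append(mt <= max_mt and not outlier)
    return keep


features = [1000, 1100, 900, 1000, 1050, 950, 1000, 1020, 980, 5000]
mito = [5.0] * 8 + [15.0, 5.0]
print(qc_mask(features, mito))
# [True, True, True, True, True, True, True, True, False, False]
```

Here the ninth cell fails the mitochondrial cutoff and the tenth is a feature-count outlier. Whether the mean and standard deviation should be computed per sample or per study is a choice this sketch leaves to the caller.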

Contents

Seurat Objects (.rds): Fully processed R objects containing gene expression, cell metadata, dimensional reductions, and TCR annotations
AnnData Files (.h5ad): Python-compatible exports for use with scanpy, scvi-tools, and related ecosystems
Processed Data: Intermediate files and per-cohort objects for users who wish to work with individual studies

Cancer Types Represented

Breast, Colorectal, Lung, Melanoma, Renal, Ovarian, HNSCC (head and neck squamous cell carcinoma), Esophageal, Biliary, Endometrial, and Merkel Cell, plus multi-cancer cohorts.

Tissue Types

Tumor, Normal adjacent tissue, Peripheral blood, Lymph node, Metastatic lesions, and Juxtatumoral tissue.

Usage

This data is intended for researchers studying tumor immunology, T cell biology, and computational methods for single-cell analysis. Users can leverage the harmonized annotations and TCR data for:

Pan-cancer T cell phenotype analysis
TCR repertoire studies across cancer types
Benchmarking integration and annotation methods
Training and validating machine learning models

For the analysis code and processing pipeline, see the associated GitHub repository: https://github.com/ncborcherding/utility
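For pan-cancer phenotype analysis, a common first step is to compare the composition of annotated T cell states across cohorts. The sketch below assumes each cell's metadata is available as a dictionary (for example, rows of an obs table) and uses a hypothetical grouping column `cancer_type` alongside the Azimuth column `predicted.celltype.l2`; check metadata_headers.txt for the column that actually encodes tumor type. The function name `phenotype_composition` is also hypothetical.

```python
from collections import Counter, defaultdict


def phenotype_composition(cells, group_key, label_key):
    """Fraction of each annotation label within each group.

    cells: iterable of dicts, one per cell (hypothetical layout).
    Returns {group: {label: fraction}}; fractions in a group sum to 1.
    """
    counts = defaultdict(Counter)
    for cell in cells:
        counts[cell[group_key]][cell[label_key]] += 1
    composition = {}
    for group, label_counts in counts.items():
        total = sum(label_counts.values())
        composition[group] = {label: n / total for label, n in label_counts.items()}
    return composition


# Toy records; "cancer_type" is a hypothetical column name.
cells = [
    {"cancer_type": "Lung", "predicted.celltype.l2": "CD8 TEM"},
    {"cancer_type": "Lung", "predicted.celltype.l2": "CD4 TCM"},
    {"cancer_type": "Lung", "predicted.celltype.l2": "CD8 TEM"},
    {"cancer_type": "Melanoma", "predicted.celltype.l2": "Treg"},
]
comp = phenotype_composition(cells, "cancer_type", "predicted.celltype.l2")
print(comp["Melanoma"])  # {'Treg': 1.0}
```

Normalizing within each group keeps cohorts of very different sizes comparable.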

File Formats

.h5ad: AnnData objects stored in HDF5 (Hierarchical Data Format), compatible with the Python single-cell ecosystem. Each file contains:

X: Raw count matrix (sparse CSR)
obs: Cell metadata
var: Gene metadata
obsm: Embeddings (PCA, UMAP, HARMONY, etc.)
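Because X is stored in compressed sparse row (CSR) format, each cell (row) is described by three arrays: `data` holds the nonzero counts, `indices` holds their gene (column) positions, and `indptr[i]:indptr[i+1]` delimits the entries belonging to cell i. These match the attribute names scipy exposes on a CSR matrix (`adata.X.data`, `adata.X.indices`, `adata.X.indptr`). The plain-Python sketch below uses hypothetical helper names and a toy matrix to read one cell's counts and compute per-cell library sizes directly from those arrays.

```python
def cell_counts(data, indices, indptr, row, gene_names):
    """Return {gene: count} for the nonzero entries of one cell (row)."""
    start, end = indptr[row], indptr[row + 1]
    return {gene_names[indices[k]]: data[k] for k in range(start, end)}


def library_sizes(data, indptr):
    """Total raw counts per cell: the sum of each row's stored values."""
    return [sum(data[indptr[i]:indptr[i + 1]]) for i in range(len(indptr) - 1)]


# Toy 3-cell x 4-gene matrix (hypothetical gene names):
# cell0: GeneA=2, GeneC=1; cell1: no counts; cell2: GeneB=5, GeneD=3
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
data = [2, 1, 5, 3]
indices = [0, 2, 1, 3]
indptr = [0, 2, 2, 4]
print(cell_counts(data, indices, indptr, 2, genes))  # {'GeneB': 5, 'GeneD': 3}
print(library_sizes(data, indptr))                   # [3, 0, 8]
```

Only nonzero values are stored, which is why an empty cell such as cell1 costs nothing beyond a repeated `indptr` entry.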

Load in Python with:

import scanpy as sc
adata = sc.read_h5ad("adata.h5ad")

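Load the Seurat object (.rds) in R with:

library(Seurat)
# readRDS() reads the .rds Seurat objects directly; it cannot parse .h5ad files
obj <- readRDS("seurat_object.rds")  # illustrative file name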

Metadata Columns

See metadata_headers.txt in the GitHub repository for complete descriptions: https://github.com/ncborcherding/utility/blob/main/summary/metadata_headers.txt

Key columns:

orig.ident: Sample identifier (tumor type + tissue)
predicted.celltype.l1/l2/l3: Azimuth annotations
Monaco.labels / HPCA.labels: SingleR annotations
CTaa: Clonotype by CDR3 amino acid sequence
clonalFrequency: Clone count within sample
clonalProportion: Clone proportion within sample
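The relationship between CTaa, clonalFrequency, and clonalProportion can be illustrated with a short sketch. It assumes, as an approximation rather than a reproduction of scRepertoire's behaviour, that clonalFrequency is the number of cells in a sample sharing the same CTaa string and that clonalProportion divides that count by the number of TCR-bearing cells in the same sample; cells without a TCR (`None`) receive no values. The function name `clonal_stats` and the CTaa strings are hypothetical.

```python
from collections import Counter


def clonal_stats(samples, ctaa):
    """Per-cell (clonalFrequency, clonalProportion) computed within each sample.

    samples: sample ID per cell (e.g. orig.ident); ctaa: CTaa string or None.
    Assumption: the proportion's denominator is TCR-bearing cells per sample.
    """
    if len(samples) != len(ctaa):
        raise ValueError("one sample ID and one CTaa value are needed per cell")
    clone_counts = Counter((s, c) for s, c in zip(samples, ctaa) if c is not None)
    tcr_cells = Counter(s for s, c in zip(samples, ctaa) if c is not None)
    stats = []
    for s, c in zip(samples, ctaa):
        if c is None:
            stats.append((None, None))  # no TCR detected for this cell
        else:
            n = clone_counts[(s, c)]
            stats.append((n, n / tcr_cells[s]))
    return stats


# Toy data; CTaa values are made-up placeholders, not real CDR3 sequences.
samples = ["S1", "S1", "S1", "S1", "S2"]
ctaa = ["CAVR_CASS", "CAVR_CASS", "CALS_CSAR", None, "CAVR_CASS"]
for row in clonal_stats(samples, ctaa):
    print(row)
```

Because counts are keyed on (sample, CTaa), the same clonotype appearing in S1 and S2 is counted separately in each sample.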

SUGGESTED CITATION FORMAT

Borcherding, N. (2025). uTILity: Comprehensive Single-Cell Tumor-Infiltrating Lymphocyte Data with Paired TCR Sequencing (Version 1.0.0) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.10211240

Files

Embargoed

The files will be made publicly available on December 11, 2026.

Reason: Finalizing and publishing data

Additional details

Available: 2025-12-11

Repository URL: https://github.com/ncborcherding/utility
Programming language: R, Python
