Robust detection of SARS-CoV-2 exposure in population using T-cell repertoire profiling

Genomics of Adaptive Immunity Laboratory

doi:10.5281/zenodo.15077567

Published March 24, 2025 | Version v3

Dataset Restricted


Genomics of Adaptive Immunity Laboratory¹

1. Institute of Bioorganic Chemistry

The dataset contains processed T-cell receptor (TCR) repertoire sequencing data from more than 1200 individuals of varying sex and age. Only samples with sufficient sequencing coverage (>10^5 reads per file) are published.

The main aim of our study is to find TCR sequence biomarkers and to develop a bioinformatic pipeline for building an accurate and robust classifier that distinguishes COVID-19-convalescent donors from unexposed individuals. We performed immunosequencing of the rearranged TCR α and β regions from PBMCs. For the cohort described in this study (Cohort-I), we sequenced both chains of the TCR heterodimer, as both chains are required to properly predict antigen recognition [26]. We ran conventional T-cell repertoire data analysis and pre-processed the data to remove low-coverage samples.

Of the Cohort-I samples that passed the read count threshold, 383/377 TCR α/β samples were from healthy donors (SARS-CoV-2 PCR-negative or collected before the pandemic) and 890/848 were from COVID-19-positive patients. Most samples are accompanied by information on HLA class I and II alleles. Samples were prepared and sequenced in nine batches.

The metadata for both TCR alpha and beta repertoires contains the following information:

sequencing_date - date when sequencing was performed
batch_name - one of the 9 unique batch identifiers
sample_id, patient_id - information on sample identifier and donor identifier
COVID_status, COVID_IgG, COVID_IgM, COVID_PCR - information on COVID-19 status
HLA-A.1, HLA-A.2, HLA-B.1, HLA-B.2, HLA-C.1, HLA-C.2 - MHC class I alleles
HLA-DPB1.1, HLA-DPB1.2, HLA-DQB1.1, HLA-DQB1.2, HLA-DRB1.1, HLA-DRB1.2 - MHC class II alleles
file_name - name of the corresponding file in the fmba_clonotype_usage_tables.zip archive
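
As a quick illustration of how the metadata fields above can be combined, the following minimal sketch counts samples per batch and COVID-19 status. The tab delimiter, the example rows, the status labels ("COVID", "healthy") and the file names are assumptions made for illustration only; they are not taken from the published files.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a metadata table. The delimiter, status labels
# and file names are illustrative assumptions, not the real values.
METADATA_TSV = (
    "sample_id\tpatient_id\tbatch_name\tCOVID_status\tfile_name\n"
    "s1\tp1\tbatch_A\tCOVID\ts1.TRB.txt\n"
    "s2\tp2\tbatch_A\thealthy\ts2.TRB.txt\n"
    "s3\tp3\tbatch_B\tCOVID\ts3.TRB.txt\n"
)


def count_by_batch_and_status(handle):
    """Count samples for every (batch_name, COVID_status) pair."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter((row["batch_name"], row["COVID_status"]) for row in reader)


counts = count_by_batch_and_status(io.StringIO(METADATA_TSV))
for (batch, status), n in sorted(counts.items()):
    print(batch, status, n)
```

A table like this makes it easy to see whether cases and controls are balanced within each sequencing batch before any downstream analysis.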

Each file in the fmba_clonotype_usage_tables.zip archive stores information on either a TCR alpha or a TCR beta repertoire. Each line in a file corresponds to a unique clonotype, and each clonotype is accompanied by the following information:

count - number of reads where the clonotype was detected
freq - number of reads with the clonotype divided by the total number of reads in the sample
cdr3nt, cdr3aa - nucleotide and amino acid sequences of TCR's CDR3 sequence
v, d, j - the V/D/J segment name which was used for the clonotype's rearrangement
VEnd, DStart, DEnd, JStart - information on VDJ junction positions
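
The relation between count and freq described above can be checked with a short sketch such as the one below. The column names follow the list above; the tab delimiter, the three example clonotypes, the segment names and the junction positions are made up for illustration and should not be read as real data.

```python
import csv
import io

# Hypothetical three-clonotype table. Sequences, segment names and
# junction positions are invented; the delimiter is an assumption.
CLONOTYPES_TSV = (
    "count\tfreq\tcdr3nt\tcdr3aa\tv\td\tj\tVEnd\tDStart\tDEnd\tJStart\n"
    "6\t0.6\tTGTGCCAGCAGTTTC\tCASSF\tTRBV5-1\tTRBD1\tTRBJ2-1\t9\t10\t11\t12\n"
    "3\t0.3\tTGTGCCAGCAGTTTA\tCASSL\tTRBV5-1\tTRBD1\tTRBJ2-1\t9\t10\t11\t12\n"
    "1\t0.1\tTGTGCCAGCAGTTAC\tCASSY\tTRBV5-1\tTRBD1\tTRBJ2-1\t9\t10\t11\t12\n"
)


def read_clonotypes(handle):
    """Parse a clonotype table, converting count and freq to numbers."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["count"] = int(row["count"])
        row["freq"] = float(row["freq"])
        rows.append(row)
    return rows


def frequencies_consistent(rows, tolerance=1e-9):
    """Return True if every freq equals count / total reads in the sample."""
    total = sum(row["count"] for row in rows)
    return all(abs(row["count"] / total - row["freq"]) <= tolerance for row in rows)


rows = read_clonotypes(io.StringIO(CLONOTYPES_TSV))
total_reads = sum(row["count"] for row in rows)
print(total_reads, frequencies_consistent(rows))
```

The same total can be compared against the >10^5 read threshold mentioned in the description when checking a downloaded file.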

We then select a set of CDR3 sequences that can serve as biomarkers and form a feature list for a COVID-19 status classifier. We also validate the resulting set of clonotypes in several ways. Co-occurrence of specific TCR α and β clonotypes can serve as independent validation of the biomarkers and of their association with a specific pathogen. Additional information on donor HLAs is provided to filter the set of biomarkers by HLA restriction: association with a donor HLA allele serves as additional evidence that a TCR is specific to a set of antigens presented in that donor, and allows detecting the fingerprint of past and present infection. Furthermore, clonotypes with similar sequences can be aggregated into 'metaclonotype' biomarkers based on clonotype graph analysis.
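
To make the idea of aggregating similar clonotypes concrete, here is a minimal sketch of one possible graph construction. The description does not specify how the clonotype graph is built, so the rule used here (an edge between two CDR3 amino acid sequences of equal length that differ in at most one position, with connected components treated as metaclonotypes) is an assumption, and the sequences are illustrative.

```python
from collections import defaultdict


def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def metaclonotypes(cdr3_sequences, max_mismatches=1):
    """Group CDR3 sequences into connected components of a similarity graph.

    Assumed edge rule: same length and at most `max_mismatches` mismatches.
    """
    seqs = sorted(set(cdr3_sequences))
    neighbours = defaultdict(set)
    for i, a in enumerate(seqs):
        for b in seqs[i + 1:]:
            if len(a) == len(b) and hamming_distance(a, b) <= max_mismatches:
                neighbours[a].add(b)
                neighbours[b].add(a)

    seen = set()
    clusters = []
    for seq in seqs:
        if seq in seen:
            continue
        # Depth-first traversal collects one connected component.
        component = set()
        stack = [seq]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = metaclonotypes(
    ["CASSLGETQYF", "CASSLGDTQYF", "CASSLADTQYF", "CAWSVGNEQFF", "CASSF"]
)
print(clusters)
```

The pairwise comparison is quadratic in the number of sequences, which is fine for a short biomarker list but would need an indexed neighbour search for whole repertoires.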

Finally, we train various COVID-19 status classifiers on selected batches of Cohort-I using different algorithms and feature sets. We verified the robustness of our results on independent batches of Cohort-I and on previously published data from Cohort-II.
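
The batch-wise evaluation described above can be sketched as a split on the batch_name metadata field. This is a minimal illustration, not the pipeline from the repository: the function names, batch labels and sample records are hypothetical, and the check for donors shared between the two sets is an added precaution rather than a step stated in the description.

```python
def split_by_batch(samples, training_batches):
    """Split sample records into training and held-out sets by batch_name."""
    training_batches = set(training_batches)
    train = [s for s in samples if s["batch_name"] in training_batches]
    held_out = [s for s in samples if s["batch_name"] not in training_batches]
    return train, held_out


def shared_patients(train, held_out):
    """Return patient_ids present in both sets (ideally an empty set)."""
    return {s["patient_id"] for s in train} & {s["patient_id"] for s in held_out}


# Hypothetical sample records using the metadata field names.
samples = [
    {"sample_id": "s1", "patient_id": "p1", "batch_name": "batch_A"},
    {"sample_id": "s2", "patient_id": "p2", "batch_name": "batch_A"},
    {"sample_id": "s3", "patient_id": "p3", "batch_name": "batch_B"},
    {"sample_id": "s4", "patient_id": "p1", "batch_name": "batch_C"},
]

train, held_out = split_by_batch(samples, training_batches=["batch_A", "batch_B"])
overlap = shared_patients(train, held_out)
print(len(train), len(held_out), overlap)
```

Holding out whole batches, rather than random samples, tests whether a classifier has learned exposure-related signal instead of batch-specific sequencing artifacts.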

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Repository URL: https://github.com/antigenomics/tcr-covid-classifier
Programming language: Python

