Published June 25, 2024 | Version v3
Dataset Open

HLApollo: Towards designing improved cancer immunotherapy targets with a superior peptide-MHC-I presentation model

Description

Following the success of cancer immunotherapy, personalized cancer vaccines have recently emerged as the vanguard of oncology treatment. Because antigen presentation on MHC class I (MHC-I) is key to the adaptive immune response to cancerous cells, highly predictive computational methods are needed to model which peptides are presented on MHC-I. Here, we introduce HLApollo, a transformer-based model with end-to-end treatment of MHC-I sequence, deconvolution of multi-allelic data, and ligand-flanking sequences. We develop negative-set switching, a novel training strategy that greatly reduces overfitting and is key to HLApollo’s performance, leading to increases of 20.19% and 4.1% in average precision (AP) over the next-best model on MHC-I presentation and immunogenicity, respectively. Incorporating protein features derived from protein language models yielded further gains and reduced the need for gene expression measurements. We achieve excellent pan-allelic generalization and create a framework for estimating performance on untrained alleles. This framework guides the clinical use of HLApollo, where rare alleles may be observed, particularly for individuals from underrepresented ancestries. Our work uses all facets of available MHC-I data to develop a highly accurate MHC-I presentation predictor that meaningfully improves immunogenicity prediction and allelic coverage, both of which are important for clinical applications of personalized neoantigen vaccines.
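For readers unfamiliar with the metric, the following is a minimal sketch of how average precision (AP) is commonly computed for a ranked list of peptides, and how a gain between two models can be expressed. It is not HLApollo's evaluation code: the function name, the toy labels, and the scores are hypothetical, and the description above does not state whether the reported increases are absolute or relative, so the sketch computes both.

```python
def average_precision(labels, scores):
    """Mean of precision@k, evaluated at the rank k of each positive example.

    labels: 1 for a presented peptide, 0 for a decoy (toy convention).
    scores: model scores, higher meaning more likely presented.
    Ties keep their input order; a production metric would handle them explicitly.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("average precision needs at least one positive label")
    return sum(precisions) / len(precisions)


# Hypothetical toy data: two positives and two decoys scored by two models.
labels = [1, 0, 1, 0]
scores_a = [0.9, 0.8, 0.7, 0.1]  # one decoy is ranked above a positive
scores_b = [0.9, 0.6, 0.7, 0.1]  # both positives are ranked first

ap_a = average_precision(labels, scores_a)
ap_b = average_precision(labels, scores_b)

absolute_gain = ap_b - ap_a                  # difference in AP
relative_gain = (ap_b - ap_a) / ap_a * 100   # percent change relative to model A

print(f"AP(A)={ap_a:.4f}  AP(B)={ap_b:.4f}")
print(f"absolute gain={absolute_gain:.4f}  relative gain={relative_gain:.2f}%")
```

Because AP only rewards placing positives above negatives, it is well suited to presentation data, where decoys typically far outnumber observed ligands.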

Files

Files (30.9 MB)

Name: Source Data.zip
Size: 30.9 MB
md5:f0cac356c26b372850af0f77331e49d1
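
The md5 checksum listed for the archive can be used to confirm that a download is intact. The sketch below uses only Python's standard `hashlib` module. The local path `Source Data.zip` is an assumption about where the file was saved, and `md5_of_file` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing on this record.
EXPECTED_MD5 = "f0cac356c26b372850af0f77331e49d1"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed download location; adjust to wherever the archive was saved.
archive = Path("Source Data.zip")
if archive.exists():
    actual = md5_of_file(archive)
    if actual == EXPECTED_MD5:
        print("checksum OK")
    else:
        print(f"checksum mismatch: {actual}")
```

Reading in chunks keeps memory use constant regardless of archive size.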

Additional details

Related works

Is supplemented by
Journal article: 10.1101/2022.12.08.519673 (DOI)