Published August 22, 2022 | Version v2
Dataset Open

Data from: Investigating the human and non-obese diabetic mouse MHC class II immunopeptidome using protein language modelling.

  • 1. Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, Basel, Switzerland
  • 2. NIBR Research Informatics, Novartis Institutes for Biomedical Research, Basel, Switzerland

Description

Background: Identifying peptides associated with the major histocompatibility complex class II (MHCII) is a central task in evaluating the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including the in silico safety assessment of biologics and engineered derivatives and the rapid progression of antigen-specific immunomodulatory drug discovery programs in immune disease and cancer. This has resulted in the collection of large-scale datasets on adaptive immune receptor antigenic responses and MHC-associated peptide proteomics. In parallel, recent deep learning advances in natural language processing (NLP) and protein language modelling (PLM) have shown potential for leveraging large collections of sequence data to improve MHC presentation prediction.

Methodology: We trained a compact transformer model (AEGIS) on human and mouse MHCII immunopeptidome data, including a preclinical murine model, and evaluated its performance on the peptide presentation prediction task.

Data: The data and models used in AEGIS are contained in the uploaded tar files.

Results: The transformer performs on par with existing deep learning algorithms, and combining datasets from multiple organisms increases model performance (see preprint). We trained variants of the model with and without MHCII information. In both variants, the inclusion of peptides presented by the I-Ag7 MHC class II molecule expressed by non-obese diabetic (NOD) mice enabled the in silico prediction of presented peptides in a preclinical type 1 diabetes model organism, which has promising therapeutic applications.

Files

Files (1.8 GB)

Checksum Size
md5:9aae2257a7e569dfdf6bc5ec33e368a3 1.3 GB
md5:2000849c334cc9acca428555cafa0a1f 519.0 MB
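
Because the archives are large, it can be worth checking a download against the MD5 checksums listed above before unpacking it. The following is a minimal sketch in Python, not part of the original record. The function names and the example file name are hypothetical, since the archive names are not shown in the listing; only the two checksum values are taken from this page.

```python
import hashlib

# MD5 checksums copied from the file listing on this record.
PUBLISHED_MD5 = {
    "9aae2257a7e569dfdf6bc5ec33e368a3",  # 1.3 GB archive
    "2000849c334cc9acca428555cafa0a1f",  # 519.0 MB archive
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 equals one of the checksums listed above."""
    return md5_of_file(path) in PUBLISHED_MD5
```

For a downloaded archive saved as, for example, `downloaded_archive.tar` (a placeholder name), `matches_published_checksum("downloaded_archive.tar")` should return `True` if the transfer completed without corruption. MD5 is used here only because it is the checksum the record publishes; it detects accidental corruption but is not a defence against deliberate tampering.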

Additional details

Related works

Is derived from
Preprint: 10.1101/2022.08.19.504560 (DOI)