Published December 19, 2024 | Version v4
Dataset | Open access

Geometric deep learning improves generalizability of MHC-bound peptide predictions

  • 1. Radboud University Medical Center
  • 2. Netherlands eScience Center
  • 1. University of Amsterdam
  • 2. Radboud University Medical Center

Description

Full dataset and trained models from the manuscript "Geometric deep learning improves generalizability of MHC-bound peptide predictions".

"outputs_and_BA_data.zip" contains the networks' outputs for each cross-validation experiment and "full_dataset.csv", which holds the initial binding affinity (BA) data.
Note: this file was updated on 2024/11/26 because some of the previous CSVs had been generated incorrectly. In the earlier version, both the MLP and the CNN outputs were wrong; the updated CSVs report the correct values.

"trained_models.zip" contains the parameters of all trained models.

"propedia_ssl.zip" contains all the 3D models from PROPEDIA used to train the 3D-SSL model.

"pdb.zip" contains the 3D models generated with PANDORA and used to train the CNN, GNN and EGNN. It comprises 145,665 .pdb files, one for each human binding affinity entry in the initial dataset from O'Donnell et al. The entries actually used to train the networks after filtering are listed in "full_dataset.csv" inside "outputs_and_BA_data.zip".
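Since only the filtered subset of the 145,665 structures was used for training, it can be convenient to extract just those entries from pdb.zip instead of unpacking everything. The sketch below uses only the Python standard library and is not part of the 3D-Vac code. It rests on two assumptions that should be checked against the actual files: that full_dataset.csv has an ID column called `ID` (hypothetical name), and that each archive member is named `<ID>.pdb` (hypothetical naming scheme). The helper names `load_ids`, `select_pdb_members` and `extract_filtered_pdbs` are illustrative.

```python
import csv
import zipfile
from pathlib import PurePosixPath


def load_ids(csv_lines, id_column="ID"):
    """Collect entry IDs from full_dataset.csv.

    `csv_lines` is any iterable of CSV lines, e.g. an open file handle.
    `id_column` is a HYPOTHETICAL header name; replace it with the real one.
    """
    return {row[id_column] for row in csv.DictReader(csv_lines)}


def select_pdb_members(member_names, keep_ids):
    """Return, sorted, the archive members named '<ID>.pdb' whose ID is kept.

    The '<ID>.pdb' naming scheme is an ASSUMPTION about pdb.zip.
    """
    selected = []
    for name in member_names:
        path = PurePosixPath(name)  # zip member names always use '/'
        if path.suffix == ".pdb" and path.stem in keep_ids:
            selected.append(name)
    return sorted(selected)


def extract_filtered_pdbs(zip_path, keep_ids, out_dir):
    """Extract only the selected .pdb members of the archive into `out_dir`."""
    with zipfile.ZipFile(zip_path) as archive:
        members = select_pdb_members(archive.namelist(), keep_ids)
        for name in members:
            archive.extract(name, out_dir)
    return members
```

Typical use would be to open full_dataset.csv with `newline=""`, pass the handle to `load_ids`, and then call `extract_filtered_pdbs("pdb.zip", ids, "pdb_filtered")`. Reading the member list with `namelist()` avoids decompressing files that are later discarded.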

 

CHANGELOG v4:

- In outputs_and_BA_data.zip, updated CNN_AlleleClustered_test_crossval.csv and CNN_shuffled_test_crossval.csv. These files had the wrong IDs paired with the network outputs. The IDs and labels are now consistent with the outputs.

- Updated reference from the preprint to the published article. 
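The v4 fix concerns IDs that had been paired with the wrong network outputs, so a quick sanity check after downloading is to confirm that every ID in a cross-validation output CSV carries the same label as in full_dataset.csv. The following is a minimal sketch, not the authors' tooling: it assumes both files have columns named `ID` and `label` (hypothetical names) and compares labels as exact strings.

```python
import csv


def label_mismatches(output_lines, dataset_lines, id_col="ID", label_col="label"):
    """Return IDs whose label in an output CSV disagrees with full_dataset.csv.

    Both arguments are iterables of CSV lines (e.g. open file handles).
    `id_col` and `label_col` are HYPOTHETICAL column names.
    IDs that are absent from the dataset are reported as mismatches too.
    """
    # Map each dataset ID to its reference label.
    reference = {row[id_col]: row[label_col] for row in csv.DictReader(dataset_lines)}
    # Keep output rows whose label is missing from or differs from the reference.
    return [
        row[id_col]
        for row in csv.DictReader(output_lines)
        if reference.get(row[id_col]) != row[label_col]
    ]
```

An empty list means every ID matched. Because labels are compared as strings, values such as "1" and "1.0" would be flagged, so numeric formats should be normalized first if the two files write them differently.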

 

Files

outputs_and_BA_data.zip

Files (4.8 GB)

  • 8.2 MB (md5:e7965eb25239345efe0bc914ca984a1c)
  • 4.3 GB (md5:33f00da419c27ce4f930990d3c25b64c)
  • 110.3 MB (md5:a8ae74dea860cddce707e20b4ba762df)
  • 447.3 MB (md5:248b5bd7a6b92b48c1e59cb1a44e8527)
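With one archive at 4.3 GB, it is worth verifying each download against the MD5 checksums listed for this record. A minimal sketch using Python's standard `hashlib`, hashing in 1 MiB chunks so that large files are never held in memory at once. Because this listing does not pair each checksum with a file name, the sketch only checks whether a file's digest is one of the four listed values; the function names are illustrative.

```python
import hashlib
from functools import partial

# Checksums as listed on this record.
LISTED_MD5 = (
    "md5:e7965eb25239345efe0bc914ca984a1c",
    "md5:33f00da419c27ce4f930990d3c25b64c",
    "md5:a8ae74dea860cddce707e20b4ba762df",
    "md5:248b5bd7a6b92b48c1e59cb1a44e8527",
)


def parse_listed_md5(listed):
    """Turn 'md5:<hex>' as shown on the record into a bare lowercase hex digest."""
    listed = listed.strip().lower()
    return listed[len("md5:"):] if listed.startswith("md5:") else listed


LISTED_DIGESTS = {parse_listed_md5(m) for m in LISTED_MD5}


def md5_of_chunks(chunks):
    """Return the MD5 hex digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks (1 MiB by default) without loading it whole."""
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        return md5_of_chunks(iter(partial(handle.read, chunk_size), b""))


def is_listed_download(path):
    """True if the file's MD5 matches one of the checksums listed on the record."""
    return md5_of_file(path) in LISTED_DIGESTS
```

If `is_listed_download("pdb.zip")` returns False, the download is most likely incomplete or corrupted and should be repeated before extraction.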

Additional details

Software

Repository URL
https://github.com/DeepRank/3D-Vac
Programming language
Python

References

  • Marzella, D., Crocioni, G. et al. Geometric deep learning improves generalizability of MHC-bound peptide predictions. Commun. Biol. (2024). https://doi.org/10.1038/s42003-024-07292-1
  • O'Donnell, T. J., Rubinsteyn, A. & Laserson, U. MHCflurry 2.0: Improved Pan-Allele Prediction of MHC Class I-Presented Peptides by Incorporating Antigen Processing. Cell Syst. 11, 42–48.e7 (2020).