Geometric deep learning improves generalizability of MHC-bound peptide predictions
Description
Full dataset and trained models from the manuscript "Geometric deep learning improves generalizability of MHC-bound peptide predictions".
"outputs_and_BA_data.zip" contains the networks' outputs for each cross-validation experiment and a "full_dataset.csv" file with the initial binding affinity (BA) data.
Note: this file was updated on 2024/11/26 because some of the previous CSV files were generated incorrectly. In the earlier version, the reported MLP and CNN outputs were both wrong. The updated CSV files report the correct values.
"trained_models.zip" contains the parameters of all the trained models.
"propedia_ssl.zip" contains all the 3D models from Propedia used to train the 3D-SSL.
"pdb.zip" contains the 3D models generated with PANDORA and used to train the CNN, GNN and EGNN. It holds 145665 .pdb files, one for each human binding affinity entry in the initial dataset from O'Donnell et al. The entries actually used to train the networks after filtering are listed in the "full_dataset.csv" file inside "outputs_and_BA_data.zip".
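The archives described above can be inspected without unpacking them. The following is a minimal sketch using only the Python standard library, not code from the 3D-Vac repository. The helper names `read_csv_from_zip` and `count_members` are hypothetical, and the internal folder layout of the archives is an assumption, which is why the CSV is located by the end of its member name rather than by a fixed path. No column names are assumed: rows are returned as dictionaries keyed by whatever header the file carries.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member_suffix="full_dataset.csv"):
    """Return the rows of the first archive member whose name ends with member_suffix.

    zip_source may be a path or a binary file-like object.
    The archive's internal folder layout is not assumed.
    """
    with zipfile.ZipFile(zip_source) as archive:
        matches = [name for name in archive.namelist() if name.endswith(member_suffix)]
        if not matches:
            raise FileNotFoundError(f"no member ending with {member_suffix!r}")
        with archive.open(matches[0]) as raw:
            # Members are opened in binary mode; wrap them for the csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def count_members(zip_source, extension=".pdb"):
    """Count archive members whose name ends with the given extension (case-insensitive)."""
    with zipfile.ZipFile(zip_source) as archive:
        return sum(1 for name in archive.namelist() if name.lower().endswith(extension))
```

With the archives downloaded locally, `read_csv_from_zip("outputs_and_BA_data.zip")` (the file name as listed under Files) would return the filtered entries, and `count_members("pdb.zip")` could be compared against the 145665 .pdb files stated above.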
CHANGELOG v4:
- In outputs_and_BA_data.zip, updated CNN_AlleleClustered_test_crossval.csv and CNN_shuffled_test_crossval.csv. These files had the wrong IDs paired with the network outputs. The IDs and labels are now consistent with the outputs.
- Updated reference from the preprint to the published article.
Files
outputs_and_BA_data.zip
Additional details
Software
- Repository URL: https://github.com/DeepRank/3D-Vac
- Programming language: Python
References
- Marzella, D., Crocioni, G. et al. Geometric deep learning improves generalizability of MHC-bound peptide predictions. Communications Biology https://doi.org/10.1038/s42003-024-07292-1 (2024).
- O'Donnell, T. J., Rubinsteyn, A. & Laserson, U. MHCflurry 2.0: Improved Pan-Allele Prediction of MHC Class I-Presented Peptides by Incorporating Antigen Processing. Cell Syst. 11, 42–48.e7 (2020).