CPT-1 pre-computed whole-proteome variant effect predictions and model source code
Creators
- University of California, Berkeley
Description
Cross-protein transfer learning for variant effect prediction
This repository contains the variant effect predictions of CPT-1 for 18,602 human proteins, initially released with the manuscript "Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects". The proteins are split into three files.
CPT1_score_EVE_set.zip: Proteins in the EVE set (Frazer et al., 2021)
CPT1_score_no_EVE_set_1.zip and CPT1_score_no_EVE_set_2.zip: Proteins not in the EVE set. Predictions for these proteins use imputed values for the features that depend on the EVE multiple sequence alignment (MSA).
The protein names are UniProt gene names.
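As a starting point for finding one protein across the three archives, the sketch below lists the members of a zip file whose base file name contains a given gene name. It rests on an assumption not stated in this record: that each archive member's file name includes the protein's UniProt gene name. The helper name `find_protein_members` is hypothetical, and the internal layout and file format of the archives should be checked after download.

```python
import zipfile
from pathlib import PurePosixPath

# Archive names as listed in this record.
ARCHIVES = [
    "CPT1_score_EVE_set.zip",
    "CPT1_score_no_EVE_set_1.zip",
    "CPT1_score_no_EVE_set_2.zip",
]


def find_protein_members(archive_path, gene_name):
    """Return the archive members whose base file name contains gene_name.

    ASSUMPTION: member file names include the UniProt gene name.
    The match is a loose substring test, so inspect the results
    (e.g. a short gene name may also match longer ones).
    """
    matches = []
    with zipfile.ZipFile(archive_path) as zf:
        for member in zf.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            base_name = PurePosixPath(member).name
            if gene_name in base_name:
                matches.append(member)
    return matches
```

With the archives downloaded to the working directory, calling `find_protein_members(path, "TP53")` for each `path` in `ARCHIVES` would show which archive, if any, holds that protein. The file format of the matched members is not described here, so open one manually before parsing it.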
We also provide source code to train the CPT-1 model and reproduce the results in the manuscript:
source_code.zip (corresponds to the GitHub repository songlab-cal/CPT as of Jul 12, 2023)
Citation
Jagota, M.*, Ye, C.*, Albors, C., Rastogi, R., Koehl, A., Ioannidis, N., and Song, Y.S.†
"Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects", bioRxiv (2022)
*These authors contributed equally to this work.
†To whom correspondence should be addressed: yss@berkeley.edu
DOI: https://doi.org/10.1101/2022.11.15.516532