Published May 21, 2023 | Version 1.0.0
Dataset Open

CPT-1 pre-computed whole-proteome variant effect predictions and model source code

Description

Cross-protein transfer learning for variant effect prediction

This repository contains the variant effect predictions of CPT-1 for 18,602 human proteins, initially released with the manuscript "Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects". The proteins are split into three files.

CPT1_score_EVE_set.zip: Proteins in the EVE set (Frazer et al., 2021)

CPT1_score_no_EVE_set_1.zip & CPT1_score_no_EVE_set_2.zip: Proteins not in the EVE set. Predictions for these proteins use imputed values for features depending on the EVE MSA.

The protein names are UniProt gene names.
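Since proteins are identified by UniProt gene name, one way to find a protein after downloading an archive is to list the archive members and search their file names. The sketch below is illustrative only: the internal layout and file format of the archives are not described here, so it assumes nothing beyond the gene name appearing somewhere in a member's file name. The helper names `list_members` and `find_gene` are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def list_members(archive):
    """Return the sorted names of all files (not directories) in a zip archive.

    `archive` may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(info.filename for info in zf.infolist() if not info.is_dir())


def find_gene(archive, gene_name):
    """Return members whose file name contains `gene_name`, ignoring case.

    Assumption: member file names include the UniProt gene name. A substring
    match can return several members (for example, genes sharing a prefix).
    """
    wanted = gene_name.upper()
    return [
        name
        for name in list_members(archive)
        if wanted in PurePosixPath(name).name.upper()
    ]
```

For example, `find_gene("CPT1_score_EVE_set.zip", "BRCA1")` would return every member whose file name contains `BRCA1`. Inspect the matches before reading them, since the naming scheme is assumed rather than documented here.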

We also provide source code to train the CPT-1 model and reproduce the results in the manuscript:

source_code.zip (corresponds to GitHub repository songlab-cal/CPT version as of Jul 12, 2023)

Citation

Jagota, M.*, Ye, C.*, Albors, C., Rastogi, R., Koehl, A., Ioannidis, N., and Song, Y.S.†
"Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects", bioRxiv (2022)

*These authors contributed equally to this work.
†To whom correspondence should be addressed: yss@berkeley.edu

DOI: https://doi.org/10.1101/2022.11.15.516532

Files

Files (2.5 GB)

Checksum and size of each file:

md5:3966f7b8c8f87a55e10953228e04d74f, 482.9 MB
md5:d8b1b0d4606a96e5aa6f7fc9f1d1932e, 1.2 GB
md5:6de0a3c13f6fcd6bcf7864dd4d21ead9, 689.2 MB
md5:a1d46a2b0a442677fb3450027fbe2a29, 95.2 MB
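
The MD5 checksums listed above can be used to confirm that a download finished without corruption. The following is a minimal sketch using only the Python standard library. Because the listing does not pair each checksum with a file name, the sketch only tests whether a file's digest is one of the four listed values. The names `md5_of_file`, `is_listed`, and `LISTED_MD5` are hypothetical helpers, not part of the CPT code.

```python
import hashlib

# MD5 digests copied from the file listing of this record.
LISTED_MD5 = {
    "3966f7b8c8f87a55e10953228e04d74f",
    "d8b1b0d4606a96e5aa6f7fc9f1d1932e",
    "6de0a3c13f6fcd6bcf7864dd4d21ead9",
    "a1d46a2b0a442677fb3450027fbe2a29",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path, listed=LISTED_MD5):
    """Return True if the file's MD5 digest matches one of the listed values."""
    return md5_of_file(path) in listed
```

A call such as `is_listed("CPT1_score_EVE_set.zip")` returning `False` suggests a truncated or corrupted download, in which case the file should be fetched again.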