Published March 26, 2025 | Version v3
Dataset Open

PRESCOTT/ESCOTT/iGEMME mutational effect predictions of all single point mutations for ~3000 proteins

  • 1. Sorbonne Université
  • 2. Centre International de Recherche en Infectiologie

Description

This dataset contains all the data needed to reproduce our analyses on ~3000 proteins. It consists of four compressed archives:

  1. All ColabFold MSAs and structures for ~3000 proteins: colabfold-sequences-structures-3000-proteins.tar.bz2
  2. All ESCOTT predictions: escott-v-1-6-0-max-two-components-colabfold-msas-entire-single-point-mutations-cvRC7.tgz
  3. All iGEMME predictions: escott-v-1-6-0-tjet-only-colabfold-msas-entire-single-point-mutations.tgz
  4. All gnomAD v4.0.0 CSV files used for the PRESCOTT predictions: gnomadv4-0-0-csv-files.tgz
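
Because the archives total tens of gigabytes, it can help to look inside one before extracting it. The following is a minimal Python sketch, assuming the archives have been downloaded into the current working directory. The file names are those listed above; the helper name peek_archive is hypothetical, and nothing is assumed here about the internal directory layout of the archives.

```python
import itertools
import tarfile
from pathlib import Path

# Archive names exactly as listed in the description above.
ARCHIVES = [
    "colabfold-sequences-structures-3000-proteins.tar.bz2",
    "escott-v-1-6-0-max-two-components-colabfold-msas-entire-single-point-mutations-cvRC7.tgz",
    "escott-v-1-6-0-tjet-only-colabfold-msas-entire-single-point-mutations.tgz",
    "gnomadv4-0-0-csv-files.tgz",
]


def peek_archive(path, limit=10):
    """Return up to `limit` member names without extracting anything.

    Hypothetical helper, not part of the dataset or of any PRESCOTT tooling.
    """
    # Mode "r:*" lets tarfile detect gzip or bzip2 compression by itself.
    with tarfile.open(path, mode="r:*") as tar:
        return [member.name for member in itertools.islice(tar, limit)]


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name in ARCHIVES:
        if Path(name).exists():
            print(name, peek_archive(name))
        else:
            print(name, "not found, skipping")
```

Only the first few entries are read, so the call avoids scanning the whole archive. Full extraction can then be done with the standard tar tool or with tarfile's extractall method.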

The dataset also contains the list of 500 proteins used to determine the PRESCOTT coefficients, in TRAINING-all_gene_names_v4_set1_no_acmg.txt.

The list of 1883 proteins used for testing is given in TESTING-all_gene_names_v4_set1_with_acmg.txt.
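
As a quick sanity check, the two lists can be loaded and compared. This is a minimal Python sketch, assuming each file is plain text with one gene or protein name per line; that format is an assumption, not something stated in this record. The helper names read_gene_list and summarize are hypothetical.

```python
from pathlib import Path

# File names as given in the description above.
TRAINING_LIST = "TRAINING-all_gene_names_v4_set1_no_acmg.txt"
TESTING_LIST = "TESTING-all_gene_names_v4_set1_with_acmg.txt"


def read_gene_list(path):
    """Read names from a text file, skipping blank lines.

    Assumption: one name per line (the actual file layout may differ).
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def summarize(training, testing):
    """Return the size of each list and the number of names found in both."""
    shared = set(training) & set(testing)
    return {
        "training": len(training),
        "testing": len(testing),
        "shared": len(shared),
    }


if __name__ == "__main__":
    # Assumption: both files sit in the current working directory.
    if Path(TRAINING_LIST).exists() and Path(TESTING_LIST).exists():
        print(summarize(read_gene_list(TRAINING_LIST), read_gene_list(TESTING_LIST)))
```

The reported sizes can be compared against the 500 and 1883 proteins quoted above. A mismatch would suggest that the one-name-per-line assumption does not hold for these files.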

Files (40.5 GB)

Additional details

Related works

Is published in
Publication: 10.1186/s13059-025-03581-y (DOI)