Deep indel mutagenesis reveals the impact of insertions and deletions on protein stability and function
- 1. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; University Pompeu Fabra (UPF), Barcelona, Spain
- 2. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- 3. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; University Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Description
Datasets for "Deep indel mutagenesis reveals the impact of insertions and deletions on protein stability and function". Code is available at: https://github.com/lehner-lab/deep_indel_mutagenesis.
Amino acid insertions and deletions (indels) are an abundant class of genetic variants. However, compared to substitutions, the effects of indels on protein stability are not well understood and are poorly predicted. To better understand indels, here we analyze new and existing large-scale deep indel mutagenesis (DIM) data for structurally diverse proteins. The effects of indels on protein stability vary extensively among and within proteins and are not well predicted by existing computational methods. To address this shortcoming, we present INDELi, a series of models that combine experimental or predicted substitution effects with secondary structure information and predict the effects of indels on both protein stability and pathogenicity with good accuracy. Moreover, quantifying the effects of indels on protein-protein interactions suggests that insertions can be an important class of gain-of-function variants. Our results provide an overview of the impact of indels on proteins and a method to predict their effects genome-wide.
additional_files.zip
additional_dfs.rds
tsuboyama_sec_struc.rds
DiMSum.zip
DiMSum output as RData files:
aPCA_domains_fitness_replicates.RData
aPCA_domains_variant_data_merge.RData
grb2_bind_fitness_replicates.RData
grb2_fold_fitness_replicates.RData
pdz3_bind_fitness_replicates.RData
pdz3_fold_fitness_replicates.RData
Input files to reproduce the DiMSum runs (download the raw FASTQ files from the Gene Expression Omnibus under accession number GSE244096):
aPCA_DiMSum_run.sh
aPCA_experimental_design_file.txt
aPCA_synonymSequencePath
aPCA_VariantIdentity
aPCA_grb2sh3_DiMSum_run.sh
aPCA_grb2sh3_experimental_design_file.txt
aPCA_grb2sh3_VariantIdentity.txt
aPCA_psd95pdz3_DiMSum_run.sh
aPCA_psd95pdz3_experimental_design_file.txt
aPCA_psd95pdz3_VariantIdentity.txt
bPCA_grb2sh3_DiMSum_run.sh
bPCA_grb2sh3_experimental_design_file.txt
bPCA_grb2sh3_VariantIdentity.txt
bPCA_psd95pdz3_DiMSum_run.sh
bPCA_psd95pdz3_experimental_design_file.txt
bPCA_psd95pdz3_VariantIdentity.txt
pre_processed_data.zip
color_scale.rds
scaled_variants_aPCA.rds
scaled_variants_bPCA.rds
tsuboyama_nat_doms_all.rds
indel_prediction_models.zip
ddmut_prediction_mean.rds
GEMME_pred.rds
ddG_ml_encoded.rds
ddG_insertions_models.R
ddG_deletions_models.R
genome_wide_prediction.zip
ddG_ml_encoded.rds
ClinVar_benchmark --> ClinVar single-amino-acid (1 aa) indels used for benchmarking
INDELiE_predictions_human_proteome --> INDELi-E predictions for single-amino-acid deletions and insertions across the human proteome
ESM1v_human_proteome --> average ESM-1v substitution score per position across the human proteome
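The ESM1v_human_proteome entry summarises substitution effects as one average per position. As an illustration only, the Python sketch below computes such a per-position mean from a list of substitution scores. The `(position, mutant_aa, score)` input format, the 1-based positions, and the exclusion of the wild-type residue are assumptions made for this example; they do not describe the actual layout of the file or the authors' exact averaging procedure, and `per_position_mean` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def per_position_mean(scores, wt_seq):
    """Average substitution scores at each position of a protein.

    scores: iterable of (position, mutant_aa, score) tuples, positions 1-based
            (hypothetical input format, not the layout of ESM1v_human_proteome).
    wt_seq: wild-type protein sequence, used to skip non-substitution entries.
    Returns a dict {position: mean score}, ordered by position.
    """
    by_position = defaultdict(list)
    for position, mutant_aa, score in scores:
        if wt_seq[position - 1] == mutant_aa:
            continue  # wild-type residue: not a substitution, so leave it out
        by_position[position].append(score)
    return {pos: mean(vals) for pos, vals in sorted(by_position.items())}


# Toy example with made-up scores (not real ESM-1v output).
example = [(1, "A", -1.0), (1, "G", -3.0), (1, "M", 0.0), (2, "E", -0.5)]
print(per_position_mean(example, "MK"))  # {1: -2.0, 2: -0.5}
```

An unweighted mean is the simplest reading of "average ... per position"; other summaries (for example a median) would be equally easy to swap in.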
Files (307.0 MB total)

| Size | MD5 checksum |
|---|---|
| 27.3 MB | d91ea28e0ca7d8adcd09d4f723e64e09 |
| 2.2 MB | 70c228cceff384a2255c1bb602ebed06 |
| 271.7 MB | 13a79753800ac04257d2c31c86122fc9 |
| 2.3 MB | 9526fb12872bce93412a3921165fa5bd |
| 3.5 MB | 93f8b18997db172363441f77d26a4f5a |
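The MD5 checksums listed for the five files can be used to confirm that a download is complete and uncorrupted. The following is a minimal Python sketch using only the standard library. Because the listing does not pair each checksum with a file name, it only reports whether a file's MD5 digest matches any of the listed checksums. The names `md5sum`, `is_listed` and `check_md5.py` are illustrative and are not part of this record or of the lehner-lab repository.

```python
import hashlib
import sys

# MD5 checksums as listed in the Files table of this record.
LISTED_MD5 = {
    "d91ea28e0ca7d8adcd09d4f723e64e09",
    "70c228cceff384a2255c1bb602ebed06",
    "13a79753800ac04257d2c31c86122fc9",
    "9526fb12872bce93412a3921165fa5bd",
    "93f8b18997db172363441f77d26a4f5a",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """True if the file's MD5 digest matches one of the listed checksums."""
    return md5sum(path) in LISTED_MD5


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_md5.py additional_files.zip DiMSum.zip
    for name in sys.argv[1:]:
        status = "OK" if is_listed(name) else "not in listing (re-download?)"
        print(f"{name}\t{md5sum(name)}\t{status}")
```

Reading in chunks keeps memory use constant, which matters for the 271.7 MB archive. On Linux, `md5sum <file>` from GNU coreutils gives the same digest.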
Additional details
Dates
- Updated: 2024-03