Published December 2, 2023 | Version 5
Computational notebook Open

Conformational ensembles of the human IDRome

Description

This repository contains Python code, Jupyter Notebooks, and data for reproducing the results presented in the manuscript Conformational ensembles of the human intrinsically disordered proteome (DOI 10.1038/s41586-023-07004-5).

conformational_ensembles.zip contains simulation trajectories and time series of conformational properties for all 28,058 IDRs in the pLDDT-based set (also available at sid.erda.dk/sharelink/AVZAJvJnCO).
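Because the archive is large, it can help to inspect its contents before extracting anything. The following is a minimal sketch using only the Python standard library. The local path is an assumption (wherever the archive was downloaded), and nothing is assumed about the internal folder layout or file formats.

```python
import zipfile
from pathlib import Path


def list_zip_members(archive_path, limit=10):
    """Return (name, uncompressed size in bytes) for the first `limit` members."""
    with zipfile.ZipFile(archive_path) as archive:
        members = archive.infolist()[:limit]
        return [(member.filename, member.file_size) for member in members]


if __name__ == "__main__":
    # Assumed local path: adjust to wherever the archive was downloaded.
    archive = Path("conformational_ensembles.zip")
    if archive.exists():
        for name, size in list_zip_members(archive):
            print(f"{size:>14d}  {name}")
    else:
        print(f"{archive} not found; download it first.")
```

Reading the member list only touches the archive's central directory, so it is quick even for an archive of this size.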

_2023_Tesei_IDRome-5.zip is a copy of github.com/KULL-Centre/_2023_Tesei_IDRome/tree/v5, which includes the following files and folders:

  • IDRome_DB.csv: CSV file listing amino acid sequences, sequence features, and conformational properties of all 28,058 IDRs in the pLDDT-based set
  • IDRLab.ipynb: Google Colab notebook to generate conformational ensembles of user-supplied sequences using the CALVADOS model
  • IDR_SVR_predictor.ipynb: Google Colab notebook to predict scaling exponents and conformational entropies per residue using the SVR models
  • IDRome_DB_SPOT.csv: CSV file listing amino acid sequences, sequence features, and conformational properties of all the 28,058 IDRs in the SPOT-based set
  • seq_conf_prop.ipynb reproduces Figs. 1 and 3 and Extended Data Figs. 2, 5, 6e-t, and 7
  • go_analysis.ipynb reproduces Fig. 2
  • conservation_analysis.ipynb reproduces Fig. 4
  • clinvar_fmug.ipynb reproduces Fig. 5 and Extended Data Fig. 9
  • uniprot_domains.ipynb reproduces Extended Data Fig. 1
  • svr_models.ipynb reproduces Extended Data Fig. 8
  • go_uniprot_calls.ipynb performs API calls to obtain gene ontology terms from UniProt
  • calc_seq_prop.ipynb and calc_seq_prop_SPOT.ipynb compute sequence descriptors and generate the IDRome_DB.csv and IDRome_DB_SPOT.csv files
  • CALVADOS_tests.ipynb reproduces Extended Data Fig. 3
  • AF2_PAEs.ipynb reproduces Extended Data Fig. 4
  • CD-CODE.ipynb reproduces Extended Data Fig. 6a-d
  • md_simulations/ contains code and data related to single-chain simulations performed using the CALVADOS model and HOOMD-blue v2.9.3 installed with mphowardlab/azplugins (see github.com/KULL-Centre/_2023_Tesei_IDRome/README.md for installation instructions)
  • idr_selection/ contains code and data to generate the pLDDT-based and SPOT-based sets of IDRs
  • idr_orthologs/ contains code and data to generate the set of orthologs of human IDRs
  • svr_models/ contains scikit-learn SVR models generated in svr_models.ipynb
  • zscores/ contains code and data to calculate NARDINI z-scores
  • go_analyses/ contains input and output data related to the Gene Ontology analyses in go_analysis.ipynb
  • QCDPred/ contains code and data related to QCD calculations
  • clinvar_fmug_cdcode/ contains code and data related to the analysis of ClinVar, FMUG, and CD-CODE databases
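As a quick sanity check on the CSV files listed above, one can read the header and count the data rows. This is a minimal sketch using only the standard library. The file path is an assumed local location, and no column names are assumed, since they are not listed here.

```python
import csv
from pathlib import Path


def summarize_csv(csv_path):
    """Return (column names, number of data rows) for a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first row holds the column names
        n_rows = sum(1 for _ in reader)    # remaining rows are data records
    return header, n_rows


if __name__ == "__main__":
    # Assumed local path after unpacking _2023_Tesei_IDRome-5.zip.
    db_path = Path("IDRome_DB.csv")
    if db_path.exists():
        columns, n_rows = summarize_csv(db_path)
        print(f"{n_rows} rows, {len(columns)} columns")
        print(columns)
    else:
        print(f"{db_path} not found; unpack the archive first.")
```

If each row of IDRome_DB.csv corresponds to one IDR, the row count would be expected to match the 28,058 IDRs stated for the pLDDT-based set.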

Files

Total size: 31.6 GB

Size        MD5 checksum
484.3 MB    md5:e22c5db574609445abb8cda497fd756b
31.1 GB     md5:219aacf6da094854e05e0bdaa18e7d70
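The MD5 checksums above can be used to confirm that a download completed without corruption. The sketch below hashes a file in chunks so that even the larger archive does not need to fit in memory. Since the listing does not pair checksums with file names, the helper only checks whether a file's digest matches any of the published values. The function names and the local file path are illustrative assumptions.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
PUBLISHED_MD5 = {
    "e22c5db574609445abb8cda497fd756b",
    "219aacf6da094854e05e0bdaa18e7d70",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published=PUBLISHED_MD5):
    """Return True if the file's MD5 digest is one of the published values."""
    return md5_of_file(path) in published


if __name__ == "__main__":
    # Assumed local path: adjust to the downloaded file.
    download = Path("_2023_Tesei_IDRome-5.zip")
    if download.exists():
        status = "OK" if matches_published(download) else "MISMATCH"
        print(f"{download}: {status}")
    else:
        print(f"{download} not found; download it first.")
```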

Additional details

Related works

Is new version of
Preprint: 10.1101/2023.05.08.539815 (DOI)
Is supplement to
Computational notebook: https://github.com/KULL-Centre/_2023_Tesei_IDRome (URL)
Journal article: 10.1038/s41586-023-07004-5 (DOI)

Funding

InMIND – Intervention in Neurodegenerative disorders via Mechanistic INsight into liquid-like Droplets (grant 101025063)
European Commission