Published December 2, 2023 | Version 5
Computational notebook
Open access
Conformational ensembles of the human IDRome
Creators
- University of Copenhagen
Description
This repository contains Python code, Jupyter notebooks, and data for reproducing the results presented in the manuscript Conformational ensembles of the human intrinsically disordered proteome (DOI 10.1038/s41586-023-07004-5).
conformational_ensembles.zip contains simulation trajectories and time series of conformational properties for all 28,058 IDRs in the pLDDT-based set (also available at sid.erda.dk/sharelink/AVZAJvJnCO).
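Because the trajectory archive is large, it can help to inspect its layout before unpacking it. The sketch below is a minimal example using only Python's standard `zipfile` module; it makes no assumption about the folder names inside conformational_ensembles.zip (they are not described here), and the helper names `summarize_zip` and `extract_one` are hypothetical.

```python
import zipfile
from collections import Counter


def summarize_zip(path, max_entries=10):
    """List the first few members of a ZIP archive and count members per top-level folder.

    Only the archive's central directory is read, so nothing is extracted to disk.
    Archives larger than 4 GiB use the ZIP64 extension, which zipfile supports when reading.
    """
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
    for info in infos[:max_entries]:
        # file_size is the uncompressed size of the member in bytes
        print(f"{info.file_size:>14,d}  {info.filename}")
    # Group member names by their first path component (a folder or a bare file name).
    return Counter(info.filename.split("/", 1)[0] for info in infos)


def extract_one(path, member, dest="."):
    """Extract a single named member instead of unpacking the whole archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.extract(member, path=dest)
```

For example, `summarize_zip("conformational_ensembles.zip")` prints the first members and returns the per-folder counts; a name taken from that listing can then be passed to `extract_one`.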
_2023_Tesei_IDRome-5.zip is a copy of github.com/KULL-Centre/_2023_Tesei_IDRome/tree/v5, which includes the following files and folders:
- CSV file IDRome_DB.csv lists amino acid sequences, sequence features, and conformational properties of all 28,058 IDRs in the pLDDT-based set
- IDRLab.ipynb: Google Colab notebook to generate conformational ensembles of user-supplied sequences using the CALVADOS model
- IDR_SVR_predictor.ipynb: Google Colab notebook to predict scaling exponents and per-residue conformational entropies using the SVR models
- CSV file IDRome_DB_SPOT.csv lists amino acid sequences, sequence features, and conformational properties of all 28,058 IDRs in the SPOT-based set
- seq_conf_prop.ipynb reproduces Figs. 1 and 3 and Extended Data Figs. 2, 5, 6e-t, and 7
- go_analysis.ipynb reproduces Fig. 2
- conservation_analysis.ipynb reproduces Fig. 4
- clinvar_fmug.ipynb reproduces Fig. 5 and Extended Data Fig. 9
- uniprot_domains.ipynb reproduces Extended Data Fig. 1
- svr_models.ipynb reproduces Extended Data Fig. 8
- go_uniprot_calls.ipynb performs API calls to obtain Gene Ontology (GO) terms from UniProt
- calc_seq_prop.ipynb and calc_seq_prop_SPOT.ipynb compute sequence descriptors and generate the IDRome_DB.csv and IDRome_DB_SPOT.csv files
- CALVADOS_tests.ipynb reproduces Extended Data Fig. 3
- AF2_PAEs.ipynb reproduces Extended Data Fig. 4
- CD-CODE.ipynb reproduces Extended Data Fig. 6a-d
- md_simulations/ contains code and data related to single-chain simulations performed using the CALVADOS model and HOOMD-blue v2.9.3 installed with mphowardlab/azplugins (see github.com/KULL-Centre/_2023_Tesei_IDRome/README.md for installation instructions)
- idr_selection/ contains code and data to generate the pLDDT-based and SPOT-based sets of IDRs
- idr_orthologs/ contains code and data to generate the set of orthologs of human IDRs
- svr_models/ contains scikit-learn SVR models generated in svr_models.ipynb
- zscores/ contains code and data to calculate NARDINI z-scores
- go_analyses/ contains input and output data related to the Gene Ontology analyses in go_analysis.ipynb
- QCDPred/ contains code and data related to quality-control degron (QCD) predictions
- clinvar_fmug_cdcode/ contains code and data related to the analysis of ClinVar, FMUG, and CD-CODE databases
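A quick sanity check after unpacking is to confirm that IDRome_DB.csv has one data row per IDR. The sketch below uses only the standard `csv` module and assumes, without confirmation from this record, that the file has a single header row followed by one row per IDR; the helper name `count_csv_rows` is hypothetical.

```python
import csv


def count_csv_rows(csv_path):
    """Return (header, n_rows) for a CSV file such as IDRome_DB.csv.

    Assumes the first record is a header and every following record is one IDR.
    csv.reader counts records rather than lines, so quoted fields spanning lines are handled.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```

Under that assumption, `header, n = count_csv_rows("IDRome_DB.csv")` should give `n == 28058` for the pLDDT-based set. If pandas is available, `pandas.read_csv("IDRome_DB.csv")` loads the same table as a DataFrame; the column names are not documented here, so inspect `df.columns` before relying on any of them.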
Files (31.6 GB)

| Name | Size | MD5 checksum |
|---|---|---|
| _2023_Tesei_IDRome-5.zip | 484.3 MB | e22c5db574609445abb8cda497fd756b |
| conformational_ensembles.zip | 31.1 GB | 219aacf6da094854e05e0bdaa18e7d70 |
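The MD5 checksums published with this record can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; it reads the file in 1 MiB chunks so that the multi-gigabyte archive never has to fit in memory. The helper names are hypothetical, and the check only tests whether the digest matches one of the two listed checksums, so it does not depend on which file each checksum belongs to.

```python
import hashlib

# MD5 checksums as listed for the files of this record.
LISTED_MD5 = {
    "e22c5db574609445abb8cda497fd756b",
    "219aacf6da094854e05e0bdaa18e7d70",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to keep memory use constant."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """True if the file's MD5 equals one of the checksums published for this record."""
    return md5sum(path) in LISTED_MD5
```

On Linux, running `md5sum` from GNU coreutils on the downloaded file produces the same hexadecimal digest.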
Additional details
Related works
- Is new version of
  - Preprint: 10.1101/2023.05.08.539815 (DOI)
- Is supplement to
  - Computational notebook: https://github.com/KULL-Centre/_2023_Tesei_IDRome (URL)
  - Journal article: 10.1038/s41586-023-07004-5 (DOI)