Published April 29, 2020
| Version v1
Dataset
Open
Project files provided as supporting information to the manuscript "An information theory-based approach for optimal model reduction of biomolecules"
Description
The dataset contains the following files:
- adenylate.zip
- antitrypsin.zip
- tamapin.zip
- analysis_notebooks.zip
Each of the first three archives corresponds to one of the three proteins. For each number of coarse-grained (CG) sites N, each of these compressed folders contains the following files:
- random mappings (random_mappings_${N}.txt)
- random mapping entropies (random_smaps_${N}.txt) [fig1]
- optimal mappings (lowest_mappings_${N}.txt) [fig3, fig4, figS2]
- optimal mapping entropies (lowest_smaps_${N}.txt) [fig1]
- PDB files with conservation probabilities in the B-factor column (${N}_probs.pdb) [fig4, figS2]
- SASA values (${protein_name}_SASA_residues.xvg)
- transition mapping entropies (${protein_name}_transition_smaps.txt) [fig2]
- additional transition mapping entropies (${protein_name}_transition_smaps*) [figS3]
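The naming scheme above can be expanded programmatically to check that an extracted folder is complete. The following is a minimal sketch, not part of the dataset: the helper names `expected_files` and `missing_files` are hypothetical, and it assumes that the files sit directly inside the extracted folder and that `${protein_name}` matches the string passed in. The wildcard entry (`${protein_name}_transition_smaps*`) is left out because its exact file names are not listed.

```python
from pathlib import Path


def expected_files(protein_name, n_sites):
    """Expand the naming patterns from the list above for one protein and one N.

    The dictionary keys are illustrative labels, not names used by the dataset.
    """
    return {
        "random_mappings": f"random_mappings_{n_sites}.txt",
        "random_entropies": f"random_smaps_{n_sites}.txt",
        "optimal_mappings": f"lowest_mappings_{n_sites}.txt",
        "optimal_entropies": f"lowest_smaps_{n_sites}.txt",
        "conservation_pdb": f"{n_sites}_probs.pdb",
        "sasa": f"{protein_name}_SASA_residues.xvg",
        "transition_entropies": f"{protein_name}_transition_smaps.txt",
    }


def missing_files(folder, protein_name, n_sites):
    """Return the expected file names that are not present in `folder`.

    Assumption: a flat layout, with every file directly inside `folder`.
    """
    root = Path(folder)
    names = expected_files(protein_name, n_sites).values()
    return sorted(name for name in names if not (root / name).is_file())
```

For example, `missing_files("adenylate", "adenylate", 30)` would list whichever of the seven expected files are absent from a folder named `adenylate`. Both the value N = 30 and the protein name string are placeholders chosen for illustration.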
The file analysis_notebooks.zip contains the Python 3 notebooks used to perform all the analyses presented in the paper:
- paper_analysis_adenylate.ipynb
- paper_analysis_antitrypsin.ipynb
- paper_analysis_tamapin.ipynb
Packages required to run these Python 3 notebooks:
- numpy
- pandas
- matplotlib
- seaborn
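Before opening the notebooks, it may help to confirm that the packages listed above can be imported in the current environment. This is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and no package versions are checked because none are specified here.

```python
from importlib.util import find_spec

# Package names taken from the list above.
REQUIRED = ("numpy", "pandas", "matplotlib", "seaborn")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]
```

An empty list returned by `missing_packages()` indicates that all four packages are available.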