Dataset Open Access
Marco Giulini;
Raffaello Potestio
README for the project files provided as supporting information to the manuscript “A deep learning approach to the structural analysis of proteins”
Dec. 30, 2018
Authors: Marco Giulini and Raffaello Potestio
==================================
The dataset contains the following files:
- datasets.zip: archive containing five .csv files, namely:
    - decoys_cm.csv: all the data for the 10728 protein decoys in the training set
    - evaluation_cm.csv: all the data for the 146 proteins in the evaluation set
    - random_CG.csv: 1200 Coulomb matrices. 100 CG models for each protein with 120 amino acids
    - 1e5g_centered_sphere.csv: 100 CG models in which the central atoms of 1e5g are not removed
    - 1e5g_random_sphere.csv: 10 CG models for each of 10 different (random) locations of the sphere enclosing the atoms that have to be retained, 100 CG models in total
- decoys_labels.lab: the labels associated with the 10728 decoys in the training set
- evaluation_labels.lab: the labels associated with the 146 PDB files in the evaluation set
- random_CG_labels.lab: the labels associated with the 6 proteins with 120 amino acids
- network_development_training.py: a Python script that performs cross-validation and full training of the model
- saved_networks.zip: archive containing 10 networks; the architecture is stored in .json files, while the weight parameters are stored in .hs files
- pdb_files.zip: archive containing the PDB files employed in the project, namely:
    - pdb_files_len100: PDB files with 100 amino acids
    - pdb_files_len101-110: PDB files with between 101 and 110 amino acids
    - decoys: decoys of length 100 extracted from the above folder, with name syntax PDBNAME_decoy_STARTRES_ENDRES.pdb
      Example: 6gsp.pdb gives rise to 6gsp_decoy_0_100.pdb, 6gsp_decoy_1_101.pdb, 6gsp_decoy_2_102.pdb, 6gsp_decoy_3_103.pdb, 6gsp_decoy_4_104.pdb
    - pdb_files_len100: 6 PDB files with 120 amino acids
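
The decoy naming syntax described above can be illustrated with a short sketch. It assumes that each decoy is a contiguous window of 100 residues, that STARTRES is zero-based and ENDRES equals STARTRES plus the window length, and that 6gsp.pdb has 104 residues (inferred from the five names in the example). The function name `decoy_names` and its parameters are hypothetical and are not taken from the project script.

```python
from typing import List


def decoy_names(pdb_id: str, n_residues: int, window: int = 100) -> List[str]:
    """List the decoy file names for every contiguous window of `window` residues.

    Assumption: names follow PDBNAME_decoy_STARTRES_ENDRES.pdb with a
    zero-based STARTRES and ENDRES = STARTRES + window.
    """
    # A chain shorter than the window yields no decoys (empty range).
    return [
        f"{pdb_id}_decoy_{start}_{start + window}.pdb"
        for start in range(n_residues - window + 1)
    ]


# Assumed chain length of 104 residues, which reproduces the README example.
for name in decoy_names("6gsp", 104):
    print(name)
```

Under these assumptions, a chain of N residues yields N - 99 decoys, so a file from pdb_files_len101-110 would contribute between 2 and 11 decoys.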
Name | MD5 | Size |
---|---|---|
datasets.zip | 63ccc5da9d9c63f2cea8b80e9bee9d32 | 225.9 MB |
decoys_labels.lab | a8239ecde778d4a53440b88a9f77cb1b | 1.7 MB |
evaluation_labels.lab | 0a76b791400b6357f2873c98cc1f85f5 | 38.3 kB |
lowest_t16eig.lab | 6816b6726b86c22e975f95d4a9d46ed2 | 3.2 MB |
lowest_v16eig.lab | 6b7c234e432645d978859aba8dcc11b0 | 41.9 kB |
network_development_training.py | 57879a9ba89383c4376802c034bc3bd6 | 7.8 kB |
pdb_files.zip | 109c5179d7299a7eb96cad6299b39db5 | 476.0 MB |
random_CG_labels.lab | bad2ca671082d61dfc39b306dd3adec9 | 78.1 kB |
saved_networks.zip | a5e89ad018d0469a3acfc7ea8c93c136 | 53.7 MB |
M. Giulini and R. Potestio, A deep learning approach to the structural analysis of proteins, Interface Focus 9 (2019) http://doi.org/10.1098/rsfs.2019.0003