Dataset Open Access

Project files provided as supporting information to the manuscript "A deep learning approach to the structural analysis of proteins"

Marco Giulini; Raffaello Potestio

README file to the project files provided as supporting information to the manuscript “A deep learning approach to the structural analysis of proteins”

Dec. 30, 2018

Authors: Marco Giulini and Raffaello Potestio

==================================

The dataset contains the following files:

 

- datasets.zip: archive containing five .csv files, namely:

            - decoys_cm.csv : all the data for the 10728 protein decoys in the training set

            - evaluation_cm.csv : all the data for the 146 proteins in the evaluation set

            - random_CG.csv : 1200 Coulomb matrices. 100 CG models for each protein with 120 amino acids

            - 1e5g_centered_sphere.csv : 100 CG models in which the central atoms in 1e5g are not removed

            - 1e5g_random_sphere.csv : 10 CG models for each of 10 different (random) locations of the sphere containing the atoms that must be retained, for 100 CG models in total

- decoys_labels.lab: labels associated with the 10728 decoys in the training set

- evaluation_labels.lab: labels associated with the 146 PDB files in the evaluation set

- random_CG_labels.lab: labels associated with the 6 proteins with 120 amino acids

- network_development_training.py: a Python script that performs cross-validation and full training of the model

- saved_networks.zip: archive of a folder containing 10 networks; each architecture is stored in a .json file and the corresponding weight parameters in a .hs file

- pdb_files.zip: archive of a folder containing the PDB files employed in the project, namely:

            - pdb_files_len100 : pdb files with 100 amino acids

            - pdb_files_len101-110 : pdb files with a number of amino acids between 101 and 110

            - decoys : decoys of length 100 extracted from the files in the above folder, named PDBNAME_decoy_STARTRES_ENDRES.pdb

                        EXAMPLE: 6gsp.pdb gives rise to 6gsp_decoy_0_100.pdb, 6gsp_decoy_1_101.pdb, 6gsp_decoy_2_102.pdb, 6gsp_decoy_3_103.pdb, 6gsp_decoy_4_104.pdb

            - pdb_files_len100 : 6 pdb files with 120 amino acids
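
The decoy naming scheme described above can be sketched in Python. This is an illustration only, not the authors' script: the function name decoy_names is hypothetical, and the sketch assumes that STARTRES is a 0-based start index, that ENDRES = STARTRES + 100, and that one decoy is produced for every admissible start position. Under these assumptions a chain of 104 residues yields five names that follow the pattern of the 6gsp example.

```python
def decoy_names(pdb_id, n_residues, window=100):
    """Return the decoy file names for every window of `window` residues.

    Assumptions (not confirmed by the README): STARTRES is 0-based and
    ENDRES = STARTRES + window, one decoy per admissible start position.
    """
    if n_residues < window:
        # The chain is too short to contain a full window.
        return []
    return [
        f"{pdb_id}_decoy_{start}_{start + window}.pdb"
        for start in range(n_residues - window + 1)
    ]


if __name__ == "__main__":
    for name in decoy_names("6gsp", 104):
        print(name)
```

With the same assumptions, a chain of exactly 100 residues gives a single decoy and a chain of 110 residues gives eleven.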

 

Files (760.7 MB)

- datasets.zip (225.9 MB), md5:63ccc5da9d9c63f2cea8b80e9bee9d32
- decoys_labels.lab (1.7 MB), md5:a8239ecde778d4a53440b88a9f77cb1b
- evaluation_labels.lab (38.3 kB), md5:0a76b791400b6357f2873c98cc1f85f5
- lowest_t16eig.lab (3.2 MB), md5:6816b6726b86c22e975f95d4a9d46ed2
- lowest_v16eig.lab (41.9 kB), md5:6b7c234e432645d978859aba8dcc11b0
- network_development_training.py (7.8 kB), md5:57879a9ba89383c4376802c034bc3bd6
- pdb_files.zip (476.0 MB), md5:109c5179d7299a7eb96cad6299b39db5
- random_CG_labels.lab (78.1 kB), md5:bad2ca671082d61dfc39b306dd3adec9
- saved_networks.zip (53.7 MB), md5:a5e89ad018d0469a3acfc7ea8c93c136

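
The md5 values listed above can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and matches_checksum are illustrative, and the local path passed to them is an assumption about where the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not have to
    fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_checksum("datasets.zip", "md5:63ccc5da9d9c63f2cea8b80e9bee9d32") should return True for an uncorrupted copy of datasets.zip saved in the current directory.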
  • M. Giulini and R. Potestio, A deep learning approach to the structural analysis of proteins, Interface Focus 9 (2019) http://doi.org/10.1098/rsfs.2019.0003
