Published July 31, 2019 | Version v1
Dataset Open

Project files provided as supporting information to the manuscript "A deep learning approach to the structural analysis of proteins"

  • 1. University of Trento

Description

README file to the project files provided as supporting information to the manuscript “A deep learning approach to the structural analysis of proteins”

Dec. 30, 2018

Authors: Marco Giulini and Raffaello Potestio

==================================

The dataset contains the following files:

 

- datasets.zip: archive containing five .csv files, namely:

            - decoys_cm.csv : all data for the 10728 protein decoys in the training set

            - evaluation_cm.csv : all data for the 146 proteins in the evaluation set

            - random_CG.csv : 1200 Coulomb matrices: 100 coarse-grained (CG) models for each protein with 120 amino acids

            - 1e5g_centered_sphere.csv : 100 CG models in which the central atoms in 1e5g are not removed

            - 1e5g_random_sphere.csv : 10 CG models for each of 10 different (random) locations of the sphere containing the atoms that must be retained, 100 CG models in total

 

- decoys_labels.lab: labels associated with the 10728 decoys in the training set

- evaluation_labels.lab: labels associated with the 146 PDB files in the evaluation set

- random_CG_labels.lab: labels associated with the 6 proteins with 120 amino acids

- network_development_training: a Python script that performs cross-validation and full training of the model

- saved_networks.zip: archive containing 10 networks. The architecture is stored in .json files, while the weight parameters are stored in .hs files

 

- pdb_files.zip: archive containing the PDB files used in the project, namely:

            - pdb_files_len100 : PDB files with 100 amino acids

            - pdb_files_len101-110 : PDB files with between 101 and 110 amino acids

            - decoys : decoys of length 100 extracted from the files in the above folder, named PDBNAME_decoy_STARTRES_ENDRES.pdb

                        EXAMPLE: 6gsp.pdb gives rise to 6gsp_decoy_0_100.pdb, 6gsp_decoy_1_101.pdb, 6gsp_decoy_2_102.pdb, 6gsp_decoy_3_103.pdb, 6gsp_decoy_4_104.pdb

            - pdb_files_len100 : 6 PDB files with 120 amino acids
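
The decoy naming scheme described above can be illustrated with a short Python sketch. It is not the authors' script: the function name `decoy_names`, the sliding-window interpretation, and the chain length of 104 residues for 6gsp (inferred from the last decoy name in the example) are assumptions made for illustration.

```python
def decoy_names(pdb_name, n_residues, window=100):
    """List decoy file names for every window of `window` residues.

    Assumption: decoys are contiguous windows that slide one residue
    at a time, and STARTRES/ENDRES are written as start and start + window,
    as suggested by the 6gsp example in this README.
    """
    if n_residues < window:
        # A chain shorter than the window yields no decoy.
        return []
    return [
        f"{pdb_name}_decoy_{start}_{start + window}.pdb"
        for start in range(n_residues - window + 1)
    ]


if __name__ == "__main__":
    # 104 residues is inferred from the README example, not read from the PDB file.
    for name in decoy_names("6gsp", 104):
        print(name)
```

Under these assumptions a chain of N residues yields N - 100 + 1 decoys, which gives the five names listed in the example for 6gsp.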

 

Files

Files (760.7 MB)

MD5 checksum                              Size
md5:63ccc5da9d9c63f2cea8b80e9bee9d32      225.9 MB
md5:a8239ecde778d4a53440b88a9f77cb1b      1.7 MB
md5:0a76b791400b6357f2873c98cc1f85f5      38.3 kB
md5:6816b6726b86c22e975f95d4a9d46ed2      3.2 MB
md5:6b7c234e432645d978859aba8dcc11b0      41.9 kB
md5:57879a9ba89383c4376802c034bc3bd6      7.8 kB
md5:109c5179d7299a7eb96cad6299b39db5      476.0 MB
md5:bad2ca671082d61dfc39b306dd3adec9      78.1 kB
md5:a5e89ad018d0469a3acfc7ea8c93c136      53.7 MB
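
A downloaded file can be checked against the MD5 values in the file listing. The following Python sketch uses only the standard library; the helper names `md5_of_file` and `matches_listing` and the path in the usage comment are hypothetical, and the listing here does not state which checksum belongs to which file, so the expected value has to be taken from the corresponding entry on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not have to
    fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file with a listed value such as 'md5:63cc...'."""
    # Accept the value with or without the 'md5:' prefix.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical path; take the checksum from the matching entry):
# matches_listing("downloads/some_file.zip", "md5:<value from the listing>")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.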

Additional details

Funding

European Commission: VARIAMOLS – VAriable ResolutIon Algorithms for macroMOLecular Simulation (grant no. 758588)
