Published February 22, 2018 | Version 1.0
Dataset Open

A large data-set of CASP protein refinement simulations for machine-learning

  • 1. The Francis Crick Institute

Contributors

Contact person:

  • 1. The Francis Crick Institute

Description

The uploaded trajectory data originate from the refinement simulations our laboratory ran in CASP11 and CASP12, covering the targets for which the reference crystal structure is available in the PDB. In total, the data consist of 904 trajectories from 42 different protein systems, with 3419 ns of cumulative simulation time and 1,709,704 snapshots saved at an interval of Δt = 2 ps.
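As a quick consistency check on these figures, multiplying the snapshot count by Δt should approximately recover the cumulative simulation time. The sketch below assumes each snapshot accounts for one 2 ps interval. The per-trajectory averages it prints are derived here for illustration and are not stated in the description.

```python
# Figures quoted in the dataset description
N_TRAJECTORIES = 904
N_SNAPSHOTS = 1_709_704
DT_PS = 2                # spacing between saved snapshots, in picoseconds
QUOTED_TOTAL_NS = 3419

# Total simulated time implied by the snapshot count (ps -> ns),
# assuming every snapshot stands for one DT_PS interval
implied_total_ns = N_SNAPSHOTS * DT_PS / 1000

# Average trajectory length, in snapshots and in nanoseconds (derived, not quoted)
mean_snapshots = N_SNAPSHOTS / N_TRAJECTORIES
mean_length_ns = implied_total_ns / N_TRAJECTORIES

print(f"implied total: {implied_total_ns:.1f} ns (quoted: {QUOTED_TOTAL_NS} ns)")
print(f"mean per trajectory: {mean_snapshots:.0f} snapshots, {mean_length_ns:.2f} ns")
```

The implied total of roughly 3419.4 ns agrees with the quoted 3419 ns, so the snapshot count, Δt and cumulative time are mutually consistent.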

File Overview

  • trajectory_data_pdbs.tar.gz : contains the PDB files of the different trajectories as well as the starting model and reference crystal structure for each target
  • casp_normalized_all_data_final.csv.gz : contains the trajectory features calculated for each snapshot from the trajectory PDBs
  • cv_folds.csv : contains the 7-fold cross-validation assignment used to assess the performance of the model
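The fold assignment in cv_folds.csv can drive a leave-one-fold-out loop. Its column layout is not described on this page, so the sketch below assumes, hypothetically, a header with an identifier column named `target` and an integer column named `fold`; rename these to match the real file. Holding out one fold at a time, a 7-fold assignment yields seven train/test splits.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterator, Set, TextIO, Tuple


def read_fold_assignment(handle: TextIO, key_col: str = "target",
                         fold_col: str = "fold") -> Dict[str, int]:
    """Map each identifier to its fold index.

    `key_col` and `fold_col` are hypothetical column names; adjust them
    to the header actually used in cv_folds.csv.
    """
    return {row[key_col]: int(row[fold_col]) for row in csv.DictReader(handle)}


def cv_splits(assignment: Dict[str, int]) -> Iterator[Tuple[int, Set[str], Set[str]]]:
    """Yield (fold, train_ids, test_ids), holding out one fold at a time."""
    by_fold: Dict[int, Set[str]] = defaultdict(set)
    for key, fold in assignment.items():
        by_fold[fold].add(key)
    all_ids = set(assignment)
    for fold in sorted(by_fold):
        test_ids = by_fold[fold]
        yield fold, all_ids - test_ids, test_ids


# Usage with an in-memory stand-in for cv_folds.csv (made-up identifiers)
example = io.StringIO("target,fold\ntarget_a,0\ntarget_b,1\ntarget_c,0\n")
for fold, train_ids, test_ids in cv_splits(read_fold_assignment(example)):
    print(fold, sorted(train_ids), sorted(test_ids))
```

For the real file, open it with `open("cv_folds.csv", newline="")` as the `csv` module recommends, and use the resulting identifier sets to select rows of the per-snapshot feature table.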

Notes

The work was supported by the Francis Crick Institute which receives its core funding from Cancer Research UK (FC001003), the UK Medical Research Council (FC001003), and the Wellcome Trust (FC001003). Partial funding for this project was received from the Engineering and Physical Sciences Research Council (EP/R512667/1). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research.

Files

Files (34.0 GB)

  • casp_normalized_all_data_final.csv.gz : 654.1 MB, md5:fc06ce114a9d639effdeefde9483a023
  • cv_folds.csv : 24.0 kB, md5:743aeee87eb1ac442c041df7315dbde8
  • trajectory_data_pdbs.tar.gz : 33.4 GB, md5:7702f0b21f54d98fa9f478336d7a37b6
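The md5 checksums listed for the files allow a download to be verified before use, which is worthwhile for the 33.4 GB archive. Below is a minimal sketch using only Python's standard `hashlib` module; it hashes in 1 MiB chunks so large files never need to fit in memory. The pairing of each checksum with a file name is an assumption inferred from the listed sizes (654.1 MB, 24.0 kB, 33.4 GB), and `verify_downloads` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO, Dict

# Checksums from the file listing; the file-name pairing is inferred from
# the listed sizes and should be double-checked against the record.
EXPECTED_MD5 = {
    "casp_normalized_all_data_final.csv.gz": "fc06ce114a9d639effdeefde9483a023",
    "cv_folds.csv": "743aeee87eb1ac442c041df7315dbde8",
    "trajectory_data_pdbs.tar.gz": "7702f0b21f54d98fa9f478336d7a37b6",
}


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks and return the hex digest."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: str) -> Dict[str, bool]:
    """Report, per expected file, whether it exists in `directory` and its md5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = False
            continue
        with path.open("rb") as fh:
            results[name] = md5_of_stream(fh) == expected
    return results


if __name__ == "__main__":
    # Check files downloaded into the current working directory
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'missing or mismatch'}")
```

Chunked reading keeps memory use constant regardless of file size; the command-line `md5sum` tool gives the same digests if Python is not at hand.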

Additional details

Funding

Wellcome Trust
Other FC001003