Published August 22, 2023 | Version v1
Dataset Open

cg2all datasets

Creators

  • 1. Michigan State University

Description

Training, validation, and test sets used in the development of cg2all.

The "set.tar.gz" file contains several files, each listing PDB IDs:

  • targets.train.pdb.6k: training set, pdb.6k
  • targets.train.pdb.29k: training set, pdb.29k
  • targets.valid.pdb: validation set for both pdb.6k and pdb.29k
  • targets.test.pdb: test set for both pdb.6k and pdb.29k
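As an illustration, the ID lists could be read straight from the archive without unpacking it. This is a minimal sketch, not part of the dataset: the helper name read_id_lists is hypothetical, and the code assumes that each list file is plain text with one PDB ID at the start of each non-empty line, which the description above does not state.

```python
import tarfile
from pathlib import PurePosixPath


def read_id_lists(archive_path, prefix="targets."):
    """Return {list file name: [IDs]} for each 'targets.*' file in the archive.

    Assumption: each list file is plain text, one ID per non-empty line
    (only the first whitespace-separated token on a line is kept).
    """
    id_lists = {}
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            name = PurePosixPath(member.name).name
            if not member.isfile() or not name.startswith(prefix):
                continue
            text = archive.extractfile(member).read().decode("utf-8")
            id_lists[name] = [
                line.split()[0] for line in text.splitlines() if line.strip()
            ]
    return id_lists
```

Calling read_id_lists("set.tar.gz") would then give one list per file, keyed by names such as "targets.train.pdb.6k" and "targets.test.pdb".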

Each "tgz" file contains two subdirectories: "original" and "augment". The "original" directory holds PDB files that were curated and cleaned up with the process_pdb.py script. The "augment" directory holds the same set of PDB files with different atomic coordinates; these were used to augment the original training data set with additional side-chain rotamer states. For details of the augmentation procedure, please refer to our paper.
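Once a "tgz" file has been unpacked, each original structure can be matched with its augmented counterpart by file name. The following is a minimal sketch under stated assumptions: the helper name pair_structures is hypothetical, the files are assumed to carry a ".pdb" extension, and the two subdirectories are assumed to use identical file names for the same entry.

```python
from pathlib import Path


def pair_structures(root, pattern="*.pdb"):
    """Match files in root/original with same-named files in root/augment.

    Returns (pairs, unmatched): a list of (original_path, augment_path)
    tuples, and the sorted names found in only one of the two directories.
    Assumption: the '.pdb' extension and identical names in both directories.
    """
    root = Path(root)
    original = {p.name: p for p in (root / "original").glob(pattern)}
    augmented = {p.name: p for p in (root / "augment").glob(pattern)}
    shared = sorted(original.keys() & augmented.keys())
    pairs = [(original[name], augmented[name]) for name in shared]
    unmatched = sorted(original.keys() ^ augmented.keys())
    return pairs, unmatched
```

A non-empty unmatched list would indicate that the naming assumption does not hold for that archive, or that extraction was incomplete.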

Files

Files (8.7 GB)

Size       Checksum
7.1 GB     md5:70404c21e2b2e257fa4a6a353936bd80
1.5 GB     md5:c199453c5ec6c3b62bfccc7f779bb4de
212.1 kB   md5:9f751c725ff3a046f76345b6e2afc6f7
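
The MD5 checksums listed for the files can be used to confirm that a download completed intact. A minimal sketch using Python's standard hashlib module is shown here; the helper name md5_matches is hypothetical, and the local file path is whatever name the file was saved under.

```python
import hashlib


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Return True if the file's MD5 digest equals expected_md5.

    expected_md5 may be given with or without the 'md5:' prefix used
    in the file listing. The file is read in chunks, so multi-gigabyte
    archives are not loaded into memory at once.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

For example, md5_matches(local_path, "md5:9f751c725ff3a046f76345b6e2afc6f7") would check the 212.1 kB file, where local_path is the location of the downloaded copy.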