Published August 22, 2023 | Version v1
Dataset
Open
cg2all datasets
Description
Training, validation, and test sets used in the development of cg2all.
The "set.tar.gz" file contains several files, each listing PDB IDs:
- targets.train.pdb.6k: training set, pdb.6k
- targets.train.pdb.29k: training set, pdb.29k
- targets.valid.pdb: validation set for both pdb.6k and pdb.29k
- targets.test.pdb: test set for both pdb.6k and pdb.29k
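As a rough illustration of how these lists might be loaded, the sketch below reads every `targets.*` member of the archive into a dictionary. The layout of the list files is not described here, so the code assumes one entry per line with the PDB ID as the first whitespace-separated token; `read_id_lists` and `shared_ids` are hypothetical helper names, not part of cg2all.

```python
import tarfile
from pathlib import PurePosixPath


def read_id_lists(archive_path):
    """Return {list_file_name: [id, ...]} for each 'targets.*' member.

    Assumption: one entry per line, ID in the first whitespace-separated
    token. Blank lines are skipped.
    """
    id_lists = {}
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            # Members may sit inside a top-level directory; keep the base name.
            name = PurePosixPath(member.name).name
            if not member.isfile() or not name.startswith("targets."):
                continue
            text = archive.extractfile(member).read().decode("utf-8")
            id_lists[name] = [
                line.split()[0] for line in text.splitlines() if line.strip()
            ]
    return id_lists


def shared_ids(id_lists, first, second):
    """Return the set of IDs that appear in both named lists."""
    return set(id_lists[first]) & set(id_lists[second])
```

Calling `read_id_lists("set.tar.gz")` and then `shared_ids(lists, "targets.train.pdb.6k", "targets.test.pdb")` would give a quick check of whether any ID appears in both a training list and the test list.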
Each "tgz" file contains two subdirectories: original and augment. The "original" directory holds PDB files that were curated and cleaned up with the process_pdb.py script. The "augment" directory holds the same set of PDB files with different atomic coordinates; these were used to augment the original training set with additional side-chain rotamer states. For details of the augmentation procedure, please refer to our paper.
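Because the two subdirectories hold the same set of structures, an original file and its augmented counterpart can presumably be matched by file name. The sketch below does this for an already extracted archive. It assumes that both directories are flat and that matching files share identical names, neither of which is confirmed here; `pair_structures` is a hypothetical helper name.

```python
from pathlib import Path


def pair_structures(root):
    """Pair files in root/original with same-named files in root/augment.

    Assumption: both directories are flat and matching structures share
    the same file name. Returns (pairs, unmatched_names).
    """
    root = Path(root)
    originals = {p.name: p for p in (root / "original").iterdir() if p.is_file()}
    augmented = {p.name: p for p in (root / "augment").iterdir() if p.is_file()}

    # Names present in both directories form the usable pairs.
    shared = sorted(originals.keys() & augmented.keys())
    pairs = [(originals[name], augmented[name]) for name in shared]

    # Names present in only one directory are reported for inspection.
    unmatched = sorted(originals.keys() ^ augmented.keys())
    return pairs, unmatched
```

A non-empty `unmatched` list would suggest that the naming assumption does not hold for that archive and that the pairing rule needs adjusting.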
Files
(8.7 GB)
MD5 checksum | Size |
---|---|
md5:70404c21e2b2e257fa4a6a353936bd80 | 7.1 GB |
md5:c199453c5ec6c3b62bfccc7f779bb4de | 1.5 GB |
md5:9f751c725ff3a046f76345b6e2afc6f7 | 212.1 kB |
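A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib`. The file names are not shown in this listing, so the caller supplies the path; `md5_of_file` and `matches_published` are hypothetical helper names.

```python
import hashlib

# MD5 checksums as published in the file listing above.
PUBLISHED_MD5 = {
    "70404c21e2b2e257fa4a6a353936bd80",
    "c199453c5ec6c3b62bfccc7f779bb4de",
    "9f751c725ff3a046f76345b6e2afc6f7",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """Return True if the file's MD5 digest is one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```

Reading in chunks matters here because the largest archive is several gigabytes and loading it whole would be wasteful.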