CheckMyBlob evaluation data set (TAMC)
Authors/Creators
- 1. Center for Biocrystallographic Research, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, 61-714, Poland
- 2. Institute of Computing Science, Poznan University of Technology, Poznan, 60-965, Poland
- 3. Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22901, USA
- 4. Department of Crystallography, Faculty of Chemistry, A. Mickiewicz University, Poznan, 61-614, Poland
Description
A data set of ligands used to evaluate the CheckMyBlob method, described in the Kowiel et al. paper "Automatic recognition of ligands in electron density by machine learning methods".
This data set attempts to repeat the experimental setup from Terwilliger et al. described in "Ligand identification using electron-density map correlations". It consists of ligands from X-ray diffraction experiments with 6–150 non-H atoms. Connected PDB ligands were labeled as single alphabetically ordered strings of hetero-compound codes, whereas unknown species, water molecules, standard amino acids, and nucleotides were excluded. Finally, the data set was limited to the 200 most popular ligands. The resulting data set consists of 161,758 examples, with individual ligand counts ranging from 36,535 for GOL (glycerol) to 114 for LMG (1,2-distearoyl-monogalactosyl-diglyceride).
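The labeling and filtering steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `_` separator between hetero-compound codes, and the choice to keep repeated codes are all assumptions made for illustration.

```python
from collections import Counter


def label_connected(codes, sep="_"):
    """Build one label from the codes of a connected ligand.

    The codes are sorted alphabetically and joined into a single string.
    The separator is an assumption; the description does not specify it.
    """
    return sep.join(sorted(codes))


def most_popular_labels(labels, k=200):
    """Return the set of the k most frequent labels."""
    return {label for label, _ in Counter(labels).most_common(k)}


def keep_most_popular(labels, k=200):
    """Keep only the examples whose label is among the k most frequent."""
    allowed = most_popular_labels(labels, k)
    return [label for label in labels if label in allowed]
```

For example, under these assumptions `label_connected(["NAG", "MAN", "BMA"])` yields `BMA_MAN_NAG`.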
For machine learning (classification) purposes, the target attribute is: res_name.
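As a rough sketch of how the class distribution of the target attribute might be inspected, the following uses only the Python standard library. It assumes that tamc.csv has a header row containing a `res_name` column and that fields are comma-separated; the delimiter is exposed as a parameter because it is not stated in the description. The function name `count_labels` is hypothetical.

```python
import csv
from collections import Counter


def count_labels(csv_path, target="res_name", delimiter=","):
    """Count how many examples each ligand label has in the CSV file.

    Assumes a header row that includes the target column.
    """
    counts = Counter()
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for row in reader:
            counts[row[target]] += 1
    return counts
```

Calling `count_labels("tamc.csv").most_common(5)` would then list the five most frequent ligand labels with their counts, which can be compared against the figures quoted in the description.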
Files
| Name | Size | Checksum |
|---|---|---|
| tamc.csv | 137.7 MB | md5:935150730aeb76fb183ee259b0b8cb4e |
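After downloading, the MD5 checksum listed for tamc.csv can be used to confirm that the file arrived intact. The sketch below reads the file in chunks so that it does not need to fit in memory. The function names and the local path are assumptions; only the checksum value comes from the listing.

```python
import hashlib

# Checksum taken from the file listing for tamc.csv.
EXPECTED_MD5 = "935150730aeb76fb183ee259b0b8cb4e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Check whether the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected
```

If `matches_checksum("tamc.csv")` returns `False`, the download is likely incomplete or corrupted and should be repeated.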