CheckMyBlob evaluation data set (CL)
Authors/Creators
- 1. Center for Biocrystallographic Research, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, 61-714, Poland
- 2. Institute of Computing Science, Poznan University of Technology, Poznan, 60-965, Poland
- 3. Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22901, USA
- 4. Department of Crystallography, Faculty of Chemistry, A. Mickiewicz University, Poznan, 61-614, Poland
Description
A data set of ligands used to evaluate CheckMyBlob, the method described by Kowiel et al. in "Automatic recognition of ligands in electron density by machine learning methods".
This data set reproduces the setup of the study by Carolan & Lamzin, "Automated identification of crystallographic ligands using sparse-density representations". It consists of ligands from X-ray diffraction experiments at resolutions from 1.0 to 2.5 Å. Adjacent PDB ligands were not connected. Ligands were labeled according to the PDB naming convention, and the data set was limited to the 82 ligand types listed by Carolan & Lamzin. The resulting data set contains 121,360 examples, with per-ligand counts ranging from 42,622 for SO4 to 16 for SPO (spheroidene).
For machine learning (classification), the target attribute is res_name.
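Because res_name is the classification target, a typical first step is to separate that column from the remaining attributes and check how the examples are spread across ligand types. The following is a minimal sketch using only the Python standard library. It assumes cl.csv has a header row; the comma delimiter and the helper names (load_examples, class_counts, summarize_file) are hypothetical, so confirm the delimiter against the downloaded file.

```python
import csv
from collections import Counter
from typing import Iterable, List, TextIO, Tuple

TARGET = "res_name"  # target attribute named in the description


def load_examples(handle: TextIO, delimiter: str = ",") -> Tuple[List[dict], List[str]]:
    """Split each row into a dict of attributes and its res_name label.

    Assumes a header row; the default delimiter is an assumption.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames is None or TARGET not in reader.fieldnames:
        raise ValueError(f"no '{TARGET}' column in header")
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(TARGET))  # remove the label from the attributes
        features.append(row)
    return features, labels


def class_counts(labels: Iterable[str]) -> Counter:
    """Count examples per ligand type (e.g. SO4, SPO)."""
    return Counter(labels)


def summarize_file(path: str, delimiter: str = ",") -> Tuple[int, int, Counter]:
    """Return (number of examples, number of ligand types, per-type counts)."""
    with open(path, newline="") as f:
        _, labels = load_examples(f, delimiter)
    counts = class_counts(labels)
    return len(labels), len(counts), counts
```

If the downloaded file matches the description above, summarize_file("cl.csv") should report 121,360 examples across 82 ligand types, with SO4 the most frequent (42,622) and SPO the rarest (16). The strong class imbalance is worth keeping in mind when choosing a train/test split or evaluation metric.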
Files
| Name | Size | Checksum |
|---|---|---|
| cl.csv | 102.9 MB | md5:b6c5af549ec6a4f8ff8120d92525278e |
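The record publishes an MD5 checksum for cl.csv, so a downloaded copy can be checked for corruption before use. A minimal sketch with Python's hashlib follows. It reads the file in chunks so the roughly 100 MB file is never held in memory at once. The function names and the local path are hypothetical.

```python
import hashlib

# Checksum listed for cl.csv in the Files table
EXPECTED_MD5 = "b6c5af549ec6a4f8ff8120d92525278e"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the local file's MD5 matches the expected (case-insensitive) digest."""
    return md5_of_file(path) == expected.lower()
```

For example, verify_download("cl.csv") returns True only if the local copy matches the published checksum. A False result suggests an incomplete or corrupted download.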