Published April 2, 2024 | Version v1
Dataset Open

Cryo-EM and X-ray crystallography ligands represented as 3D voxel grids for training deep learning models

Description

Ligand datasets used to train and evaluate the models studied in "Ligand Identification using Deep Learning" by Karolczak, J. et al.

The blobs_full.tar.gz and cryoem_blobs.zip archives contain compressed 3D NumPy arrays (*.npz) of all ligand blobs extracted from X-ray and cryo-EM PDB deposits, before any quality filtering. Each .npz file name encodes the PDB ID, chain, residue number, and ligand name of the extracted blob. The cmb_data.csv file contains the tabular data used to train the CheckMyBlob model. The X-ray data were later divided into training and testing subsets, listed in xray_train.csv and xray_holdout.csv, respectively. The ligand_mapping.csv file maps ligand IDs to ligand group names. Finally, the cryoem_qscores.csv file contains the Q-scores that were used to filter the cryo-EM ligands.
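A minimal sketch of how a single extracted blob file might be inspected with NumPy is given below. The names of the arrays stored inside each .npz archive are not stated in this description, so the code lists whatever names are present instead of assuming one. The helper names `load_blob` and `describe_blob`, the file name `example_blob.npz`, and the array name `blob` are hypothetical and used only for illustration; the demo writes a synthetic grid so that it runs without the dataset.

```python
import os
import tempfile

import numpy as np


def load_blob(npz_path):
    """Return every array stored in one .npz archive as {name: ndarray}."""
    with np.load(npz_path) as archive:
        # archive.files lists the array names actually present in the file.
        return {name: archive[name] for name in archive.files}


def describe_blob(npz_path):
    """Summarise each stored array as (shape, dtype string)."""
    arrays = load_blob(npz_path)
    return {name: (arr.shape, str(arr.dtype)) for name, arr in arrays.items()}


# Demo on a synthetic 3D grid (hypothetical file and array names).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "example_blob.npz")
    np.savez_compressed(demo_path, blob=np.zeros((4, 5, 6), dtype=np.float32))
    print(describe_blob(demo_path))
```

To inspect a real blob, extract the archive and pass the path of one .npz file to `describe_blob`.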

Files

Files (19.4 GB)

Checksum Size
md5:cf0851e7722848e487ab7a92afcaab37 12.0 GB
md5:13c1c0f23115a15ad9ae973678d34f02 508.8 MB
md5:44194c24a72631f590a24c3699a395c9 6.9 GB
md5:cd82e1b1b1668ec17816a75db0de3cc3 13.9 MB
md5:a04d4273f51d7311213e0c667cec467b 470.7 kB
md5:7f4a9b05273cc0d388a2fac4160f1feb 6.7 MB
md5:ea40b376028fcac9fac38a23b45b4ecb 15.6 MB
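The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch, assuming the file has already been downloaded locally. Because the listing does not pair each checksum with a file name, the check only tests whether a file's digest appears among the published values. The names `PUBLISHED_MD5`, `md5_of`, and `matches_published` are hypothetical helpers, not part of the dataset.

```python
import hashlib

# Checksums copied from the file listing (file names are not paired with them there).
PUBLISHED_MD5 = {
    "cf0851e7722848e487ab7a92afcaab37",
    "13c1c0f23115a15ad9ae973678d34f02",
    "44194c24a72631f590a24c3699a395c9",
    "cd82e1b1b1668ec17816a75db0de3cc3",
    "a04d4273f51d7311213e0c667cec467b",
    "7f4a9b05273cc0d388a2fac4160f1feb",
    "ea40b376028fcac9fac38a23b45b4ecb",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Streaming keeps memory use flat even for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """Return True if the file's digest is one of the published checksums."""
    return md5_of(path) in PUBLISHED_MD5
```

For example, `matches_published("cryoem_blobs.zip")` should return True for an uncorrupted download of that archive, provided the file is in the current directory.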