Published April 11, 2025 | Version v1
Dataset Open

Processed training dataset for fine-tuning the pocket-based molecular generation task in Token-Mol 1.0

Creators

  • 1. Zhejiang University

Description

Processed CrossDocked2020 dataset for training the pocket-based molecular generation task in Token-Mol 1.0.

`protein_represent.pkl` contains the embeddings of the protein pockets, encoded with the ResGen encoder.

`mol_input.pkl` contains the ligands corresponding to the pockets in the training set, represented as SMILES strings that have been tokenized.
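The two files can be inspected with Python's standard `pickle` module. The following is a minimal sketch, not the authors' loading code: the internal layout of each pickle (list, dict, array, tensor) is not documented on this page, so the helper names `load_pickle` and `summarize` are hypothetical and only report the top-level type and length. Depending on how the objects were saved, unpickling may also require the libraries that defined them to be installed.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a single pickled object from `path`.

    Only unpickle files from a source you trust: unpickling can
    execute arbitrary code.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def summarize(obj):
    """Describe an object whose layout is not documented here (hypothetical helper)."""
    length = len(obj) if hasattr(obj, "__len__") else None
    return {"type": type(obj).__name__, "length": length}


if __name__ == "__main__":
    # File names are taken from the description; paths are assumed to be
    # relative to the current working directory.
    for name in ("protein_represent.pkl", "mol_input.pkl"):
        if Path(name).exists():
            print(name, summarize(load_pickle(name)))
        else:
            print(name, "not found in the current directory")
```

If both objects turn out to be sequences, comparing their lengths is a quick sanity check that each pocket embedding has a corresponding ligand entry; that pairing is an assumption based on the description above.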

Files

Files (4.6 GB)

Name Size
md5:382ac7e37a84acbe332df6319e13f82f 11.1 MB
md5:d6b84383566307b3903003844106c65c 4.5 GB
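After downloading, the MD5 checksums listed above can be used to confirm that the files arrived intact. The sketch below uses `hashlib` and reads in chunks so the 4.5 GB file is never held in memory at once. The function names `md5_of_file` and `is_listed` are hypothetical. Because this listing does not pair each checksum with a file name, the check only confirms that a file's digest matches one of the listed values.

```python
import hashlib

# Checksums copied from the file listing on this page.
LISTED_CHECKSUMS = {
    "md5:382ac7e37a84acbe332df6319e13f82f",
    "md5:d6b84383566307b3903003844106c65c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path, checksums=LISTED_CHECKSUMS):
    """Check whether the file's MD5 matches any listed checksum.

    Accepts entries with or without the "md5:" prefix, in any letter case.
    """
    normalized = {c.lower().split(":", 1)[-1] for c in checksums}
    return md5_of_file(path) in normalized
```

For example, `is_listed("mol_input.pkl")` returning `False` would suggest a truncated or corrupted download, assuming the file was saved under that name.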