Published December 1, 2023 | Version v1
Dataset Open

Exploring Data-Driven Chemical SMILES Tokenization Approaches to Identify Key Protein-Ligand Binding Moieties

Description

This repository contains materials for the paper, "Exploring Data-Driven Chemical SMILES Tokenization Approaches to Identify Key Protein-Ligand Binding Moieties", published in Molecular Informatics.

`data.zip` contains vocabulary and dataset files for identifying chemical vocabularies and key chemical words associated with protein-ligand binding.

`results.zip` comprises outputs specific to vocabularies and datasets, as well as various related statistics.

Files

data.zip

Files (588.9 MB)

Checksum Size
md5:87b210de403e5654cdabb17fa4260e02 477.7 MB
md5:1cd87b0c6b54a4ecfb3a6682a00b9c1a 111.1 MB
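
The MD5 checksums in the file listing can be used to confirm that a downloaded archive is intact. The following is a minimal sketch of one way to do that in Python, not part of the repository itself. The helper names `md5_of_file` and `find_listed_match` are hypothetical, and because the listing does not pair each checksum with a file name, the sketch only checks whether a file's digest matches either listed value.

```python
import hashlib

# MD5 digests copied from the file listing. Which digest belongs to which
# archive is not stated in the listing, so no pairing is assumed here.
LISTED_MD5 = [
    "87b210de403e5654cdabb17fa4260e02",
    "1cd87b0c6b54a4ecfb3a6682a00b9c1a",
]


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_listed_match(path, listed=LISTED_MD5):
    """Return the listed digest that matches the file, or None if none does."""
    actual = md5_of_file(path)
    # Accept entries written with or without the "md5:" prefix.
    normalized = []
    for entry in listed:
        entry = entry.strip().lower()
        if entry.startswith("md5:"):
            entry = entry[4:]
        normalized.append(entry)
    return actual if actual in normalized else None


# Example usage (assumes the archives were saved under these local names):
# for name in ("data.zip", "results.zip"):
#     print(name, find_listed_match(name) or "no listed checksum matched")
```

Reading in chunks keeps memory use flat for archives of this size. A result of `None` suggests an incomplete or corrupted download, in which case the file should be fetched again.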

Additional details

Related works

Is published in
Publication: 10.1002/minf.202300249 (DOI)
Is supplement to
Publication: arXiv:2210.14642 (arXiv)