Published December 1, 2023 | Version v1
Dataset
Open
Exploring Data-Driven Chemical SMILES Tokenization Approaches to Identify Key Protein-Ligand Binding Moieties
Creators
Description
This repository contains materials for the paper, "Exploring Data-Driven Chemical SMILES Tokenization Approaches to Identify Key Protein-Ligand Binding Moieties", published in Molecular Informatics.
`data.zip` contains the vocabulary and dataset files used to identify chemical vocabularies and key chemical words associated with protein-ligand binding.
`results.zip` comprises outputs specific to vocabularies and datasets, as well as various related statistics.
Files
(588.9 MB)
Checksum | Size |
---|---|
md5:87b210de403e5654cdabb17fa4260e02 | 477.7 MB |
md5:1cd87b0c6b54a4ecfb3a6682a00b9c1a | 111.1 MB |
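The MD5 checksums listed above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch in Python, not part of the original record. It assumes the archives were saved locally as `data.zip` and `results.zip` (hypothetical paths), and because the listing as shown here does not pair each checksum with a file name, it only checks whether a file's digest matches any of the listed values. The helper names `md5_of_file` and `matches_listed_checksum` are illustrative.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
# Assumption: the listing here does not say which archive each one belongs to.
LISTED_MD5 = {
    "87b210de403e5654cdabb17fa4260e02",  # 477.7 MB file
    "1cd87b0c6b54a4ecfb3a6682a00b9c1a",  # 111.1 MB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed=LISTED_MD5):
    """Return True if the file's MD5 equals any checksum in `listed`."""
    return md5_of_file(path) in listed


if __name__ == "__main__":
    # Hypothetical local paths; adjust to where the archives were saved.
    for name in ("data.zip", "results.zip"):
        if not Path(name).exists():
            print(f"{name}: not found")
        elif matches_listed_checksum(name):
            print(f"{name}: checksum matches a listed value")
        else:
            print(f"{name}: checksum does not match any listed value")
```

Reading in fixed-size chunks keeps memory use flat, which matters for archives of several hundred megabytes. A mismatch usually indicates an incomplete or corrupted download.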
Additional details
Identifiers
- arXiv: arXiv:2210.14642
- DOI: 10.1002/minf.202300249
Related works
- Is published in: Publication: 10.1002/minf.202300249 (DOI)
- Is supplement to: Publication: arXiv:2210.14642 (arXiv)