Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input
Authors/Creators
- 1. Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
- 2. Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
- 3. Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA
- 4. Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA
Description
This dataset supports SMARTBind (Small Molecule Approach to RNA Targeting Binder Discovery), a structure-agnostic ligand discovery framework that combines an RNA large language model with contrastive learning and a ligand-specific decoy enhancement strategy. Please cite the following publication when using the dataset:
Jiang, Shiyu, Amirhossein Taghavi, Tenghui Wang, Kisu Sung, Samantha M. Meyer, Noah A. Springer, Jinhang Wei, Jessica L. Childs-Disney, Chenglong Li, Matthew D. Disney, and Yanjun Li. "Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input." bioRxiv (2025): 2025-09.
Overview
The dataset contains the model checkpoint and training data for SMARTBind, including the RNAmigos1 and HARIBOSS 10-fold random splits and the HARIBOSS 5-fold cross-validation splits. All data are organized under the archive file SMARTBind_dataset.zip.
Contents
SMARTBind_weight.zip: saved checkpoint of the 10-fold SMARTBind model for binding score and binding site prediction.
10-fold cross-validation.zip: contains the HARIBOSS 10-fold random-split dataset and the RNAmigos1 10-fold random-split dataset.
5-fold cross-validations.zip: contains five HARIBOSS 5-fold cross-validation splits: RNA sequence-based, RNA structure-based, RNA pocket-based, pair-based, and ligand-existence splits.
5-fold cross-validations gbsubset.zip: contains the same five HARIBOSS 5-fold cross-validation splits restricted to the GerNA-Bind subset: RNA sequence-based, RNA structure-based, RNA pocket-based, pair-based, and ligand-existence splits.
Chemspace_Screening_Compounds_SMlLES_sampled1M.smi.zip: the virtual screening background library used for benchmarking on the time-dependent test set.
Decoy library.smi.zip: a chemically diverse decoy library of 92,626 entries, curated for the ligand-specific decoy enhancement strategy.
decoys.zip: DecoyFinder- and DeepCoy-generated decoy datasets for virtual screening evaluation.
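Before extracting a 2.6 GB archive, it can help to check which of the files listed above are present. The following is a minimal sketch using only the Python standard library; the helper name `list_zip_members` is hypothetical and not part of SMARTBind, and the internal layout of the archive (for example, whether the component .zip files are nested inside SMARTBind_dataset.zip) is an assumption that should be checked against the actual download.

```python
import zipfile


def list_zip_members(archive_path):
    """Return (name, uncompressed size in bytes) for every file entry in a zip archive.

    Hypothetical helper for illustration; directory entries are skipped.
    """
    with zipfile.ZipFile(archive_path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()
        ]
```

Assuming the archive has been downloaded to the working directory, `list_zip_members("SMARTBind_dataset.zip")` would return the entry names and sizes without extracting anything.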
Files
| Name | Size | MD5 |
|---|---|---|
| SMARTBind_dataset.zip | 2.6 GB | 7dab08d97433e23c72f462257ce9c1ee |
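The MD5 checksum listed above can be used to confirm that the download completed without corruption. A minimal sketch with Python's standard `hashlib` follows; the function names and the local file path are illustrative assumptions, and the file is read in chunks so the 2.6 GB archive does not have to fit in memory.

```python
import hashlib

# Checksum listed for SMARTBind_dataset.zip on this page.
EXPECTED_MD5 = "7dab08d97433e23c72f462257ce9c1ee"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("SMARTBind_dataset.zip")` should return `True` for an intact download, assuming the file was saved under that name in the working directory.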
Additional details
Related works
- Is supplement to
- Preprint: 10.1101/2025.09.24.678312 (DOI)