Published October 2, 2025 | Version v3
Dataset Open

Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input

  • 1. Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
  • 2. Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
  • 3. Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA
  • 4. Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA

Description

This dataset supports SMARTBind (Small Molecule Approach to RNA Targeting Binder Discovery), a structure-agnostic ligand discovery framework that combines an RNA large language model with contrastive learning and a ligand-specific decoy enhancement strategy. Please cite the following publication when using the dataset:

Jiang, Shiyu, Amirhossein Taghavi, Tenghui Wang, Kisu Sung, Samantha M. Meyer, Noah A. Springer, Jinhang Wei, Jessica L. Childs-Disney, Chenglong Li, Matthew D. Disney, and Yanjun Li. "Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input." bioRxiv (2025): 2025-09.

Overview

The dataset contains the model checkpoint and training data for SMARTBind, including the RNAmigos1 and HARIBOSS 10-fold random splits and the HARIBOSS 5-fold cross-validation splits. All data are organized under the archive file SMARTBind_dataset.zip.

Contents

SMARTBind_weight.zip: saved checkpoint of the 10-fold SMARTBind model for binding score and binding site prediction.

10-fold cross-validation.zip: contains the HARIBOSS 10-fold random-split dataset and the RNAmigos1 10-fold random-split dataset.

5-fold cross-validations.zip: contains five HARIBOSS 5-fold cross-validation splits: RNA sequence-based, RNA structure-based, RNA pocket-based, pair-based, and ligand-existence splits.

5-fold cross-validations gbsubset.zip: contains the same five HARIBOSS 5-fold cross-validation splits restricted to the GerNA-Bind subset: RNA sequence-based, RNA structure-based, RNA pocket-based, pair-based, and ligand-existence splits.

Chemspace_Screening_Compounds_SMlLES_sampled1M.smi.zip: the virtual screening background library used for benchmarking on the time-dependent test set.

Decoy library.smi.zip: a chemically diverse decoy library of 92,626 entries, curated for the ligand-specific decoy enhancement strategy.

decoys.zip: DecoyFinder- and DeepCoy-generated decoy datasets for virtual screening evaluation.
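To check which of the files above are present after downloading, the archive can be inspected without extracting it. The following is a minimal sketch using Python's standard zipfile module. The function name list_archive is illustrative and not part of SMARTBind, and the internal layout of SMARTBind_dataset.zip (for example, whether the files listed above appear as nested .zip members) is an assumption to confirm against the actual download.

```python
import zipfile


def list_archive(path):
    """Return (member_name, uncompressed_size_in_bytes) for each archive entry.

    Hypothetical helper for illustration; it only reads the zip's central
    directory, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


# Example usage (assumes the archive sits in the working directory):
# for name, size in list_archive("SMARTBind_dataset.zip"):
#     print(f"{size:>14,d}  {name}")
```

Only the archive index is read, so this stays fast even for a multi-gigabyte file.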

Files

SMARTBind_dataset.zip

Files (2.6 GB)

Name: SMARTBind_dataset.zip
Size: 2.6 GB
md5:7dab08d97433e23c72f462257ce9c1ee
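
Because the archive is large, it is worth confirming that a download is intact by comparing its MD5 digest with the checksum listed above. The sketch below uses Python's standard hashlib module and reads the file in chunks so the 2.6 GB archive is never loaded into memory at once. The function names and the local file path are illustrative assumptions, not part of the SMARTBind release.

```python
import hashlib

# Checksum listed on this record for SMARTBind_dataset.zip.
EXPECTED_MD5 = "7dab08d97433e23c72f462257ce9c1ee"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Example usage (the path is an assumption about where the file was saved):
# print(verify_download("SMARTBind_dataset.zip"))
```

A mismatch usually indicates an interrupted or corrupted transfer, in which case the file should be downloaded again.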

Additional details

Related works

Is supplement to
Preprint: 10.1101/2025.09.24.678312 (DOI)