Published October 2, 2025 | Version v2
Dataset | Open access

Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input

  • 1. Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
  • 2. Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA
  • 3. Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA
  • 4. Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA

Description

This dataset supports SMARTBind (Small Molecule Approach to RNA Targeting Binder Discovery), a structure-agnostic ligand discovery framework that combines an RNA large language model with contrastive learning and a ligand-specific decoy enhancement strategy. Please cite the following publication when using the dataset:

Jiang, Shiyu, Amirhossein Taghavi, Tenghui Wang, Samantha M. Meyer, Jessica L. Childs-Disney, Chenglong Li, Matthew D. Disney, and Yanjun Li. "Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input." bioRxiv (2025): 2025-09. doi:10.1101/2025.09.24.678312.

Overview

The dataset contains the SMARTBind model checkpoint and the training data for three cross-validation settings: RNAmigos1 with a 10-fold random split, HARIBOSS with a 10-fold random split, and HARIBOSS with a 5-fold sequence-based split. All files are packaged in a single archive, SMARTBind_dataset.zip.
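Since everything ships in one archive, a first step is to unpack it. The sketch below uses only Python's standard `zipfile` module; the helper name `extract_archive` and the destination folder `SMARTBind_dataset` are illustrative choices, not part of the release, and it assumes the archive has been downloaded to the working directory.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of a zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(dest)
    return names


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("SMARTBind_dataset.zip")
    if archive.exists():
        for name in extract_archive(archive, "SMARTBind_dataset"):
            print(name)
```

The weights file listed below is itself a zip archive, so the same helper can be applied to it after the outer archive is unpacked.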

Contents

SMARTBind_weights.zip: Saved checkpoint of the 10-fold SMARTBind model.

hariboss_merged_5fd.pkl: SMARTBind training data from the HARIBOSS database under 5-fold cross-validation with a sequence-based split.

hariboss_merged_10fd.pkl: SMARTBind training data from the HARIBOSS database under 10-fold cross-validation with a random split.

rnamigos_10fd.pkl: SMARTBind training data from the RNAmigos1 dataset under 10-fold cross-validation with a random split.

Decoy library.smi: A chemically diverse decoy library of 92,626 entries, curated for the ligand-specific decoy enhancement strategy.
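The training files are Python pickles and the decoy library is a SMILES (.smi) file. The object layout inside each pickle is not documented on this page, so the sketch below only loads it for inspection. For the decoy library it assumes the common .smi convention of one molecule per line, SMILES first, optionally followed by whitespace and a name; if the file carries a header line, the count will be one higher than the 92,626 entries stated above. The helper names and the `SMARTBind_dataset` folder are hypothetical, and unpickling may require whatever libraries were used to create the objects. Only unpickle files from a trusted source.

```python
import pickle
from pathlib import Path


def load_split(pkl_path):
    """Unpickle one training-data file (e.g. hariboss_merged_5fd.pkl) for inspection.

    The structure of the stored object is not assumed; inspect it before use.
    """
    with open(pkl_path, "rb") as fh:
        return pickle.load(fh)


def read_smi(smi_path):
    """Read a .smi file into (smiles, name) pairs.

    Assumes one molecule per line with the SMILES string first, optionally
    followed by whitespace and a name. Blank lines are skipped; name is None
    when a line has no second field.
    """
    entries = []
    with open(smi_path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.split(maxsplit=1)
            if not parts:
                continue
            name = parts[1].strip() if len(parts) > 1 else None
            entries.append((parts[0], name))
    return entries


if __name__ == "__main__":
    root = Path("SMARTBind_dataset")  # hypothetical extraction folder
    pkl = root / "hariboss_merged_5fd.pkl"
    if pkl.exists():
        print(type(load_split(pkl)))
    smi = root / "Decoy library.smi"
    if smi.exists():
        print(len(read_smi(smi)), "decoy entries")
```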

Files

SMARTBind_dataset.zip (2.6 GB)

md5: 16a35c3384a126c299b8e9d37650e796
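Given the archive size, it is worth confirming the download against the md5 checksum listed for SMARTBind_dataset.zip before extracting it. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so the 2.6 GB archive is never held in memory at once; the download path is an assumption.

```python
import hashlib
from pathlib import Path

# Checksum published for SMARTBind_dataset.zip
EXPECTED_MD5 = "16a35c3384a126c299b8e9d37650e796"


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("SMARTBind_dataset.zip")  # assumed download location
    if archive.exists():
        ok = md5sum(archive) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum MISMATCH: re-download the archive")
```

On Linux, `md5sum SMARTBind_dataset.zip` from GNU coreutils gives the same digest without Python.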

Additional details

Related works

Is supplement to
Preprint: 10.1101/2025.09.24.678312 (DOI)