Published November 25, 2025 | Version v2
Dataset Open

MotifLeadDB v1

Contributors

Data manager:

Description

MotifLeadDB is centered on a strict Core dataset containing 342,489 modeled receptor–ligand complex structures across 357 protein targets. To broaden target diversity beyond the Core set, we additionally included a diverse-only extension, yielding a total of 378,918 modeled receptor–ligand complexes across 396 targets in the full Diverse dataset.
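The two totals above imply the size of the diverse-only extension. The following is a quick arithmetic check, not an official count: it assumes the extension shares no entries with the Core set, and the variable names are illustrative.

```python
# Totals quoted in the description.
core_entries, core_targets = 342_489, 357
diverse_entries, diverse_targets = 378_918, 396

# Assumption: Core and the diverse-only extension do not share entries,
# so the extension size is the plain difference of the two totals.
extension_entries = diverse_entries - core_entries

# Targets present in the Diverse dataset but absent from Core.
extra_targets = diverse_targets - core_targets

print(extension_entries)  # 36429
print(extra_targets)      # 39
```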

Structural models are organized hierarchically by receptor template, ligand scaffold, and model confidence level. Three confidence levels (Level 1–3) are defined by scaffold alignment accuracy and pharmacophore conservation.


The dataset hierarchy is as follows:

  • Diverse: complete dataset containing all released entries, including the Core set and an additional diverse-only extension introduced to broaden target diversity (396 targets; 378,918 entries).

  • Core: subset retaining only entries supported by mutation-free templates (357 targets; 342,489 entries).

  • Core-NR: subset of Core further restricted to non-redundant ligand assignments (357 targets; 97,173 entries).

  • Core-NR-Act: subset of Core-NR further restricted to scaffold groups with the same activity type and a minimum within-group pActivity range of 0.2 (357 targets; 93,995 entries). Activity-specific tables are distributed separately for pKi, pIC50, and pKd.

  • HC-Core: subset of Core-NR-Act retaining only confidence Level 1 models (342 targets; 61,223 entries). High-confidence activity-specific tables are likewise distributed separately for pKi, pIC50, and pKd.
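The Core-NR-Act criterion above can be sketched as a group-wise filter. This is only one possible reading of the rule (every member of a scaffold group shares one activity type, and the spread between the largest and smallest pActivity in the group is at least 0.2), not the authors' pipeline. The field names `scaffold_group`, `activity_type`, and `p_activity` are hypothetical; the actual column schema is given in README.md.

```python
from collections import defaultdict

def filter_activity_groups(rows, min_range=0.2):
    """Keep rows whose scaffold group has one activity type and enough spread.

    `rows` is an iterable of dicts with the hypothetical keys
    'scaffold_group', 'activity_type', and 'p_activity'.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["scaffold_group"]].append(row)

    kept = []
    for members in groups.values():
        types = {m["activity_type"] for m in members}
        values = [m["p_activity"] for m in members]
        # A small tolerance guards against float rounding at the threshold.
        if len(types) == 1 and max(values) - min(values) >= min_range - 1e-9:
            kept.extend(members)
    return kept

# Made-up rows for illustration only (not taken from the dataset).
example_rows = [
    {"scaffold_group": "A", "activity_type": "pKi", "p_activity": 6.0},
    {"scaffold_group": "A", "activity_type": "pKi", "p_activity": 6.5},
    {"scaffold_group": "B", "activity_type": "pKi", "p_activity": 7.0},
    {"scaffold_group": "B", "activity_type": "pKi", "p_activity": 7.1},
    {"scaffold_group": "C", "activity_type": "pKi", "p_activity": 5.0},
    {"scaffold_group": "C", "activity_type": "pIC50", "p_activity": 6.0},
]

kept_rows = filter_activity_groups(example_rows)
```

In this toy input only group A survives: group B spans too narrow a range, and group C mixes activity types.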

The dataset includes branch-level tables (core.csv, diverse_only.csv), an integrated full table (tables/diverse.csv), derived Core subset tables, branch-specific structure archives, and metadata tables summarizing model-, ligand-, and target-level annotations.
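After downloading, one simple sanity check is to confirm that the two branch tables add up to the integrated table. The sketch below assumes each CSV has a single header line, one entry per row, and that the branches do not overlap; the function names are illustrative, and the paths passed in depend on where the archives were unpacked.

```python
import csv

def count_rows(path):
    """Count data rows in a CSV file, excluding the header line."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header, if any
        return sum(1 for _ in reader)

def branches_match_full(core_path, diverse_only_path, diverse_path):
    """Return True when the two branch tables add up to the full table."""
    branch_total = count_rows(core_path) + count_rows(diverse_only_path)
    return branch_total == count_rows(diverse_path)
```

A call such as `branches_match_full("core.csv", "diverse_only.csv", "tables/diverse.csv")` would then be expected to return `True` if these assumptions hold.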

Each entry is annotated with structural quality metrics, aggregated BindingDB activity values, scaffold grouping, template quality annotations, and dataset subset labels. Template consistency was additionally checked using SIFTS-based pocket mapping, and ActivityDB-based annotations are provided where available.


Detailed descriptions of file organization, subset definitions, and CSV column schema are provided in the accompanying README.md.

Files

README.md

Files (32.2 GB)

Checksum Size
md5:9ff570692470d98ec051e735b34e2b7f 28.9 MB
md5:01ecf2171a71aa1c41ff6a794211d790 29.0 GB
md5:29b545fbef45517540b92ccb15f88095 3.2 GB
md5:55aa744d039d26c38c769820c6e12681 13.0 kB
md5:9167a3d8f46dc28badccd6814daa77e7 40.7 MB
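
Because the largest archive is 29.0 GB, it is worth verifying each download against its listed MD5 checksum. A minimal sketch using Python's standard `hashlib` follows; the helper names are illustrative, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For a downloaded file, `matches_listed_md5(local_path, "md5:...")` with the corresponding checksum from the listing should return `True`; a `False` result suggests an incomplete or corrupted download.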