Dataset Open Access

Molecular datasets from "SMILES-Based Deep Generative Scaffold Decorator for De-Novo Drug Design"

Arús-Pous, Josep; Patronov, Atanas; Bjerrum, Esben Jannik; Tyrchan, Christian; Reymond, Jean-Louis; Chen, Hongming; Engkvist, Ola

This record contains the molecular datasets from "SMILES-Based Deep Generative Scaffold Decorator for De-Novo Drug Design". They were generated with SMILES-based scaffold decorator generative models trained on two training sets (DRD2 and ChEMBL). These generative models take a partially built molecule (scaffold) as input and output several possible completions for each scaffold. Each dataset corresponds to one combination of training set (ChEMBL or DRD2), decoration mode, either multi-step (ms) or single-step (ss), and provenance of the scaffolds (validation set or non-dataset).

The generated molecules are annotated with a set of descriptors. The DRD2 datasets include the predicted probability that each molecule is active on DRD2 (p), obtained from a Random Forest model. The descriptors in the ChEMBL datasets relate to the synthesizability of the molecules (see manuscript). In addition, molecules in the datasets decorated from validation set scaffolds are annotated with whether they are part of the validation set (in_validation).
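As an illustration of how these annotations might be used, the following minimal sketch filters the rows of one of the DRD2 files by predicted activity. The column names `p` and `in_validation` come from the description above; everything else is an assumption, including the presence of a header row, the comma delimiter, the textual encoding of `in_validation` as `True`/`False` or `1`/`0`, and the hypothetical `smiles` column used in the sample data. Check the actual header of the downloaded file before relying on it.

```python
import csv
import io


def filter_rows(csv_file, min_p=0.5, novel_only=False):
    """Return rows whose predicted DRD2 activity `p` is at least `min_p`.

    Assumes a header row containing a `p` column. If `novel_only` is set,
    rows flagged as part of the validation set are dropped; the encoding
    of `in_validation` ("True"/"1") is an assumption, not documented.
    """
    kept = []
    for row in csv.DictReader(csv_file):
        if float(row["p"]) < min_p:
            continue
        flag = row.get("in_validation", "").strip().lower()
        if novel_only and flag in ("true", "1"):
            continue
        kept.append(row)
    return kept


# Tiny in-memory stand-in for a file such as drd2_ms_validation.csv.
# The `smiles` column name is hypothetical.
sample = io.StringIO(
    "smiles,p,in_validation\n"
    "CCO,0.9,True\n"
    "CCN,0.2,False\n"
    "CCC,0.7,False\n"
)
active_and_novel = filter_rows(sample, min_p=0.5, novel_only=True)
```

With a real file, the same function could be called as `filter_rows(open("drd2_ms_validation.csv", newline=""))`, provided the header matches these assumptions.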

Files (54.8 MB)
Name                        Size       Checksum
chembl_ms_non_dataset.csv   4.2 MB     md5:dfcf3cd2f20249c2f105a4a4d8b4d387
chembl_ms_validation.csv    10.0 MB    md5:59a58bd957c45bd2ee637574c7a0a1a4
chembl_ss_non_dataset.csv   9.6 MB     md5:1e6a7c5587344a8629a9c1e66c4cf15b
chembl_ss_validation.csv    21.0 MB    md5:0db11ab6b1e38e332634ab3dc1970e61
drd2_ms_non_dataset.csv     2.3 MB     md5:37eb8110d2bc0d258a8129602127a1ec
drd2_ms_validation.csv      658.3 kB   md5:5dac0a49cde988eacfe0be87c125c678
drd2_ss_non_dataset.csv     4.9 MB     md5:2a7f5cd5899defb4272487caccb9f6be
drd2_ss_validation.csv      2.3 MB     md5:ba8fa6778b482144a7b92d7d422df8a6
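The MD5 checksums listed with the files can be used to confirm that a download is intact. The sketch below uses only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and the local file path in the usage note is an assumption about where the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 against a listed value such as 'md5:dfcf...'.

    Accepts the checksum with or without the 'md5:' prefix.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `verify_download("chembl_ms_non_dataset.csv", "md5:dfcf3cd2f20249c2f105a4a4d8b4d387")` should return `True` for an uncorrupted copy of that file saved in the current directory.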
