Published August 9, 2023 | Version v5
Dataset Open

A large comprehensive curated dataset of small molecules and their activities covering three cardiac ion channels: hERG, Cav1.2, and Nav1.5

  • 1. University of Antwerp
  • 2. Technical University of Munich
  • 3. University of Alberta

Description

The compressed data folder (dataset.rar) provides a data framework that lets drug discovery researchers perform in-depth analyses on a large, open-access, curated database of small molecules and their activities against three cardiac ion channels relevant to cardiotoxicity: hERG, Nav1.5, and Cav1.2. The database is organized as follows:

  • Each sub-folder represents a cardiac ion channel target: hERG, Nav1.5, and Cav1.2
  • Each target sub-folder contains 3 files in CSV format. One file contains the development set (split into training and validation sets using an 80/20 ratio for hyperparameter tuning). The other 2 files contain external evaluation sets. The first test set consists of compounds with a structural similarity of no more than 60% (Tanimoto similarity ≤ 0.6) to the remaining development set, while the second test set consists of compounds with a structural similarity of no more than 70% (Tanimoto similarity ≤ 0.7) to the remaining development set.
  • Each file contains 7 columns: "InChl Key", a unique identifier of the chemical structure; "SMILES", the string format used to store and exchange the chemical structure; "Source", the upstream data source from which the record was retrieved; "ChEMBL ID", the ChEMBL identifier if the compound comes from the ChEMBL database; "PubChem CID", the PubChem compound identifier if the compound comes from the PubChem database; "pIC50", the negative logarithm of the half-maximal inhibitory concentration (IC50), which describes the potency of the compound; and "USED_AS", which specifies whether the compound was used for training or validation.
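The column layout described above can be explored with a short script. The following is a minimal sketch, not part of the dataset itself: it groups the rows of one CSV file by whatever labels appear in the "USED_AS" column and converts pIC50 back to IC50. The function names, the sample rows, and the labels "Train" and "Validation" are hypothetical, and the conversion assumes the common convention that IC50 is expressed in mol/L, which the description does not state explicitly.

```python
import csv
import io
from collections import defaultdict


def group_by_usage(csv_file):
    """Group the rows of one dataset file by the value of the "USED_AS" column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["USED_AS"]].append(row)
    return dict(groups)


def pic50_to_ic50_molar(pic50):
    """Invert pIC50 = -log10(IC50); assumes IC50 is expressed in mol/L."""
    return 10.0 ** (-float(pic50))


# Hypothetical sample that only mimics a subset of the described columns.
# The "Train" and "Validation" labels are assumptions, not taken from the data.
SAMPLE = io.StringIO(
    "SMILES,pIC50,USED_AS\n"
    "CCO,5.0,Train\n"
    "CCN,6.0,Validation\n"
)

groups = group_by_usage(SAMPLE)
for label, rows in groups.items():
    ic50_values = [pic50_to_ic50_molar(r["pIC50"]) for r in rows]
    print(label, len(rows), ic50_values)
```

To use it on a real file, pass a handle opened with `open(path, newline="")` to `group_by_usage` instead of the in-memory sample, and check the actual "USED_AS" labels before relying on them.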

If you use this dataset, please cite this publication:

  • Issar Arab, Kristof Egghe, Kris Laukens, Ke Chen, Khaled Barakat, Wout Bittremieux, Benchmarking of Small Molecule Feature Representations for hERG, Nav1.5, and Cav1.2 Cardiotoxicity Prediction, Journal of Chemical Information and Modeling, (2023). doi:10.1021/acs.jcim.3c01301

 

Files

_upstream_source_composition.png

Files (2.5 MB)

  • md5:1ef6e68dd3f3ab622b06776d945ed3f0 (1.8 MB)
  • md5:53082d0218efbb4ae00d44c0c532292e (739.0 kB)