Highly curated hERG dataset of 8879 unique molecular compounds with corresponding potency values

Arab, Issar; Barakat, Khaled

doi:10.5281/zenodo.5807719

Published November 28, 2021 | Version v1

Dataset Open

1. Technical University of Munich
2. University of Alberta

This dataset was built during a research project in the field of Computer-Aided Drug Discovery (CADD), funded by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grant. The aim of the project was to build descriptor-based machine learning models for predicting hERG cardiotoxicity liability. The dataset contains 8879 unique molecular compounds gathered from the publicly available ChEMBL and PubChem bioactivity databases, as well as from literature mining. The list is split into two sets: 8380 compounds for training and 499 for testing. Each compound is represented as a SMILES string with its corresponding pIC50 potency value.
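For readers who want to interpret the potency values, the following is a minimal sketch that converts between pIC50 and IC50. It assumes the values follow the usual definition, pIC50 = -log10(IC50 in mol/L), which this record does not state explicitly. The function names and the 10 µM cut-off in `is_blocker` are illustrative assumptions and are not taken from the dataset or the manuscript.

```python
import math

# Split sizes as stated in the dataset description.
TRAIN_SIZE = 8380
TEST_SIZE = 499
TOTAL = 8879


def pic50_to_ic50_um(pic50: float) -> float:
    """Convert pIC50 to IC50 in micromolar.

    Assumes pIC50 = -log10(IC50 in mol/L).
    """
    return 10.0 ** (-pic50) * 1e6


def ic50_um_to_pic50(ic50_um: float) -> float:
    """Convert IC50 in micromolar back to pIC50 (same assumption)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


def is_blocker(pic50: float, threshold_um: float = 10.0) -> bool:
    """Label a compound as a blocker if its IC50 is at or below the cut-off.

    The 10 µM default is a hypothetical threshold chosen for illustration.
    """
    return pic50_to_ic50_um(pic50) <= threshold_um


if __name__ == "__main__":
    # The two subsets should add up to the full compound count.
    print(TRAIN_SIZE + TEST_SIZE == TOTAL)
    print(f"test fraction: {TEST_SIZE / TOTAL:.3f}")
    print(f"pIC50 6.0 is about {pic50_to_ic50_um(6.0):.2f} uM")
```

Higher pIC50 means higher potency, so a threshold expressed in µM corresponds to a lower bound on pIC50.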

To access the full original work, please visit the following link: Manuscript

Note: If you use this data, please cite the original manuscript describing the curation process:

Arab, Issar, and Khaled Barakat. "ToxTree: descriptor-based machine learning models for both hERG and Nav1.5 cardiotoxicity liability predictions." arXiv preprint arXiv:2112.13467 (2021).

For our latest, much larger manually curated dataset, see: link

Files

Files (732.2 kB)

Name	MD5	Size
hERG_Dataset.csv	e855a8ff1e00064b501f5a2235ee9c61	732.2 kB
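The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and the local path passed to them is an assumption about where the file was saved.

```python
import hashlib

# Checksum as listed on this record for hERG_Dataset.csv.
EXPECTED_MD5 = "e855a8ff1e00064b501f5a2235ee9c61"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("hERG_Dataset.csv")` should return `True` for an unmodified copy saved under that name in the working directory.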

	All versions	This version
Views	955	944
Downloads	563	556
Data volume	514.8 MB	508.2 MB
