Published December 13, 2023 | Version v1.0.0
Dataset Open

ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries – Data and Models

Description

This repository contains data and models used in the following paper.

 

Swanson, K., Walther, P., Leitz, J., Mukherjee, S., Wu, J. C., Shivnaraine, R. V., & Zou, J. ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries. In review.

 

The data and models are meant to be used with the ADMET-AI code, which runs the ADMET-AI web server at admet.ai.greenstonebio.com.

 

The data.zip file has the following structure.

data

  drugbank: Contains files with drugs from the DrugBank that have received regulatory approval. drugbank_approved.csv contains the full set of approved drugs along with ADMET-AI predictions, while the other files contain subsets of these molecules used for testing the speed of ADMET prediction tools.

  tdc_admet_all: Contains the data (.csv files) and RDKit features (.npz files) for all 41 single-task ADMET datasets from the Therapeutics Data Commons (TDC).

  tdc_admet_multitask: Contains the data (.csv files) and RDKit features (.npz files) for the two multi-task datasets (one regression and one classification) constructed by combining the tdc_admet_all datasets.

  tdc_admet_all.csv: A CSV file containing all 41 ADMET datasets from tdc_admet_all. This can be used to easily look up all ADMET properties for a given molecule in the TDC.

  tdc_admet_group: Contains the data (.csv files) and RDKit features (.npz files) for the 22 TDC ADMET Benchmark Group datasets with five splits per dataset.

  tdc_admet_group_raw: Contains the raw data (.csv files) used to construct the five splits per dataset in tdc_admet_group.

 

The models.zip file has the following structure. Note that the ADMET-AI website and Python package use the multi-task Chemprop-RDKit models below.

models

  tdc_admet_all: Contains Chemprop and Chemprop-RDKit models trained on all 41 single-task TDC ADMET datasets.

  tdc_admet_all_multitask: Contains Chemprop and Chemprop-RDKit models trained on the two multi-task TDC ADMET datasets (one regression and one classification).

  tdc_admet_group: Contains Chemprop and Chemprop-RDKit models trained on the 22 TDC ADMET Benchmark Group datasets.

Files

data.zip

Files (1.2 GB)

Name Size Download all
md5:deb0fc17f96d43dc41d5bde3f7224e2f
164.7 MB Preview Download
md5:8d00e7fd9b10e340201045f360c8c040
986.9 MB Preview Download