Generative AI for designing and validating easily synthesizable and structurally novel antibiotics: Data and Models
Description
This repository contains data and models used in the following paper.
Swanson, K., Liu, G., Catacutan, D., Zou, J. & Stokes, J. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics. Nature Machine Intelligence, 2024.
The data and models are meant to be used with the SyntheMol code. More details about how to use the data and models with the code are available in the SyntheMol code repository.
The Data.zip file has the following structure. Note that the numbers for the Data subdirectories correspond to the supplementary data numbers in the paper (e.g., 1_training_data corresponds to Supplementary Data 1).
Data
1_training_data: The Acinetobacter baumannii inhibition data used to train antibiotic property prediction models.
2_chembl: Known antibiotic and antibacterial molecules from ChEMBL, which are used to compute the novelty of generated antibiotic candidates.
4_real_space: Data files and statistics for the Enamine REAL Space. The molecular building blocks file is from version 2021 q3-4, while all other REAL Space details are computed from the fully enumerated REAL Space version 2022 q1-2 (downloaded on August 30, 2022).
5_generations_clogp: Compounds generated by SyntheMol using Chemprop models trained to predict cLogP.
6_generations_chemprop: Compounds generated by SyntheMol using Chemprop models trained to predict A. baumannii inhibition.
7_generations_chemprop_rdkit: Compounds generated by SyntheMol using Chemprop-RDKit models trained to predict A. baumannii inhibition.
8_generations_random_forest: Compounds generated by SyntheMol using random forest models trained to predict A. baumannii inhibition.
9_synthesized: Information on the 58 SyntheMol-generated compounds that were successfully synthesized by Enamine.
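The naming convention above (a leading number that matches the Supplementary Data number) can be handled programmatically. The following is a minimal sketch, not part of the SyntheMol code: the helper name `supplementary_data_number` and the `SUBDIRS` list are illustrative assumptions, with the directory names copied from the listing above.

```python
import re
from typing import Dict, Optional

# Subdirectory names exactly as listed in the description above.
SUBDIRS = [
    "1_training_data",
    "2_chembl",
    "4_real_space",
    "5_generations_clogp",
    "6_generations_chemprop",
    "7_generations_chemprop_rdkit",
    "8_generations_random_forest",
    "9_synthesized",
]


def supplementary_data_number(subdir: str) -> Optional[int]:
    """Return the leading integer of a subdirectory name, or None if there is none.

    Hypothetical helper for illustration; it only parses the name.
    """
    match = re.match(r"(\d+)_", subdir)
    return int(match.group(1)) if match else None


# Map each Supplementary Data number to its subdirectory name.
index: Dict[Optional[int], str] = {
    supplementary_data_number(name): name for name in SUBDIRS
}
```

With this mapping, `index[1]` gives the training data directory name, and a number that is absent from the listing above is simply missing from `index`.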
The Models.zip file contains one folder for each model used in the paper. Note that each model is technically an ensemble of ten individual models, so each directory contains ten model files.
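As a quick sanity check after extracting Models.zip, one can count the files in each model folder and compare against the stated ensemble size of ten. This is a rough sketch under two assumptions that the description does not confirm: that the archive extracts to a single root folder whose immediate subfolders are the models, and that every file inside a model folder is a model file (any extra metadata files would inflate the count). The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Each model is described above as an ensemble of ten individual models.
EXPECTED_ENSEMBLE_SIZE = 10


def count_model_files(models_root: Path) -> Dict[str, int]:
    """Map each model folder name to the number of files found beneath it.

    Files are counted recursively, since the internal layout of each
    model folder is an assumption here.
    """
    counts: Dict[str, int] = {}
    for model_dir in sorted(p for p in models_root.iterdir() if p.is_dir()):
        counts[model_dir.name] = sum(
            1 for f in model_dir.rglob("*") if f.is_file()
        )
    return counts


def incomplete_ensembles(models_root: Path) -> Dict[str, int]:
    """Return only the folders whose file count differs from the expected size."""
    return {
        name: n
        for name, n in count_model_files(models_root).items()
        if n != EXPECTED_ENSEMBLE_SIZE
    }
```

Calling `incomplete_ensembles(Path("Models"))` on the extracted folder (the path is a placeholder) returns an empty dictionary when every model folder holds exactly ten files.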
Files
Data.zip
Additional details
Related works
- Is published in
- Journal article: 10.1038/s42256-024-00809-7 (DOI)