/02_processed: Contains all data generated by the code available at (https://github.com/CDDLeiden/PK-in-generative-drug-design)
    /A2ARDataset: Contains processed Papyrus data with A2AR bioactivity data (01_A2ARDataset/01_data_creation.py)
        /A2AR_dataset_substr: contains images of compounds matching a substructure removed from the dataset
        A2AR_dataset.tsv: contains the processed dataset
        ...
    /PKDataset: Contains processed Lombardo et al., 2018 PK data (02_PKDataset/01_data_creation.py)
        pk_dataset.tsv: contains the processed dataset
        removed_molecules.tsv: contains the molecules removed from the dataset (e.g. invalid SMILES)
        statstics.tsv: contains information about number of molecules per target, and distribution of PK parameters
        ...
    /QSPR: Contains prepared data and QSPR models
        /data: Contains prepared dataset for QSPR models (03_QSPR/01_dataset_preparation.py)
            /{target}_{descriptor set}_{filters}: Individual folders for each target, descriptor set, and filter combination for grid search (for chemprop {target}_chemprop)
                /Descriptors_{folder_name}_{descriptor set}
                    Descriptors_{folder_name}_{descriptor set}.pkl: contains the descriptors
                {folder_name}_df.pkl: contains the processed dataset
                {folder_name}_meta.json: contains the metadata
                prep_settings.json: contains the settings used to prepare the data
            ...
        /models: Contains QSPR grid search results and trained models (03_QSPR/02_hyperparam_optimization.py)
            /{algorithm}_{target}_{descriptor set}_{filters}: Individual folders for each algorithm, target, descriptor set, and filter combination (for chemprop no filters)
                {folder_name}_meta.json: contains the metadata
                {folder_name}_results.tsv: contains the hyperparameter grid search results
                {folder_name}.json: scikit-learn model ONLY for best model (03_QSPR/03_model_training.py)
                {folder_name}.cv.tsv: contains the cross-validation predictions
                {folder_name}.ind.tsv: contains the test set predictions
                {folder_name}_feature_importance.csv: contains the permutation feature importance results
        /bootstrapping_{target}: Contains bootstrapping results for QSPR models (03_QSPR/03_model_training.py)
            /data
                /{target}_{descriptor set}_{filters}: Dataset with optimal hyperparameters for bootstrapping
                    /Descriptors_{folder_name}_{descriptor set}
                        Descriptors_{folder_name}_{descriptor set}.pkl: contains the descriptors
                    {folder_name}_df.pkl: contains the processed dataset
                    {folder_name}_meta.json: contains the metadata
            /models
                /{target}_{run id}
                    {folder_name}_meta.json: contains the metadata
                    {folder_name}_replica.json: replica benchmark settings metadata
                    {folder_name}_ind.tsv: contains the bootstrapping test set predictions
                    {folder_name}.json: scikit-learn model
            applicability_domain_bootstrapping.tsv: bootstrapping results split by inlier/outlier
            results.tsv: bootstrapping results
            settings.json: bootstrapping settings
        ...
    /DNDD
        /finetuned: Contains the finetuned models for the DNDD dataset (04_DNDD/01_finetuning.py)
            /{scenario_name}
                finetuned_fit.log: contains the training loss for each batch
                finetuned_fit.tsv: contains train and validation loss per epoch
                finetuned_smiles.tsv: contains sample of generated SMILES per epoch
                finetuned.pkg: contains the finetuned model
                ligand_corpus.tsv: contains the encoded ligand corpus used for finetuning
                ligand_corpus.tsv.vocab: contains the vocabulary of the ligand corpus
                ligand_test.tsv: contains the encoded ligand test set
                ligand_train.tsv: contains the encoded ligand training set
                train_df.tsv: contains the processed training dataset
            ...
        /grid_search: Contains the grid search results for the DNDD dataset (04_DNDD/02_grid_search.py)
            /{hyperparameter set}
                {folder_name}_agent_fit.log: contains the training loss for each batch
                {folder_name}_agent_fit.tsv: contains training loss and average scores per epoch
                {folder_name}_agent_smiles.tsv: contains sample of generated SMILES per epoch
                {folder_name}_agent.pkg: contains the trained model
                {folder_name}_sample.tsv: contains the generated SMILES with trained model
            ...
        /reinforced: Contains the reinforcement learning results (04_DNDD/03_reinforcement_learning.py)
            /{scenario_name}_{replicate}
                {scenario_name}_agent_fit.log: contains the training loss for each batch
                {scenario_name}_agent_fit.tsv: contains training loss and average scores per epoch
                {scenario_name}_agent_smiles.tsv: contains sample of generated SMILES per epoch
                {scenario_name}_agent.pkg: contains the trained model
            ...
        /generated: Contains the generated SMILES for the DNDD dataset (04_DNDD/04_generate.py)
            /{scenario_name}_{replicate}
                generated_10000_centroids.tsv: contains cluster centroids
                generated_10000_stats.tsv: contains generated molecule data summary
                genenerated_10000.tsv: contains generated SMILES and scores, UMAP coordinates are added with (04_DNDD/05_make_umaps.py)
            ...
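The naming templates above can be turned into concrete paths with a small helper. The following is a minimal sketch, not part of the repository: the function name `qspr_model_files`, the example values (`RF`, `CL`, `MorganFP`, `lowvar`), and the root folder name are hypothetical, and it assumes that `/QSPR/models` sits under `/02_processed` and that the filter suffix is simply omitted when no filters apply.

```python
from pathlib import Path


def qspr_model_files(root, algorithm, target, descriptor_set, filters=None):
    """Build the expected file paths for one QSPR model folder.

    Follows the template {algorithm}_{target}_{descriptor set}_{filters}.
    Assumption: the filter suffix is dropped when `filters` is None.
    """
    parts = [algorithm, target, descriptor_set]
    if filters:
        parts.append(filters)
    folder_name = "_".join(parts)
    folder = Path(root) / "02_processed" / "QSPR" / "models" / folder_name
    return {
        "meta": folder / f"{folder_name}_meta.json",
        "grid_search": folder / f"{folder_name}_results.tsv",
        "model": folder / f"{folder_name}.json",
        "cv_predictions": folder / f"{folder_name}.cv.tsv",
        "test_predictions": folder / f"{folder_name}.ind.tsv",
    }


# Hypothetical example values; substitute the names used in the actual data.
files = qspr_model_files("data_root", "RF", "CL", "MorganFP", "lowvar")
for label, path in files.items():
    print(f"{label}: {path.as_posix()}")
```

The helper only constructs paths and does not check that they exist. Calling `path.exists()` on each entry is a quick way to confirm that a downloaded copy matches the layout described here.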
/03_figures: Contains all generated figures
    /A2ARDataset (01_A2ARDataset/02_dataset_figures.ipynb)
        mean_pKi_vs_pKi.png: variance of experimental A2AR pKi measurements in Papyrus data (Supplementary Figure S1)
    /PKDataset (02_PKDataset/02_dataset_figures.ipynb)
        PKDataset_Distribution_log_sqrt.png: distribution of transformed PK parameters in Lombardo et al., 2018 data (Supplementary Figure S2)
        PKDataset_PhysChemDescriptors.png: distribution of physicochemical descriptors in Lombardo et al., 2018 data (Supplementary Figure S8)
    /QSPR (03_QSPR/04_model_figures.py)
        /ap_domain: applicability domain figures
            applicability_domain_boxplot.png: plots of the bootstrapping test set performance for test set inliers, outliers and all samples for all individual targets (Figure 3)
            bootstrapping_{target}_ad_r2_rmse_boxplot.png: individual plots of bootstrapping test set performance for test set inliers, outliers and all samples
        /best_model_performance: model performance figures
            all_scatter_plots.png: scatter plots of predicted vs. experimental values for all targets (Figure 2)
            {model_name}_scatter.png: scatter plot of predicted vs. experimental values for individual target
        /feature_importance: feature importance figures
            {model_name}_combined_bits.png: bit features for the 10 most important features for each target (Supplementary Figure S3)
            {model_name}_feature_importance.png: feature importance of the 10 most important features for each target (Supplementary Figure S3)
            {model_name}_morganfp_{bit}.png: individual bit feature images
        hyperparamopt_table.csv: table of hyperparameter optimization results (Supplementary Table S2)
    /DNDD (04_DNDD/07_dndd_figures.ipynb)
        /density
            /{replicate}
                {scenario_name}.png: score distribution of generated SMILES for each scenario and replicate
            combined_density_{replicate}_multi.png: combined score distribution density plot for all scenarios with multiple objectives (Figure 4)
            combined_density_{replicate}_single.png: combined score distribution density plot for all scenarios with single objectives (Supplementary Figure S6)
        /grid_search: figures of hyperparameter grid search (04_DNDD/06_grid_search_figures.ipynb)
            sample_peformance.tsv: table of model performance per hyperparameter set (Supplementary Table S3)
        /molecules: figures of generated molecules
            {scenario_name}_{replicate}.png/.svg: 2D structure of centroids from 3 largest clusters
            molecules_combined_{replicate}.png: 2D structure of centroids from 3 largest clusters for all scenarios combined (Figure 6)
        /reinforcement_fit: figures of reinforcement learning training
            {scenario_name}_{replicate}.png: plot of scores of sample molecules over training epochs
            reinforcement_fit.png: combined plot of scores of sample molecules over training epochs for all scenarios (Supplementary Figure S7)
        /umap: figures of UMAP projections
            umap_combined.png: UMAP projection of generated molecules for all scenarios which combined A2AR and PK targets (Figure 5)
            umap_pk.png: UMAP projection of generated molecules for PK targets only (Supplementary Figure S7)
        finetuning_fit.png: plot of training loss for finetuning (Supplementary Figure S4)
        generated_stats.csv: statistics of generated molecules (Table 2)
        physchem_properties_{replicate}.png: distribution of physicochemical properties of generated molecules (Figure 7)
    /QSP_modelling
        /example_simulations: Contains example simulations for QSP modelling (05_QSP_modelling/01_example_simulations.py)
            example_compounds.png: plots of tumor growth for example compounds (Figure 8B)
            ...
        /figures_rewrite: figure naming identical to the original figures from Voronova et al., 2021
            ...
        /simulations_generated_compounds
            simulation_of_Tum_generatedCompounds.png: simulation of tumor growth for generated compounds (Figure 8C)

All conda_env_{date}.yml files contain the conda environments used to generate the data in the folder.
All {name}_{date}.log files contain the logs of the data generation process.
session_info.csv: contains the R session information used to generate the plots

Note: if plots are given per replicate, the first replicate (0) is used for the figures in the manuscript.

References
- Lombardo, F., Berellini, G., & Obach, R. S. (2018). Trend Analysis of a Database of Intravenous Pharmacokinetic Parameters in Humans for 1352 Drug Compounds. Drug Metabolism and Disposition, 46(11), 1466–1477. https://doi.org/10.1124/dmd.118.082966
- Voronova, V., Peskov, K., Kosinsky, Y., Helmlinger, G., Chu, L., Borodovsky, A., Woessner, R., Sachsenmeier, K., Shao, W., Kumar, R., Pouliot, G., Merchant, M., Kimko, H., & Mugundu, G. (2021). Evaluation of Combination Strategies for the A2AR Inhibitor AZD4635 Across Tumor Microenvironment Conditions via a Systems Pharmacology Model. Frontiers in Immunology, 12, 617316. https://doi.org/10.3389/fimmu.2021.617316