Published July 14, 2022 | Version v1
Dataset Open

Exploiting Pretrained Biochemical Language Models for Targeted Drug Design

  • 1. Boğaziçi University
  • 2. F. Hoffmann-La Roche AG
  • 3. İstanbul University

Description

This repository contains materials for the paper Exploiting Pretrained Biochemical Language Models for Targeted Drug Design, which has been accepted for publication in Bioinformatics, published by Oxford University Press.

data.zip contains vocabulary files for the pretrained models, additional information about the proteins (PFAM family, protein similarity), and interactions filtered from BindingDB. The interactions are split into train, validation and test sets and used to train the target-specific molecule generation models.

models.zip includes files for the models trained in this study.     

predictions.zip contains the compounds generated with the targeted models and the results of their evaluation against benchmarking metrics.

docking.zip contains targets/, with PDB files of the test proteins selected for docking evaluation; ligands/, with SDF files for the molecules generated with the targeted models under two decoding strategies (beam search and sampling); and complex/, with the docking outputs.
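As a quick check that a downloaded archive has the layout described above, the sketch below counts the files under each top-level folder of a zip file. The function name top_level_counts is hypothetical, and whether targets/, ligands/ and complex/ sit at the archive root or inside an enclosing folder is an assumption that the description does not settle.

```python
import zipfile
from collections import Counter


def top_level_counts(archive_path):
    """Count the files under each top-level entry of a zip archive.

    Hypothetical helper, not part of the published repository.
    """
    counts = Counter()
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            # The first path component is the top-level folder (or file) name.
            top = info.filename.split("/", 1)[0]
            counts[top] += 1
    return dict(counts)
```

Calling top_level_counts("docking.zip") on a local copy would return a dictionary keyed by folder name. If the three folders are nested inside a single enclosing directory, only that directory appears as a key.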


Files (1.8 GB)

Checksum Size
md5:161a7ebde6ef43c5d621052c126be1c3 252.9 MB
md5:3e13b0187dc1016b0fd8c5a4305208dc 42.4 MB
md5:b20634a0bf6d8f18c51e1d26a9439606 1.5 GB
md5:d7478ccf0f6003449ff7f3e9d63d0d01 871.8 kB
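
The MD5 checksums above can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard hashlib module. The function names md5_of_file and matches_checksum are hypothetical, and the listing does not show which checksum belongs to which archive, so the expected value has to be taken from the entry shown next to the file that was downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:161a...'."""
    # Accept the value with or without the leading 'md5:' label.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum("local_archive.zip", "md5:...") returns True only when the digest of the local file equals the listed value. Here local_archive.zip stands for whichever archive was downloaded.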

Additional details

Related works

Is cited by
Software: https://github.com/boun-tabi/biochemical-lms-for-drug-design (URL)
Is supplement to
Journal article: 10.1093/bioinformatics/btac482 (DOI)