There is a newer version of the record available.

Published November 27, 2023 | Version v1
Software | Open access

Drug-Target Interaction Prediction: A Graph-Based Approach Integrating Knowledge Graph Embedding and Pretrained ProtBert Model

  • 1. Tunis El Manar University
  • 2. ISI, Kef
  • 3. Tallinn University of Technology
  • 4. Maersk (Denmark)
  • 5. Bordeaux Population Health
  • 6. INSERM Bordeaux Population Health Research Center, Univ. Bordeaux

Description

We proposed an approach for predicting drug-target interactions (DTIs), DTIOG, that combines contextual and local strategies. For contextual information, DTIOG uses knowledge graph embedding (KGE) techniques such as the DistMult model (Yang et al., 2014) to generate drug and target embedding vectors. These vectors are learned from the knowledge graph and capture associations and similarities between drugs and targets. In addition, DTIOG uses ProtBERT, a language model pretrained on protein sequences, to model how amino acids are arranged within proteins. The embeddings obtained through KGE or ProtBERT are then integrated into the prediction process. ProtBERT belongs to a group of recent deep learning models pretrained on large corpora of protein sequences that have been used to extract protein features. For example, ProtBERT provides meaningful, context-aware representations of protein sequences that have been applied to the accurate identification of lysine glutarylation sites.
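
As an illustration of the DistMult scoring idea mentioned above, the sketch below scores a (head, relation, tail) triple as the trilinear product of three embedding vectors, score(h, r, t) = Σᵢ hᵢ·rᵢ·tᵢ. It is a minimal, hypothetical example: the function name `distmult_score` and the toy vectors are invented for illustration, embedding training is not shown, and it is not the DTIOG code.

```python
from typing import Sequence


def distmult_score(head: Sequence[float], relation: Sequence[float],
                   tail: Sequence[float]) -> float:
    """Return the DistMult score of a (head, relation, tail) triple.

    DistMult models each relation as a diagonal matrix, so the bilinear form
    h^T diag(r) t reduces to the sum over i of h_i * r_i * t_i.
    """
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("all three embeddings must have the same dimension")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Hypothetical 3-dimensional embeddings (illustrative values, not learned ones).
drug = [0.5, -1.0, 2.0]
interacts_with = [1.0, 0.5, 0.25]
target = [2.0, 1.0, 1.0]

print(distmult_score(drug, interacts_with, target))  # 1.0
```

Because the product is symmetric in head and tail, DistMult assigns the same score to (drug, r, target) and (target, r, drug). This suits an undirected relation such as an interaction, but it cannot distinguish the direction of asymmetric relations.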


For the local strategy, DTIOG extracts drug features by using the RDKit library to convert SMILES representations into molecular fingerprints. The Avalon fingerprint generator identifies specific fragments within each molecular structure and encodes them as a numerical representation of the drug. For proteins, DTIOG converts sequences into feature vectors based on the biochemical properties of their amino acids: each amino acid is assigned to one of four groups (non-polar, polar neutral, acidic, or basic), and a sliding window of size 3 is moved along the grouped sequence to transform it into a numerical representation.
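
The protein encoding described above can be sketched as follows, under two stated assumptions: the assignment of residues to the four groups is one common textbook convention, and each window position is counted as an occurrence of its group triplet, giving a normalized vector over the 4³ = 64 possible triplets. The source does not specify how windows become numbers, so this is an illustrative reading rather than DTIOG's implementation; `encode_protein`, `AA_GROUP`, and `TRIPLETS` are hypothetical names.

```python
from itertools import product
from typing import List

# ASSUMPTION: a common textbook assignment of the 20 standard amino acids to the
# four groups named in the description; DTIOG's exact mapping may differ.
# Group labels: "0" non-polar, "1" polar neutral, "2" acidic, "3" basic.
AA_GROUP = {
    **dict.fromkeys("GAVLIMFWP", "0"),
    **dict.fromkeys("STCYNQ", "1"),
    **dict.fromkeys("DE", "2"),
    **dict.fromkeys("KRH", "3"),
}

WINDOW = 3
# Every possible group triplet (4**3 = 64) in a fixed order that defines the vector layout.
TRIPLETS = ["".join(t) for t in product("0123", repeat=WINDOW)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def encode_protein(sequence: str) -> List[float]:
    """Map a protein sequence to normalized group-triplet frequencies (hypothetical helper)."""
    # ASSUMPTION: non-standard residues (e.g. "X") are simply skipped.
    groups = "".join(AA_GROUP[aa] for aa in sequence.upper() if aa in AA_GROUP)
    vector = [0.0] * len(TRIPLETS)
    n_windows = len(groups) - WINDOW + 1
    # Slide a window of size 3 over the grouped sequence and count each triplet.
    for i in range(max(n_windows, 0)):
        vector[INDEX[groups[i:i + WINDOW]]] += 1.0
    if n_windows > 0:
        vector = [count / n_windows for count in vector]
    return vector


features = encode_protein("MKDE")  # groups "0322" -> windows "032", "322"
print(len(features), features[INDEX["032"]], features[INDEX["322"]])  # 64 0.5 0.5
```

Residues such as glycine, cysteine, and histidine are classified differently across references, so the group mapping is worth checking against the DTIOG paper before reusing this sketch.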

Files

Total size: 347.8 MB
Checksum: md5:5d7a730c112f60d529def082a8932ef2

Additional details

Related works

References
Journal: 10.1186/s12859-023-05593-6 (DOI)