Published December 19, 2023 | Version v2
Software Open

Advancing drug–target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining

  • 1. Tunis El Manar University
  • 2. ISI, Kef
  • 3. Tallinn University of Technology
  • 4. Maersk (Denmark)
  • 5. Bordeaux Population Health
  • 6. INSERM Bordeaux Population Health Research Center, Univ. Bordeaux

Description

We propose DTIOG, an approach for predicting drug–target interactions (DTIs) that combines contextual and local strategies. For contextual information, DTIOG applies knowledge graph embedding (KGE) techniques such as the DistMult model to generate drug and target embedding vectors. These vectors are derived from the knowledge graph and capture associations and similarities between drugs and targets. In addition, DTIOG uses ProtBERT, a language model pretrained on large corpora of protein sequences, to capture contextual relationships among amino acids. The embeddings obtained from either KGE or ProtBERT are then integrated into the prediction step. Pretrained protein language models of this kind have proven broadly useful for protein feature extraction; for example, ProtBERT's context-aware representations of protein sequences have supported the accurate identification of lysine glutarylation sites.
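To make the KGE step concrete, the sketch below implements the DistMult scoring function, which rates a (drug, relation, target) triple as the sum of element-wise products of the three embedding vectors. This is a minimal illustration of the scoring rule only; the embedding values and dimensionality here are invented for the example, not taken from DTIOG.

```python
def distmult_score(h, r, t):
    """DistMult triple score: sum_i h_i * r_i * t_i.

    A bilinear model with a diagonal relation matrix; higher scores
    indicate a more plausible (head, relation, tail) triple.
    """
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy 3-dimensional embeddings (illustrative values only).
drug_emb = [0.2, -0.5, 0.1]
relation_emb = [1.0, 0.3, -0.7]
target_emb = [0.4, 0.9, 0.2]

score = distmult_score(drug_emb, relation_emb, target_emb)
```

Note that because the relation matrix is diagonal, DistMult scores are symmetric in head and tail, which is one reason it suits similarity-like relations in a drug–target knowledge graph.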

 

For the local strategy, DTIOG derives drug features with the RDKit library by converting SMILES representations into molecular fingerprints. The Avalon fingerprint generator identifies specific fragments within the molecular structure, encoding each drug as a numerical vector. Protein sequences are processed into feature vectors based on the biochemical properties of amino acids: a sliding window of size 3 categorizes residues into four groups (non-polar, polar neutral, acidic, and basic), transforming each sequence into a numerical representation.
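The protein side of this local strategy can be sketched in pure Python. The grouping of the 20 standard amino acids into the four classes below, and the choice of counting normalized group triplets, are assumptions for illustration; the paper's exact encoding may differ.

```python
from collections import Counter
from itertools import product

# Assumed assignment of amino acids to four biochemical groups:
# N = non-polar, P = polar neutral, A = acidic, B = basic.
GROUPS = {
    **{aa: "N" for aa in "AVLIPFWMG"},
    **{aa: "P" for aa in "STCYNQ"},
    **{aa: "A" for aa in "DE"},
    **{aa: "B" for aa in "KRH"},
}

# All 4^3 = 64 possible group triplets define the feature dimensions.
TRIPLETS = ["".join(p) for p in product("NPAB", repeat=3)]


def encode_sequence(seq):
    """Slide a window of size 3 over the sequence, count group triplets,
    and return a normalized 64-dimensional feature vector."""
    labels = [GROUPS[aa] for aa in seq.upper() if aa in GROUPS]
    counts = Counter(
        "".join(labels[i : i + 3]) for i in range(len(labels) - 2)
    )
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[t] / total for t in TRIPLETS]


features = encode_sequence("MKKRDE")  # toy hexapeptide
```

A window of size 3 over four groups keeps the feature space small (64 dimensions) while still capturing local biochemical context around each residue.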

Files

Files (347.9 MB)

347.9 MB (md5:ba02b3c3a6f317c722f6f2522fda0151)

Additional details

Related works

References
Journal article: DOI 10.1186/s12859-023-05593-6