Published December 19, 2023 | Version v2
Software Open

Advancing drug–target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining

  • 1. Tunis El Manar University
  • 2. ISI, Kef
  • 3. Tallinn University of Technology
  • 4. Maersk (Denmark)
  • 5. Bordeaux Population Health
  • 6. INSERM Bordeaux Population Health Research Center, Univ. Bordeaux

Description

We propose DTIOG, an approach for predicting drug–target interactions (DTIs) that combines contextual and local strategies. For contextual information, DTIOG applies knowledge graph embedding (KGE) techniques such as the DistMult model to generate drug and target embedding vectors. These vectors are derived from the knowledge graph and capture associations and similarities between drugs and targets. In addition, DTIOG uses ProtBERT, a language model pretrained on large corpora of protein sequences, to capture contextual relationships among amino acids. The embeddings obtained from either KGE or ProtBERT are then integrated into the prediction step. Pretrained protein language models of this kind have proven broadly useful for protein feature extraction; for example, ProtBERT's context-aware representations of protein sequences have supported the accurate identification of lysine glutarylation sites.
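To make the KGE step concrete, the sketch below implements the DistMult scoring function, which rates a (drug, relation, target) triple as the sum of element-wise products of the three embedding vectors. This is a minimal illustration of the scoring rule only; the embedding values and dimensionality here are invented for the example, not taken from DTIOG.

```python
def distmult_score(h, r, t):
    """DistMult triple score: sum_i h_i * r_i * t_i.

    A bilinear model with a diagonal relation matrix; higher scores
    indicate a more plausible (head, relation, tail) triple.
    """
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy 3-dimensional embeddings (illustrative values only).
drug_emb = [0.2, -0.5, 0.1]
relation_emb = [1.0, 0.3, -0.7]
target_emb = [0.4, 0.9, 0.2]

score = distmult_score(drug_emb, relation_emb, target_emb)
```

Note that because the relation matrix is diagonal, DistMult scores are symmetric in head and tail, which is one reason it suits similarity-like relations in a drug–target knowledge graph.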

 

For the local strategy, DTIOG derives drug features with the RDKit library by converting SMILES representations into molecular fingerprints. The Avalon fingerprint generator identifies specific fragments within the molecular structure, encoding each drug as a numerical vector. Protein sequences are processed into feature vectors based on the biochemical properties of amino acids: a sliding window of size 3 categorizes residues into four groups (non-polar, polar neutral, acidic, and basic), transforming each sequence into a numerical representation.
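The protein side of this local strategy can be sketched in pure Python. The grouping of the 20 standard amino acids into the four classes below, and the choice of counting normalized group triplets, are assumptions for illustration; the paper's exact encoding may differ.

```python
from collections import Counter
from itertools import product

# Assumed assignment of amino acids to four biochemical groups:
# N = non-polar, P = polar neutral, A = acidic, B = basic.
GROUPS = {
    **{aa: "N" for aa in "AVLIPFWMG"},
    **{aa: "P" for aa in "STCYNQ"},
    **{aa: "A" for aa in "DE"},
    **{aa: "B" for aa in "KRH"},
}

# All 4^3 = 64 possible group triplets define the feature dimensions.
TRIPLETS = ["".join(p) for p in product("NPAB", repeat=3)]


def encode_sequence(seq):
    """Slide a window of size 3 over the sequence, count group triplets,
    and return a normalized 64-dimensional feature vector."""
    labels = [GROUPS[aa] for aa in seq.upper() if aa in GROUPS]
    counts = Counter(
        "".join(labels[i : i + 3]) for i in range(len(labels) - 2)
    )
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[t] / total for t in TRIPLETS]


features = encode_sequence("MKKRDE")  # toy hexapeptide
```

A window of size 3 over four groups keeps the feature space small (64 dimensions) while still capturing local biochemical context around each residue.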

Files

Files (347.9 MB)

347.9 MB (md5:ba02b3c3a6f317c722f6f2522fda0151)

Additional details

Related works

References
Journal article: DOI 10.1186/s12859-023-05593-6