Published January 29, 2023 | Version 1.0.0
Dataset Open

Relevant Datasets and Software Used for Paper "KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description"

  • 1. Pennsylvania State University
  • 2. Northwestern University

Description

This repository contains the datasets and software used in the paper "KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description". They are needed to run the KGML-xDTD code hosted on GitHub and they support the results of the paper.

About the datasets

1. bkg_rtxkg2c_v2.7.3.tar.gz

This tar.gz file contains three sub-folders: tsv_files, scripts, and relevant_dbs. The "tsv_files" sub-folder holds the input files that Neo4j uses. The "scripts" sub-folder contains a shell script and an accompanying Python script for constructing the biomedical knowledge graph. The "relevant_dbs" sub-folder stores two auxiliary databases that KGML-xDTD requires.
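Because this archive is large, it can be useful to confirm that a download contains the three sub-folders before extracting it. The following is a minimal sketch using only the Python standard library; the function name `subfolders_present` is hypothetical, and the sketch assumes only that the folder names listed above appear somewhere in the member paths (the exact top-level layout of the archive is not stated here).

```python
import tarfile
from pathlib import PurePosixPath

# Sub-folder names taken from the description above.
EXPECTED = frozenset({"tsv_files", "scripts", "relevant_dbs"})


def subfolders_present(archive_path, expected=EXPECTED):
    """Return the subset of `expected` folder names found in the archive's member paths."""
    expected = set(expected)
    found = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Compare every path component, so a wrapping top-level folder does not matter.
            found.update(expected.intersection(PurePosixPath(member.name).parts))
            if found == expected:
                break  # stop early instead of scanning the whole archive
    return found


# Example call (the path is an assumption about where the file was saved):
#   missing = EXPECTED - subfolders_present("bkg_rtxkg2c_v2.7.3.tar.gz")
#   print("missing sub-folders:", sorted(missing))
```

An empty "missing" set suggests that the archive is complete enough to extract; it does not validate the contents of the individual files.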

2. indication_paths.yaml

This file contains the DrugMechDB mechanism-of-action (MOA) paths that we used to evaluate the MOA paths predicted by KGML-xDTD. It was downloaded from the official GitHub repository of DrugMechDB.

3. training_data.tar.gz

This tar.gz file contains the processed training data from the four data sources mentioned in the paper (MyChem, SemMedDB, NDF-RT, and RepoDB). The processed drug-disease pairs have been matched to the identifiers of the biological entities used in our biomedical knowledge graph and split, for each source, into a true positive (tp) set and a true negative (tn) set. We also provide the names corresponding to these drug and disease identifiers in the sub-folder "translated_to_name".

About the software

neo4j-community-3.5.26.tar.gz

This tar.gz file is Neo4j Community Edition version 3.5.26, downloaded from the Neo4j Download Center. Newer versions are available, but they introduce major changes to the Neo4j settings that are not compatible with our scripts on GitHub, so we provide the version that we used in our research. If you would like to use a newer version, you will need to modify our scripts in order to import the biomedical knowledge graph into your local Neo4j database under the new settings.

Files

Files (6.1 GB)

Checksum                                 Size
md5:1d172851bf96c6ac1f7aec763da2635e     6.0 GB
md5:797f74d8e6a00455ea0272af3ad3a38b     5.7 MB
md5:2d47dcbbac509f4e82159f8dbb6a4640     128.4 MB
md5:ab86c336e48792603bf8ce28eb854e4f     10.8 MB
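
The MD5 checksums in the listing can be used to check that a download completed without corruption. The sketch below shows one way to do this with the Python standard library; the function names `md5_of_file` and `matches_listing` are hypothetical. The listing does not show which checksum belongs to which file name, so the caller supplies the pairing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat even for a multi-gigabyte archive.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed_checksum.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected


# Example call (file path and pairing are assumptions made by the caller):
#   ok = matches_listing("path/to/downloaded_file", "md5:<checksum from the listing>")
```

A result of False usually means that the download was truncated or that the wrong checksum was paired with the file.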