Link-prediction on Biomedical Knowledge Graphs
Description
Release of code and experimental data for the papers "Towards Linking Graph Topology to Model Performance for Biomedical Knowledge Graph Completion" (Machine Learning for Life and Material Sciences workshop @ ICML 2024) and "The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models".
train/scripts). We release results for tail predictions on the test split. In particular, each test query (h,r,?) is scored against all entities in the KG and we compute the rank of the score of the correct completion (h,r,t), after masking out scores of other (h,r,t') triples contained in the graph.
In experimental_data.zip, the following files are provided.
- datasets/{dataset}: a folder for each dataset, containing
  - {dataset}_preprocessing.ipynb: a Jupyter notebook for downloading and preprocessing the datasets. In particular, this generates the custom label->ID mapping for entities and relations, and the numerical tensor of (h_ID,r_ID,t_ID) triples for all edges in the graph, which can be used to compute graph topological metrics (e.g., using kg-topology-toolbox) and compare them with the edge prediction accuracy;
  - test_ranks.csv: CSV table with columns ["h", "r", "t"] specifying the head, relation, tail IDs of the test triples, and columns ["DistMult", "TransE", "RotatE", "TripleRE", "ConvE"] with the rank of the ground-truth tail in the ordered list of predictions made by the five KGE models;
  - entity_dict.csv: list of entity labels, ordered by entity ID (as generated in the preprocessing notebook);
  - relation_dict.csv: list of relation labels, ordered by relation ID (as generated in the preprocessing notebook).
- train: code to reproduce training (and validation) of the five KGE models, using the BESS-KGE distribution framework.
  - train/scripts: executable scripts, with specifications of the final hyperparameters for all models and datasets.
- notebooks: Jupyter notebooks for data analysis and generation of all the figures in the paper.
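The filtered ranking described above, and the kind of summary one might compute from the rank columns of test_ranks.csv, can be sketched as follows. This is a minimal illustration on toy scores, not the code used in the papers: the function names filtered_tail_rank and summarize_ranks, the toy data, the "higher score is better" convention and the optimistic tie-breaking rule are all assumptions.

```python
import numpy as np


def filtered_tail_rank(scores, true_tail, known_tails):
    """Rank (1 = best) of the true tail for a query (h, r, ?).

    scores: one score per entity, higher meaning more plausible (assumed).
    true_tail: ID of the correct tail t.
    known_tails: IDs of every t' such that (h, r, t') is in the graph.
    """
    scores = np.asarray(scores, dtype=float).copy()
    target = scores[true_tail]
    # Mask out the other known completions so they cannot outrank t.
    others = np.array([t for t in known_tails if t != true_tail], dtype=int)
    scores[others] = -np.inf
    # Optimistic tie-breaking (assumption): count strictly higher scores only.
    return int(np.sum(scores > target)) + 1


def summarize_ranks(ranks, ks=(1, 10)):
    """Mean reciprocal rank and Hits@k for a collection of ranks."""
    ranks = np.asarray(ranks, dtype=float)
    summary = {"MRR": float(np.mean(1.0 / ranks))}
    for k in ks:
        summary[f"Hits@{k}"] = float(np.mean(ranks <= k))
    return summary


# Toy query over 5 entities: the true tail is entity 4, and entity 1 is
# another known completion of the same (h, r, ?) query.
toy_scores = [0.1, 0.9, 0.8, 0.3, 0.7]
toy_rank = filtered_tail_rank(toy_scores, true_tail=4, known_tails=[1, 4])
unfiltered_rank = filtered_tail_rank(toy_scores, true_tail=4, known_tails=[4])

# Toy rank column, standing in for one model column of test_ranks.csv.
summary = summarize_ranks([1, 2, 4])
print(toy_rank, unfiltered_rank, summary)
```

In the toy query, masking entity 1 moves the true tail from rank 3 to rank 2. The same summarize_ranks helper could be applied to any of the per-model rank columns once the table is loaded.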
The separate top_100_tail_predictions.zip archive contains, for each of the test queries in the corresponding test_ranks.csv table, the IDs of the top-100 tail predictions made by each of the five KGE models, ordered by decreasing likelihood. The predictions are released in a .npz archive of numpy arrays (one array of shape (n_test_triples, 100) for each of the KGE models).
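As a rough sketch of how such an archive might be consumed, the example below builds a small synthetic .npz in memory and checks how often the true tail appears among the first k predictions. The array keys (assumed here to be the model names), the helper hits_at_k_from_top_predictions and the toy data are assumptions, since the description does not specify the key names or file layout inside the archive. It is also not stated whether the released top-100 lists are filtered, so values computed this way need not match those derived from test_ranks.csv.

```python
import io

import numpy as np


def hits_at_k_from_top_predictions(top_preds, true_tails, k=10):
    """Fraction of queries whose true tail is among the first k predictions.

    top_preds: integer array of shape (n_test_triples, 100), ordered by
        decreasing likelihood.
    true_tails: integer array of shape (n_test_triples,).
    """
    top_preds = np.asarray(top_preds)
    true_tails = np.asarray(true_tails)
    hit = (top_preds[:, :k] == true_tails[:, None]).any(axis=1)
    return float(hit.mean())


# Synthetic stand-in for the released archive (keys are hypothetical).
rng = np.random.default_rng(0)
n_test_triples, n_entities = 5, 1000
toy_arrays = {
    model: np.stack(
        [rng.permutation(n_entities)[:100] for _ in range(n_test_triples)]
    )
    for model in ("DistMult", "TransE")
}
buffer = io.BytesIO()
np.savez(buffer, **toy_arrays)
buffer.seek(0)

# np.load accepts a path or a file-like object; .files lists the array keys.
archive = np.load(buffer)
top_distmult = archive["DistMult"]

# For the toy data, pretend the true tail is always the 4th prediction.
toy_true_tails = top_distmult[:, 3]
hits_at_10 = hits_at_k_from_top_predictions(top_distmult, toy_true_tails, k=10)
hits_at_3 = hits_at_k_from_top_predictions(top_distmult, toy_true_tails, k=3)
print(archive.files, top_distmult.shape, hits_at_10, hits_at_3)
```

With the real data, the true tails would come from the "t" column of test_ranks.csv, relying on the row correspondence between the two files stated above.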
Files
(1.5 GB)
Name | Size |
---|---|
md5:7fa9170146a6e3a0c94589a7f7b2ac29 | 65.5 MB |
md5:0e226efa1ab4b0778f26822c35204830 | 1.4 GB |
Additional details
Dates
- Available
- 2025-06-27
Software
- Repository URL
- https://github.com/graphcore-research/kg-topology-toolbox
- Programming language
- Python
- Development Status
- Active