Published June 5, 2023 | Version v1
Dataset
Open
BioKG with attributes and decoupled benchmarks
Description
BioKG is a biomedical knowledge graph containing relationships between proteins, molecules, diseases, and other biomedical entities. It was originally proposed by Walsh et al. (2020) in "BioKG: A Knowledge Graph for Relational Learning On Biological Data".
We enrich this dataset with the aim of incorporating multimodal data associated with biomedical entities:
- Proteins: Protein embeddings computed with ProtTrans from amino acid sequences
- Molecules: Molecule embeddings computed with MolTrans from SMILES representations
- Diseases: Textual descriptions retrieved from MeSH
Furthermore, we decouple the benchmarks provided by Walsh et al. from the edges in the knowledge graph, which ensures that there is no direct data leakage between the benchmarks and the triples used to train link prediction models.
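The idea behind decoupling can be illustrated with a minimal sketch. It assumes that a triple is considered leaked when its head and tail entities form a pair that also appears in a benchmark, in either direction; this is an assumption made for illustration and not necessarily the exact procedure used to build this dataset. The entity identifiers, relation names, and the `decouple` helper are hypothetical.

```python
def decouple(triples, benchmark_pairs):
    """Drop every triple whose (head, tail) pair occurs in a benchmark.

    Pairs are compared as unordered sets, so a benchmark pair (a, b)
    also blocks a triple going from b to a. This matching rule is an
    assumption of this sketch.
    """
    blocked = {frozenset(pair) for pair in benchmark_pairs}
    return [(h, r, t) for h, r, t in triples if frozenset((h, t)) not in blocked]


# Hypothetical toy graph and benchmark, for illustration only.
triples = [
    ("drug:A", "targets", "protein:X"),
    ("protein:X", "interacts_with", "protein:Y"),
    ("drug:B", "treats", "disease:Z"),
]
benchmark_pairs = [("drug:A", "protein:X"), ("disease:Z", "drug:B")]

train_triples = decouple(triples, benchmark_pairs)
print(train_triples)
```

In this toy case only the protein-protein triple remains for training, because the other two triples connect entity pairs that the benchmark evaluates.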
Files
Total size: 2.4 GB
Checksum | Size |
---|---|
md5:0d545cf722fa16bed4ed459fdeb489cb | 2.4 GB |
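Since the archive is large, it may be worth checking the download against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the local file path must be supplied by the user because the archive name is not shown here.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "0d545cf722fa16bed4ed459fdeb489cb"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the path of the downloaded archive returns `True` when the file is intact. Reading in chunks keeps memory usage low for a file of this size.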