Published March 31, 2026 | Version v1
Dataset Open

BioSensGraph

Description

This dataset contains a heterogeneous biomolecular knowledge graph designed for interaction prediction tasks. The graph integrates multiple types of biological entities, including proteins, peptides, nucleic acids, and small molecules, along with experimentally derived and similarity-based relationships. The dataset is distributed as CSV files organized into node and edge tables. The node tables describe molecular entities with associated metadata, while the edge tables define relationships between entities.
The dataset includes six node files (AA.csv, DNA.csv, RNA.csv, NucleicMixed.csv, NucleicAmbigous.csv, SmallMolecule.csv) and two edge files (interacts_with.csv, has_similarity.csv). Nodes and edges are linked via a shared identifier (id), enabling reconstruction of the full property graph. The dataset can be used for link prediction, graph representation learning, and biomolecular interaction modeling.
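As an illustration of how the node and edge tables might be recombined, here is a minimal sketch using only the Python standard library. It assumes each node table has an `id` column, as the description suggests. The edge endpoint column names `source` and `target` are hypothetical placeholders, since the record does not list the edge headers; check the actual CSV headers and pass the real names. The helper `load_graph` is likewise illustrative and not part of the dataset.

```python
import csv
from pathlib import Path

# File names as listed in the dataset description.
NODE_FILES = ["AA.csv", "DNA.csv", "RNA.csv", "NucleicMixed.csv",
              "NucleicAmbigous.csv", "SmallMolecule.csv"]
EDGE_FILES = ["interacts_with.csv", "has_similarity.csv"]


def load_graph(data_dir, src_col="source", dst_col="target"):
    """Read the node and edge CSV tables found in data_dir.

    Returns (nodes, edges):
      nodes maps node id -> row dict, plus a "node_type" key taken
            from the file name (e.g. "AA", "DNA").
      edges is a list of (source_id, target_id, edge_type, attributes).

    src_col and dst_col are ASSUMED column names, not confirmed by
    the dataset record.
    """
    data_dir = Path(data_dir)
    nodes, edges = {}, []

    for name in NODE_FILES:
        with open(data_dir / name, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                row["node_type"] = Path(name).stem
                nodes[row["id"]] = row

    for name in EDGE_FILES:
        with open(data_dir / name, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                # Remaining columns are kept as edge attributes.
                src, dst = row.pop(src_col), row.pop(dst_col)
                edges.append((src, dst, Path(name).stem, row))

    return nodes, edges
```

Calling `load_graph("path/to/download")` holds every row in memory, which may be heavy for the larger tables; a streaming or DataFrame-based loader may be a better fit for the full dataset.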

Files

Files (548.1 MB)

Checksum (MD5) Size
md5:9d9f5c61ff2fa1a941a8081247a3525d 40.1 MB
md5:a8cb0997c4fe957fa065d7c96ece1e99 54.3 kB
md5:8898a85d76737dc3bae5ec50f5d3c5a5 137.4 MB
md5:3d43a905a2d1d8f6bbfd6407e99d85c8 123.7 MB
md5:aacd58338706f1b13d5d86038d83d155 1.8 kB
md5:90458fc5e639dbd5f2af982f4c0daeb8 2.9 kB
md5:e9bf7c95542a2daa31890a2fbd58e19b 76.7 kB
md5:2a1c9e56e161407782bb3c0eff74231b 246.7 MB
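
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the dataset. The listing does not show which checksum belongs to which file, so that pairing has to be taken from the record's file table.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that large tables do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:9d9f...'.
    The optional 'md5:' prefix is stripped before comparing."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:<value from the table>")` returns `True` only when the local copy has the listed digest. MD5 is adequate for detecting corrupted transfers, though it is not a safeguard against deliberate tampering.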