RUNA-1: a typed biosemiotic knowledge-graph embedding for European ecology (v1.1.0)
Description
RUNA-1 is a typed knowledge-graph embedding (PyKEEN, model: BoxE, embedding_dim 128) over open European ecological data, in which species, discretized environmental-state nodes, and detected-community nodes share one geometric space where proximity encodes ecological and biosemiotic relatedness.
On top of conventional ecological relations (predation, pollination, mycorrhizae, parasitism, etc., mapped where possible to the OBO Relations Ontology), it adds biosemiotic sign-relations trained as ordinary typed edges: indicatorOf (a species as a sign of an environmental state, from EIVE/Ellenberg indicator values) and keystoneSignProducerIn (keystone sign-producer within a detected interaction community, from graph centrality). The relation perceivesSignal is defined in the schema but deliberately not populated (no real perception dataset at scale).
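As a minimal sketch of what "trained as ordinary typed edges" can mean in practice, the snippet below keeps every relation in one flat triple list and checks it against a small schema. The three biosemiotic relation names come from the description above; `preysOn`, `pollinates`, the layer labels, and all entity identifiers are hypothetical stand-ins, not names taken from the released schema.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Relation name -> layer. The three biosemiotic names come from the description;
# "preysOn" and "pollinates" are hypothetical stand-ins for the conventional layer.
SCHEMA = {
    "preysOn": "conventional",
    "pollinates": "conventional",
    "indicatorOf": "biosemiotic",
    "keystoneSignProducerIn": "biosemiotic",
    "perceivesSignal": "biosemiotic",
}

# Defined in the schema but deliberately left without edges.
UNPOPULATED = {"perceivesSignal"}


def count_by_layer(triples):
    """Check every triple against the schema and count edges per layer."""
    counts = Counter()
    for t in triples:
        if t.relation not in SCHEMA:
            raise ValueError(f"unknown relation: {t.relation}")
        if t.relation in UNPOPULATED:
            raise ValueError(f"relation is schema-only: {t.relation}")
        counts[SCHEMA[t.relation]] += 1
    return counts


# Hypothetical example edges; the identifiers are invented for illustration.
triples = [
    Triple("species:A", "preysOn", "species:B"),
    Triple("species:C", "indicatorOf", "env:temperature_bin_3"),
    Triple("species:A", "keystoneSignProducerIn", "community:7"),
]
layer_counts = count_by_layer(triples)
```

Keeping both layers in one triple list is what lets a single embedding model treat sign-relations like any other edge type; the schema check only guards against accidentally populating `perceivesSignal`.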
Priority claim: the first trained relational embedding that operationalizes biosemiotic relations for ecology.
Validation (honest scope, v1.1.0): the conventional layer is validated by held-out link prediction (filtered MRR 0.288, Hits@10 0.48). Two biosemiotic indicator axes are now validated independently and non-circularly against real GBIF occurrence × environment data: temperature (model Spearman rho = 0.545 vs CHELSA bio1) and soil pH (rho = 0.335 vs SoilGrids pH, exceeding the expert-input ceiling of 0.213; the embedding denoises the EIVE values across the interaction graph). Moisture and nutrients were tested, but their candidate proxies (annual precipitation; total soil N) proved invalid (the EIVE-value-vs-reality ceiling is itself ~0), so they remain unvalidated pending better proxies; light and salinity are untested. keystoneSignProducerIn still relies on a within-rule (circular) check; perceivesSignal is not built. Biosemiotic edges are interpretive derived hypotheses with stated provenance, not ground truth.
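For readers unfamiliar with the metrics quoted above, here is a standard-library sketch of how filtered rank metrics and a Spearman correlation are typically computed. It is not the project's evaluation code: the function names, the toy scores, and the optimistic tie handling in `filtered_rank` are assumptions made for illustration.

```python
import math


def filtered_rank(scores, true_tail, known_tails):
    """Rank of the true tail, skipping other tails already known to be correct.

    Ties are resolved optimistically here, which is a simplification.
    """
    target = scores[true_tail]
    better = sum(
        1
        for entity, score in scores.items()
        if entity != true_tail and entity not in known_tails and score > target
    )
    return better + 1


def mrr(ranks):
    """Mean reciprocal rank."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


# Toy data, invented for illustration only.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
rank_c = filtered_rank(scores, "c", known_tails={"a"})
toy_ranks = [1, 2, 4]
toy_mrr = mrr(toy_ranks)
toy_hits = hits_at_k(toy_ranks, 2)
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 30.0, 20.0, 40.0])
```

In the toy example, filtering out the known-true tail `a` improves the rank of `c` relative to an unfiltered ranking, which is the reason filtered metrics are reported for link prediction. A correlation of this kind between a model-derived score and an external environmental variable is one way to read the "model rho vs ceiling" comparison above, where the ceiling would be the same statistic computed on the raw expert values.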
Contents: the frozen reconciled triple set + per-source components, the relation schema, the full derivation/ETL code, the trained BoxE model, the multi-axis independent-validation niche data + results, documentation, and a SHA-256 manifest. Derived from GLOBI, Mangal, EIVE 1.0, GBIF, CHELSA, and SoilGrids (see README for attribution).
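The SHA-256 manifest can be used to confirm that the downloaded files are intact. The sketch below assumes the manifest uses the common `sha256sum` layout (one `<hex digest>  <relative path>` pair per line); the actual layout is not stated here, and the file names in the demo are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, root):
    """Return the listed names whose file is missing or whose digest differs."""
    mismatches = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = Path(root) / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(name)
    return mismatches


# Self-contained demo in a temporary directory (assumed names, not the real files).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "triples.tsv").write_bytes(b"a\tpreysOn\tb\n")
    digest = sha256_of(root / "triples.tsv")
    (root / "MANIFEST.sha256").write_text(f"{digest}  triples.tsv\n")
    clean = verify_manifest(root / "MANIFEST.sha256", root)
    (root / "triples.tsv").write_bytes(b"tampered\n")
    tampered = verify_manifest(root / "MANIFEST.sha256", root)
```

An empty result means every listed file matched its digest; any returned name points to a file that should be downloaded again.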
Files
Total size: 72.8 MB

| Name | Size |
|---|---|
| md5:e09bf9999bc12453d82eceae43183d72 | 72.8 MB |
Additional details
Related works
- Is compiled by
  - https://www.globalbioticinteractions.org/ (URL)
- Is derived from
  - 10.1016/j.ecoinf.2014.08.005 (DOI)
  - 10.5281/zenodo.7427088 (DOI)
- Is supplement to
  - https://runa.agnt.eco/ (URL)
- References
  - 10.3897/zookeys.367.6185 (DOI)