There is a newer version of the record available.

Published June 10, 2026 | Version 1.1.0

RUNA-1: a typed biosemiotic knowledge-graph embedding for European ecology (v1.1.0)

Authors/Creators

  • 1. AGNT ECO

Description

RUNA-1 is a typed knowledge-graph embedding (PyKEEN, model: BoxE, embedding_dim 128) over open European ecological data, in which species, discretized environmental-state nodes, and detected-community nodes share one geometric space where proximity encodes ecological and biosemiotic relatedness.

On top of conventional ecological relations (predation, pollination, mycorrhizae, parasitism, etc., mapped where possible to the OBO Relations Ontology), it adds biosemiotic sign-relations trained as ordinary typed edges: indicatorOf (a species as a sign of an environmental state, from EIVE/Ellenberg indicator values) and keystoneSignProducerIn (keystone sign-producer within a detected interaction community, from graph centrality). The relation perceivesSignal is defined in the schema but deliberately not populated (no real perception dataset at scale).
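As a rough illustration of how a continuous indicator value could become a typed edge, the sketch below bins a value into a discrete environmental-state node and emits (species, indicatorOf, state) triples. It is not the record's derivation code: the 0-10 scale, the five equal-width bins, the `state:<axis>:bin<i>` node naming, and the `sp:A`-style identifiers and their values are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def state_node(axis: str, value: float, n_bins: int = 5,
               lo: float = 0.0, hi: float = 10.0) -> str:
    """Map a continuous indicator value to a discrete state-node id.

    The scale bounds and bin count are hypothetical defaults.
    """
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside the assumed scale [{lo}, {hi}]")
    width = (hi - lo) / n_bins
    # The top of the scale falls into the last bin rather than a new one.
    index = min(int((value - lo) / width), n_bins - 1)
    return f"state:{axis}:bin{index}"


def indicator_triples(rows: Iterable[Tuple[str, str, float]]) -> Iterator[Triple]:
    """Yield one indicatorOf triple per (species, axis, value) row."""
    for species, axis, value in rows:
        yield (species, "indicatorOf", state_node(axis, value))


# Made-up rows, not taken from EIVE.
rows = [
    ("sp:A", "temperature", 5.2),
    ("sp:B", "soil_pH", 1.9),
    ("sp:C", "soil_pH", 10.0),
]
triples = list(indicator_triples(rows))
for triple in triples:
    print("\t".join(triple))
```

Triples in this tab-separated form can be appended to the conventional ecological edges, which is what "trained as ordinary typed edges" amounts to in practice.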

Priority claim: the first trained relational embedding that operationalizes biosemiotic relations for ecology.

Validation (honest scope, v1.1.0): the conventional layer is validated by held-out link prediction (filtered MRR 0.288, Hits@10 0.48). Two biosemiotic indicator axes are now independently and non-circularly validated against real GBIF occurrence × environment data: temperature (model Spearman rho = 0.545 vs CHELSA bio1) and soil pH (rho = 0.335 vs SoilGrids pH, exceeding the expert-input ceiling of 0.213; the embedding denoises the EIVE values across the interaction graph). Moisture and nutrients were tested, but their candidate proxies (annual precipitation; total soil N) proved invalid: the EIVE-value-vs-reality ceiling is itself ~0, so these axes remain unvalidated pending better proxies. Light and salinity are untested. keystoneSignProducerIn still relies on a within-rule (circular) check, and perceivesSignal is not built. Biosemiotic edges are interpretive derived hypotheses with stated provenance, not ground truth.
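For readers unfamiliar with the link-prediction metrics quoted above, here is a minimal sketch of filtered ranking, MRR, and Hits@k in plain Python. It is an illustration under assumptions, not the evaluation code used for this record: the candidate scores are invented, higher scores are taken to mean more plausible, and ties are resolved in favour of the true entity (an "optimistic" rank), which may differ from the tie handling of the evaluator actually used.

```python
from typing import Dict, Iterable, Set


def filtered_rank(scores: Dict[str, float], true_tail: str,
                  known_tails: Set[str]) -> int:
    """Rank of the true tail among candidates, skipping other known-true tails.

    "Filtered" means that candidates which are themselves correct answers
    (seen in train/validation/test) do not count against the true tail.
    """
    true_score = scores[true_tail]
    rank = 1
    for candidate, score in scores.items():
        if candidate == true_tail or candidate in known_tails:
            continue
        if score > true_score:
            rank += 1
    return rank


def mean_reciprocal_rank(ranks: Iterable[int]) -> float:
    ranks = list(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Iterable[int], k: int = 10) -> float:
    ranks = list(ranks)
    return sum(r <= k for r in ranks) / len(ranks)


# Invented scores for one test triple (head, relation, ?).
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
# "a" is another known-true tail, so it is filtered out: only "b" outranks "c".
rank_c = filtered_rank(scores, true_tail="c", known_tails={"a"})

example_ranks = [rank_c, 1, 20]
print(rank_c, mean_reciprocal_rank(example_ranks), hits_at_k(example_ranks, k=10))
```

Without filtering, "c" would rank third here instead of second, which is why filtered and raw MRR values are not comparable.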

Contents: the frozen reconciled triple set + per-source components, the relation schema, the full derivation/ETL code, the trained BoxE model, the multi-axis independent-validation niche data + results, documentation, and a SHA-256 manifest. Derived from GLOBI, Mangal, EIVE 1.0, GBIF, CHELSA, and SoilGrids (see README for attribution).
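A SHA-256 manifest like the one listed above can be checked with the standard library alone. The sketch below assumes the manifest uses the common `sha256sum` layout (one `<hex digest>  <relative path>` entry per line); the record does not state the format, so adjust the parsing if it differs. The file name `triples.tsv` and its contents are placeholders created in a temporary directory for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> list[str]:
    """Return the listed paths that are missing or whose hash differs."""
    root = manifest_path.parent
    failures = []
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = root / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures


# Demonstration on placeholder data: verify, tamper, verify again.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = root / "triples.tsv"
    data.write_bytes(b"sp:A\tindicatorOf\tstate:temperature:bin2\n")
    manifest = root / "MANIFEST.sha256"
    manifest.write_text(f"{sha256_of(data)}  triples.tsv\n", encoding="utf-8")
    clean = verify_manifest(manifest)
    data.write_bytes(b"tampered\n")
    tampered = verify_manifest(manifest)

print(clean, tampered)
```

An empty list means every listed file matched its recorded digest.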

Notes

v1.1.0 broadens the independent validation from v1's thermal-only check to two non-circular axes: temperature (Spearman 0.545 vs CHELSA) and soil pH (0.335 vs SoilGrids, above the expert ceiling of 0.213). Moisture and nutrients were tested and their candidate proxies (precipitation, total soil N) were found invalid (EIVE-vs-reality ceiling ~0), so these axes are unvalidated, not failed. Light and salinity are untested; keystoneSignProducerIn is still circular; perceivesSignal is not built. The model is the same as in v1 (BoxE on the reconciled master); this release adds validation evidence. The novelty is the relation schema, the frozen triples, and the derivation code, hashed in MANIFEST.sha256.
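The comparison between a model correlation and an "expert ceiling" can be sketched as two Spearman rank correlations against the same observed variable: one for the model-derived score, one for the raw expert indicator value. The code below is a plain-Python illustration with invented numbers; how the record derives a per-species model score from the embedding is not described here, so `model_score` is a hypothetical stand-in.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-species values (five species), for illustration only.
observed_niche = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. mean proxy value at occurrences
expert_value = [2.0, 1.0, 3.0, 5.0, 4.0]     # raw expert indicator value
model_score = [1.1, 2.0, 2.9, 4.2, 5.5]      # hypothetical model-derived score

ceiling = spearman(expert_value, observed_niche)
model_rho = spearman(model_score, observed_niche)
print(f"expert ceiling rho = {ceiling:.3f}, model rho = {model_rho:.3f}")
```

Under this reading, a proxy whose ceiling is near zero cannot validate or refute the axis, because even the expert input shows no rank agreement with it.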

Files

Files (72.8 MB)

md5:e09bf9999bc12453d82eceae43183d72

Additional details

Related works