Published December 23, 2025 | Version v1
Project deliverable Open

Deliverable 3.2: Federated workflow execution methods. First release

  • 1. Ontotext (Bulgaria)
  • 2. University of Padua
  • 3. Aalborg University

Description

This deliverable introduces the Hereditary Data Network (HDN), a privacy-by-design federation architecture for medical data analytics within the HEREDITARY project. HDN addresses the need to perform cross-institutional analyses and model development over sensitive clinical, genomic, and imaging data without centralizing patient-level information, in compliance with regulations such as the GDPR, the Data Governance Act, and the emerging AI Act.

HDN provides a unified semantic view of consortium data through the Hereditary Ontology (HERO) and an ontology-mediated query interface. Researchers express information needs as SPARQL queries over a stable conceptual schema, while participating institutions retain full control over storage technologies, local schemas, and disclosure policies. Architecturally, HDN follows a hub-and-spoke model: a central orchestrator manages a vetted catalog of query templates, validates requests, enforces disclosure-level constraints at the semantic boundary, and dispatches instantiated queries to institutional endpoints. Each endpoint operates an Ontology-Based Data Access (OBDA) stack that maps ontology-level queries to its local schema, executes them under local privacy rules, and returns only admissible aggregated or record-level results.
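The hub-and-spoke flow described above can be illustrated with a minimal Python sketch. It is not the HDN implementation: the class names (`QueryTemplate`, `Endpoint`, `Central`), the two disclosure levels, the `hero:` terms in the SPARQL text, and the stubbed `execute` callables are all assumptions made for illustration. The sketch only shows the order of steps: look up a vetted template, validate the parameters, check each endpoint's disclosure policy, and dispatch the instantiated query.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable

# Hypothetical disclosure levels, ordered from least to most detailed.
LEVELS = {"aggregate": 0, "record": 1}


@dataclass(frozen=True)
class QueryTemplate:
    name: str
    sparql: Template   # SPARQL text with $placeholders
    disclosure: str    # level of detail the query returns


@dataclass
class Endpoint:
    name: str
    max_disclosure: str             # most detailed level this site allows
    execute: Callable[[str], list]  # stand-in for the site's local OBDA stack


class Central:
    """Toy orchestrator: vetted catalog, validation, policy check, dispatch."""

    def __init__(self, catalog, endpoints):
        self.catalog = {t.name: t for t in catalog}
        self.endpoints = endpoints

    def run(self, template_name, params):
        # Only templates from the vetted catalog may be executed.
        if template_name not in self.catalog:
            raise ValueError(f"unknown template: {template_name}")
        # Deliberately strict parameter check (assumption) to block injection.
        if not all(isinstance(v, str) and v.isalnum() for v in params.values()):
            raise ValueError("parameters must be alphanumeric strings")
        tpl = self.catalog[template_name]
        query = tpl.sparql.substitute(params)  # KeyError if a parameter is missing
        results = {}
        for ep in self.endpoints:
            # Dispatch only where the template's level is within the site's policy.
            if LEVELS[tpl.disclosure] <= LEVELS[ep.max_disclosure]:
                results[ep.name] = ep.execute(query)
        return results


# Hypothetical catalog; the hero: class and property names are invented.
catalog = [
    QueryTemplate(
        "count_by_diagnosis",
        Template('SELECT (COUNT(?p) AS ?n) WHERE { ?p hero:diagnosis "$diagnosis" }'),
        "aggregate",
    ),
    QueryTemplate(
        "list_patients",
        Template('SELECT ?p WHERE { ?p hero:diagnosis "$diagnosis" }'),
        "record",
    ),
]

# Stub endpoints that simply echo the query they receive.
endpoints = [
    Endpoint("site_a", "aggregate", lambda q: [q]),
    Endpoint("site_b", "record", lambda q: [q]),
]

central = Central(catalog, endpoints)
aggregate_out = central.run("count_by_diagnosis", {"diagnosis": "ALS"})
record_out = central.run("list_patients", {"diagnosis": "ALS"})
```

In this toy setup the aggregate template reaches both sites, while the record-level template is dispatched only to `site_b`, whose policy permits record-level results.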

The deliverable makes four main contributions. First, it formalizes the evolution from the initial Ontology-Based Data Federation (OBDF) architecture to a native federation design that embeds privacy enforcement into the core protocol rather than adding it as an external layer. Second, it details the logical and reference implementations of HDN Central and HDN Endpoints, including interaction protocols, query lifecycle, and privacy controls. Third, it presents a benchmark comparing HDN against the legacy OBDF approach, showing improved scalability, more robust behavior as the number of endpoints grows, and better alignment with institutional privacy constraints. Finally, it demonstrates the applicability of HDN through three use cases: (1) federated queries on ALS clinical data at different disclosure levels, (2) a distributed SQL-based implementation of a machine learning algorithm, the Cox survival model, and (3) integration with Ontotext’s LinkedLifeData Inventory for FAIR-compliant external datasets. Together, these results show that HDN provides a practical and extensible foundation for federated analytics in HEREDITARY and prepares the ground for tighter integration with federated learning workflows in future project phases.
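To make the second use case more concrete, the sketch below shows one common way a Cox model can be evaluated without pooling patient-level rows. It is an assumption-laden illustration, not the deliverable's SQL-based method: it uses a single covariate, the Breslow form of the partial likelihood, plain Python in place of SQL aggregates, and it assumes the sites agree on a shared list of event times. The function names and the toy data are hypothetical. Each site returns only three sums per event time, and the center combines them into the log partial likelihood for a given coefficient `beta`.

```python
import math


def site_summaries(rows, event_times, beta):
    """Per-site aggregates for each shared event time.

    rows: list of (time, event, x) tuples, event = 1 for an observed event.
    Returns {t: (number of events at t,
                 sum of x over events at t,
                 sum of exp(beta * x) over rows still at risk at t)}.
    """
    out = {}
    for t in event_times:
        d = sum(1 for (ti, ev, _) in rows if ev == 1 and ti == t)
        sx = sum(x for (ti, ev, x) in rows if ev == 1 and ti == t)
        r = sum(math.exp(beta * x) for (ti, _, x) in rows if ti >= t)
        out[t] = (d, sx, r)
    return out


def log_partial_likelihood(summaries, event_times, beta):
    """Combine site aggregates into the Breslow log partial likelihood."""
    total = 0.0
    for t in event_times:
        d = sum(s[t][0] for s in summaries)
        sx = sum(s[t][1] for s in summaries)
        r = sum(s[t][2] for s in summaries)
        total += beta * sx - d * math.log(r)
    return total


# Toy data (invented): (time, event, covariate) per patient at two sites.
site_a = [(2.0, 1, 0.5), (5.0, 0, 1.0), (7.0, 1, -0.3)]
site_b = [(2.0, 1, 1.2), (3.0, 1, 0.0), (9.0, 0, 0.4)]

# Shared grid of distinct event times (assumed to be exchangeable between sites).
event_times = sorted({t for rows in (site_a, site_b) for (t, ev, _) in rows if ev == 1})

beta = 0.3
federated = log_partial_likelihood(
    [site_summaries(site_a, event_times, beta),
     site_summaries(site_b, event_times, beta)],
    event_times,
    beta,
)

# Reference value computed as if all rows were pooled in one place.
pooled = log_partial_likelihood(
    [site_summaries(site_a + site_b, event_times, beta)],
    event_times,
    beta,
)
```

Because the per-time sums are additive across sites, `federated` and `pooled` agree up to floating-point rounding. A full fit would repeat this exchange while an optimizer updates `beta`, and whether sharing event times is admissible depends on each site's disclosure policy.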

Files

D3.2_V2.80.pdf (7.1 MB)
md5:f5b168a8e82c43cd8d967cd93144958d

Additional details

Funding

European Commission
HEREDITARY - HetERogeneous sEmantic Data integratIon for the guT-bRain interplaY 101137074