Published December 31, 2025 | Version v1
Project deliverable Open

Deliverable 4.3: Learning models and spatio-temporal harmonization

  • 1. HES-SO Valais-Wallis
  • 2. University of Padua

Description

Research in HEREDITARY involves understanding properties, relationships, and patterns in large and complex heterogeneous data, to obtain knowledge from biomedical datasets. We aim to link different modalities together to discover novel correlations, biological pathways, and ultimately to stratify patients towards a personalized medicine approach.

Work Package 4 of the HEREDITARY project develops tools for harmonizing heterogeneous biomedical data before multimodal integration. The goal is to minimize biases introduced by acquisition instruments and subject variability that can impact downstream analysis.

Reporting on the progress of Tasks 4.3, 4.4, and 4.5, this deliverable aims to provide practical harmonization strategies and self-supervised learning models that improve the quality, fairness, and comparability of biomedical data. As biomedical datasets grow in size and diversity, differences in acquisition settings, preprocessing choices, and subject characteristics can introduce variability that affects downstream results. Addressing these issues early helps avoid misleading conclusions and supports more stable multimodal models.

The methods presented here address this need by providing clear and straightforward steps for harmonization across several key domains, including electroencephalography (EEG), optical coherence tomography (OCT), magnetic resonance imaging (MRI), and genomic/transcriptomic data.

Results indicate that standardized, lightweight preprocessing often works well, and that subject-based evaluation, in which all data from a given subject is kept within a single split, prevents inflated performance estimates. Among the more advanced learning approaches, self-supervised learning (SSL) methods show particular promise for learning generalizable representations in imaging modalities. MRI studies highlight the need for clearer reporting of acquisition parameters and resolution settings to ensure comparability. OCT datasets vary widely, but some are well suited for harmonized analysis, and SSL models for genomics are improving rapidly. Additionally, analysis of clinical data reveals meaningful patient groups. Overall, simple and transparent workflows lead to more reliable results.
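
As an illustration of what subject-based evaluation means in practice, the following minimal Python sketch splits recordings so that no subject contributes data to both the training and the test set. It is not taken from the deliverable: the function name subject_wise_split, the toy subject identifiers, and the 25% test fraction are assumptions chosen only for illustration.

```python
import random


def subject_wise_split(subject_ids, test_fraction=0.25, seed=0):
    """Split sample indices so that no subject appears in both sets.

    subject_ids: one subject identifier per sample (hypothetical input format).
    Returns (train_indices, test_indices).
    """
    # Shuffle the unique subjects, not the individual samples.
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)

    # Hold out a fraction of the subjects (at least one) for testing.
    n_test = max(1, round(len(subjects) * test_fraction))
    held_out = set(subjects[:n_test])

    train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train_idx, test_idx


# Hypothetical toy data: 8 recordings from 4 subjects.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
train_idx, test_idx = subject_wise_split(subject_ids)

train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
# The two subject sets are disjoint by construction.
print(train_subjects.isdisjoint(test_subjects))
```

A sample-level random split of the same data could place two recordings of one subject on opposite sides of the split, which is the kind of leakage that can inflate reported performance.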

These methods help reduce unnecessary variability and support more consistent analyses, while also laying the groundwork for improved multimodal modeling and more reliable clinical insights.

The next phase will focus on integrating these harmonization approaches into a unified, multimodal pipeline that can be applied across partners. This includes refining preprocessing and evaluation steps, integrating self-supervised models into routine workflows, and validating the methods on additional datasets. Furthermore, we will leverage these harmonized datasets to support federated learning, ensuring that model quality remains high even when training across distributed sources.

Files

D4.3_Learning models and spatio-temporal harmonization.pdf.pdf

Files (2.9 MB)

Additional details

Funding

European Commission
HEREDITARY - HetERogeneous sEmantic Data integratIon for the guT-bRain interplaY 101137074