Published April 26, 2025 | Version v1.0.0
Software | Open access

ankilab/HANCOCK_MultimodalDataset: Primary release

Description

Release Overview

The v1.0.0 release organizes the code into the following folders:

Environment setup: A Conda environment.yml file defines all dependencies, enabling straightforward installation and reproducible results.

Data loaders & explorers: Jupyter notebooks guide users through loading the HANCOCK dataset (demographics, pathology, blood, surgical reports, whole-slide images (WSIs)) and performing exploratory analyses with pandas and matplotlib.

Preprocessing & feature extraction: Scripts for cleaning clinical text, normalizing lab values, and extracting histopathological features from whole-slide images via OpenSlide and custom pipelines.
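The preprocessing scripts are not reproduced on this page. To illustrate the lab-value normalization step mentioned above, here is a minimal pandas sketch of one common approach: pivot the results to one row per patient and z-score each analyte. The function name and the column names `patient_id`, `analyte` and `value` are hypothetical. The real HANCOCK blood-data files may use a different schema, and the repository may normalize differently.

```python
import pandas as pd


def normalize_lab_values(blood: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format lab results to one row per patient and z-score each analyte.

    The columns 'patient_id', 'analyte' and 'value' are hypothetical; the
    actual HANCOCK blood-data schema may use different names and units.
    """
    # One row per patient, one column per analyte; repeated measurements are averaged.
    wide = blood.pivot_table(index="patient_id", columns="analyte",
                             values="value", aggfunc="mean")
    # Z-score each analyte column (sample standard deviation); missing values stay NaN.
    return (wide - wide.mean()) / wide.std()


# Hypothetical toy data: patient 003 has no leukocyte measurement.
blood = pd.DataFrame({
    "patient_id": ["001", "001", "002", "002", "003"],
    "analyte": ["hemoglobin", "leukocytes", "hemoglobin", "leukocytes", "hemoglobin"],
    "value": [13.0, 6.0, 15.0, 8.0, 14.0],
})
normalized = normalize_lab_values(blood)
print(normalized)
```

Z-scoring each analyte separately puts measurements with very different units on a comparable scale. In a real pipeline, compute the mean and standard deviation on the training split only and then apply them to the test split, so that no information leaks from the test set.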

Core Modules and Workflows

To facilitate rigorous machine-learning experimentation, the release includes:

Train/Test split generation using a genetic-algorithm approach to ensure balanced cohorts across modalities.

Multimodal fusion pipelines that integrate tabular, imaging, and textual features into unified PyTorch datasets and DataLoaders.

Model training & evaluation notebooks showcasing baseline classifiers (e.g., random forests, XGBoost) and deep-learning architectures, complete with hyperparameter tuning, performance metrics (e.g., AUC), and calibration curves.
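This page does not specify the fitness function or operators behind the genetic-algorithm train/test split. The following NumPy sketch therefore illustrates only the general technique, under stated assumptions:

- Each individual is a boolean mask (True = test set).
- Fitness is the summed gap between train and test means of one-hot encoded clinical variables, plus the deviation from a target test fraction.
- Evolution uses elitist selection, uniform crossover and bit-flip mutation.

The names `imbalance` and `genetic_split` and all parameter values are hypothetical and are not the repository's implementation.

```python
import numpy as np


def imbalance(mask: np.ndarray, features: np.ndarray, test_fraction: float) -> float:
    """Fitness to minimize: train/test gap in feature means plus test-size deviation.

    This objective is an assumption for illustration, not the repository's.
    """
    test, train = features[mask], features[~mask]
    if len(test) == 0 or len(train) == 0:
        return np.inf
    mean_gap = np.abs(test.mean(axis=0) - train.mean(axis=0)).sum()
    size_gap = abs(mask.mean() - test_fraction)
    return float(mean_gap + size_gap)


def genetic_split(features, test_fraction=0.3, pop_size=50, generations=100,
                  mutation_rate=0.02, seed=0):
    """Return a boolean mask (True = test set) found by a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Initial population: random masks with roughly the target test fraction.
    population = rng.random((pop_size, n)) < test_fraction
    for _ in range(generations):
        scores = np.array([imbalance(m, features, test_fraction) for m in population])
        # Selection with elitism: the better half survives unchanged as parents.
        parents = population[np.argsort(scores)[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            # Uniform crossover: each patient's assignment comes from either parent.
            child = np.where(rng.random(n) < 0.5, a, b)
            # Mutation: flip a small random fraction of assignments.
            child ^= rng.random(n) < mutation_rate
            children.append(child)
        population = np.vstack([parents, np.array(children)])
    scores = np.array([imbalance(m, features, test_fraction) for m in population])
    return population[np.argmin(scores)]


# Hypothetical one-hot encoded clinical variables for 200 patients.
features = (np.random.default_rng(42).random((200, 6)) < 0.4).astype(float)
test_mask = genetic_split(features, test_fraction=0.3)
print(int(test_mask.sum()), imbalance(test_mask, features, 0.3))
```

Because the better half of each generation is carried over unchanged, the best split found never gets worse from one generation to the next. A real objective would likely also weight the individual terms and include the prediction targets and per-modality availability.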

Documentation & Citation

Comprehensive usage instructions, code comments, and example workflows are provided in README.md, which also links to the public dataset portal (www.hancock.research.fau.eu).

Files

ankilab/HANCOCK_MultimodalDataset-v1.0.0.zip (14.3 MB)
md5:f6cd72c8c5139ab6cf4895cb87885a1a
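After downloading the archive, you can check its integrity against the MD5 checksum listed above. Here is a minimal sketch using Python's standard `hashlib`. The local filename `HANCOCK_MultimodalDataset-v1.0.0.zip` is an assumption about where your browser saved the file; the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record page.
EXPECTED_MD5 = "f6cd72c8c5139ab6cf4895cb87885a1a"


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.lower()


# Assumed local name of the downloaded archive; adjust to wherever you saved it.
archive = Path("HANCOCK_MultimodalDataset-v1.0.0.zip")
if archive.exists():
    print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
```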
