ankilab/HANCOCK_MultimodalDataset: Primary release
Description
Release Overview
The v1.0.0 release organizes the code into clear, logical folders:
Environment setup: A Conda environment.yml defines all dependencies for smooth installation and reproducibility.
Data loaders & explorers: Jupyter notebooks guide users through loading the HANCOCK dataset (demographics, pathology, blood, surgical reports, WSIs) and performing exploratory analyses using pandas and matplotlib.
Preprocessing & feature extraction: Scripts for cleaning clinical text, normalizing lab values, and extracting histopathological features from whole‐slide images via OpenSlide and custom pipelines.
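As an illustration of the lab-value normalization step mentioned above, here is a minimal sketch using z-score scaling. The release description does not state which normalization the repository's scripts apply, so the z-score approach, the function name `zscore_normalize`, and the sample hemoglobin values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore_normalize(values):
    """Return z-scores for a list of lab values.

    None marks a missing measurement and is passed through unchanged.
    This is an illustrative helper, not the repository's implementation.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mu = mean(observed)
    sigma = pstdev(observed)  # population standard deviation
    if sigma == 0:
        # All observed values are identical, so every z-score is zero.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sigma for v in values]


# Hypothetical hemoglobin values in g/dL, with one missing measurement.
hemoglobin = [13.5, 14.2, None, 11.8, 15.1]
normalized = zscore_normalize(hemoglobin)
```

Passing missing values through as `None` keeps each normalized list aligned with its patient identifiers, which matters when the tabular features are later joined with other modalities.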
Core Modules and Workflows
To facilitate rigorous machine-learning experimentation, the release includes:
Train/Test split generation using a genetic‐algorithm approach to ensure balanced cohorts across modalities.
Multimodal fusion pipelines that integrate tabular, imaging, and textual features into unified PyTorch datasets and DataLoaders.
Model training & evaluation notebooks showcasing baseline classifiers (e.g., random forests, XGBoost) and deep‐learning architectures, complete with hyperparameter tuning and performance metrics (AUC, calibration curves).
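To make the genetic-algorithm split more concrete, the following is a toy sketch of the general idea: candidate train/test assignments are encoded as binary masks and evolved to minimize the difference in category proportions between the two cohorts. It is not the repository's implementation. The function names, the fitness definition, the hyperparameters, and the sample `sex` and `site` attributes are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def imbalance(labels_per_feature, mask, test_fraction):
    """Score a split; lower is better.

    Sums the per-category proportion gaps between train and test for each
    feature and adds a penalty for deviating from the target test size.
    """
    n = len(mask)
    n_test = sum(mask)
    n_train = n - n_test
    if n_test == 0 or n_train == 0:
        return float("inf")  # degenerate split with an empty cohort
    score = abs(n_test / n - test_fraction)
    for labels in labels_per_feature:
        test_counts = Counter(l for l, m in zip(labels, mask) if m)
        train_counts = Counter(l for l, m in zip(labels, mask) if not m)
        for category in set(labels):
            score += abs(test_counts[category] / n_test
                         - train_counts[category] / n_train)
    return score


def genetic_split(labels_per_feature, test_fraction=0.2, population_size=30,
                  generations=50, mutation_rate=0.02, seed=0):
    """Evolve a binary mask (1 = test, 0 = train) with a simple elitist GA."""
    rng = random.Random(seed)
    n = len(labels_per_feature[0])

    def fitness(mask):
        return imbalance(labels_per_feature, mask, test_fraction)

    population = [[1 if rng.random() < test_fraction else 0 for _ in range(n)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: population_size // 2]  # keep the best half
        children = []
        while len(survivors) + len(children) < population_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


# Hypothetical categorical attributes for 40 patients.
sex = ["f", "m"] * 20
site = ["larynx", "larynx", "oral", "oral"] * 10
best_mask = genetic_split([sex, site])
test_indices = [i for i, m in enumerate(best_mask) if m]
```

Because the best half of each generation survives unchanged, the best score never worsens from one generation to the next. A realistic fitness function would likely also account for continuous variables and for the availability of each modality per patient.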
Documentation & Citation
Usage instructions, code comments, and example workflows are documented in README.md, which links to the public dataset portal (www.hancock.research.fau.eu).
Files
| Name | Size | Checksum |
|---|---|---|
| ankilab/HANCOCK_MultimodalDataset-v1.0.0.zip | 14.3 MB | md5:f6cd72c8c5139ab6cf4895cb87885a1a |
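The MD5 checksum listed for the archive can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local path to the downloaded file is up to the user.

```python
import hashlib

# Checksum listed for HANCOCK_MultimodalDataset-v1.0.0.zip in this record.
EXPECTED_MD5 = "f6cd72c8c5139ab6cf4895cb87885a1a"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("HANCOCK_MultimodalDataset-v1.0.0.zip")` returns `True` only if the local copy matches the published checksum. The file name in that call is an assumed local path.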
Additional details
Related works
- Is supplement to
- Software: https://github.com/ankilab/HANCOCK_MultimodalDataset/tree/v1.0.0 (URL)
Software
- Repository URL
- https://github.com/ankilab/HANCOCK_MultimodalDataset