VLM3D: Vision-Language Modeling in 3D Medical Imaging

Hamamci, Ibrahim Ethem; Er, Sezgin; Shit, Suprosanna; De la Rosa, Ezequiel; Sekuboyina, Anjany; Xu, Murong; Prabhakar, Chinmay; Menze, Bjoern; Bluethgen, Christian; Simsek, Ayse Gulnihan; Durugol, Omer Faruk; Simsek, Neslihan; Akan, Gulhan Ertan; Akan, Melih; Ozdemir, Mehmet Kemal; Wang, Chenyu; Dai, Weicheng; Batmanghelich, Kayhan; Zhang, Xiaoman; Baharoon, Mohammed; Luo, Luyang; Rajpurkar, Pranav; Bassi, Pedro R. A. S.; Chen, Jieneng; Chen, Yixiong; Li, Wenxuan; Yuille, Alan; Zhou, Zongwei; Reynaud, Hadrien; Kainz, Bernhard; Wu, Chaoyi; Xie, Weidi; Hou, Benjamin; Lu, Zhiyong; Xu, Daguang; Yang, Dong; Guo, Pengfei; Edgar, Marc

doi:10.5281/zenodo.19847782

Published April 28, 2026 | Version v1

Resource type: Other | Access: Open

1. University of Zurich
2. University Hospital of Zurich
3. Istanbul Medipol University
4. Boston University
5. Harvard University
6. Johns Hopkins University
7. Imperial College London
8. Shanghai Jiao Tong University
9. National Institutes of Health
10. NVIDIA, USA

VLM3D 2026 is a large-scale benchmark for vision-language modeling in 3D medical imaging that evaluates two modalities within the same edition: chest CT and brain MRI. The challenge focuses on clinically grounded tasks that reflect real radiology workflows, including radiology report generation, multi-abnormality classification, localization and segmentation, and text-conditional 3D image synthesis. Our goal is to accelerate reproducible and generalizable 3D multimodal foundation models by providing standardized datasets, evaluation code, and container-based benchmarking.

This is the second edition of VLM3D. The first edition (MICCAI 2025) evaluated chest CT only; although Boston external validation was mentioned, the Boston external test set was not used for official evaluation or ranking, and no expert radiologist human evaluation was performed. VLM3D 2026 evaluates two modalities within the same benchmark (chest CT and brain MRI), introduces a new brain MRI track via MR-RATE, includes mandatory external validation on a closed Boston University test set, and adds expert radiologist human evaluation of the top-performing methods. Results will be reported on track-specific and task-specific leaderboards, together with mandatory external generalization results that quantify robustness under dataset shift.

Files

Name	Size
325-VLM3D_Vision-Language_Modeling_in_3D_Medical_Imaging_2026-04-22T16-37-13.pdf md5:502db751e1268b152c841e84cbc3f992	160.3 kB

