VLM3D: Vision-Language Modeling in 3D Medical Imaging
Authors/Creators
- Hamamci, Ibrahim Ethem (1)
- Er, Sezgin (1)
- Shit, Suprosanna (1)
- De la Rosa, Ezequiel (1)
- Sekuboyina, Anjany (1)
- Xu, Murong (1)
- Prabhakar, Chinmay (1)
- Menze, Bjoern (1)
- Bluethgen, Christian (2)
- Simsek, Ayse Gulnihan (3)
- Durugol, Omer Faruk (3)
- Simsek, Neslihan (3)
- Akan, Gulhan Ertan (3)
- Akan, Melih (3)
- Ozdemir, Mehmet Kemal (3)
- Wang, Chenyu (4)
- Dai, Weicheng (4)
- Batmanghelich, Kayhan (4)
- Zhang, Xiaoman (5)
- Baharoon, Mohammed (5)
- Luo, Luyang (5)
- Rajpurkar, Pranav (5)
- Bassi, Pedro R. A. S. (6)
- Chen, Jieneng (6)
- Chen, Yixiong (6)
- Li, Wenxuan (6)
- Yuille, Alan (6)
- Zhou, Zongwei (6)
- Reynaud, Hadrien (7)
- Kainz, Bernhard (7)
- Wu, Chaoyi (8)
- Xie, Weidi (8)
- Hou, Benjamin (9)
- Lu, Zhiyong (9)
- Xu, Daguang (10)
- Yang, Dong (10)
- Guo, Pengfei (10)
- Edgar, Marc (10)
Description
VLM3D 2026 is a large-scale benchmark for vision-language modeling in 3D medical imaging that evaluates two modalities within the same edition: chest CT and brain MRI. The challenge focuses on clinically grounded tasks that reflect real radiology workflows: radiology report generation, multi-abnormality classification, localization and segmentation, and text-conditional 3D image synthesis. Our goal is to accelerate reproducible and generalizable 3D multimodal foundation models by providing standardized datasets, evaluation code, and container-based benchmarking.
This is the second edition of VLM3D. The first edition (MICCAI 2025) evaluated chest CT only. Although Boston external validation was mentioned, the Boston external test set was not used for official evaluation or ranking, and no expert radiologist human evaluation was performed. VLM3D 2026 evaluates chest CT and brain MRI within the same benchmark, introduces a new brain MRI track via MR-RATE, makes external validation on a closed Boston University test set mandatory, and adds expert radiologist human evaluation for the top-performing methods. Results will be reported on track-specific and task-specific leaderboards, together with mandatory external generalization results that quantify robustness under dataset shift.
Files
| Name | Size | Checksum |
|---|---|---|
| 325-VLM3D_Vision-Language_Modeling_in_3D_Medical_Imaging_2026-04-22T16-37-13.pdf | 160.3 kB | md5:502db751e1268b152c841e84cbc3f992 |
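The MD5 checksum listed for the PDF can be used to confirm that a downloaded copy is intact. The following is a minimal sketch, assuming the file has been saved locally under the name shown in the listing; the helper names `md5_of_file` and `matches_record` are illustrative and not part of the record.

```python
import hashlib

# Checksum and filename as listed in the Files section of this record.
EXPECTED_MD5 = "502db751e1268b152c841e84cbc3f992"
PDF_NAME = "325-VLM3D_Vision-Language_Modeling_in_3D_Medical_Imaging_2026-04-22T16-37-13.pdf"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_record(PDF_NAME)` from the download directory should return `True` for an unmodified copy. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.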