Published April 28, 2026 | Version v1
Other Open

VLM3D: Vision-Language Modeling in 3D Medical Imaging

Description

VLM3D 2026 is a large-scale benchmark for vision-language modeling in 3D medical imaging that evaluates two modalities within the same edition: chest CT and brain MRI. The challenge focuses on clinically grounded tasks that reflect real radiology workflows, including radiology report generation, multi-abnormality classification, localization and segmentation, and text-conditional 3D image synthesis. Our goal is to accelerate the development of reproducible and generalizable 3D multimodal foundation models by providing standardized datasets, evaluation code, and container-based benchmarking.

This is the second edition of VLM3D. The first edition (MICCAI 2025) evaluated chest CT only. Although Boston external validation was mentioned, the Boston external test set was not used for official evaluation or ranking, and no expert radiologist human evaluation was performed. VLM3D 2026 evaluates chest CT and brain MRI within the same benchmark, introduces a new brain MRI track via MR-RATE, makes external validation on a closed Boston University test set mandatory, and adds expert radiologist human evaluation of the top-performing methods. Results will be reported on track-specific and task-specific leaderboards, alongside mandatory external generalization reporting to quantify robustness under dataset shift.

Files

325-VLM3D_Vision-Language_Modeling_in_3D_Medical_Imaging_2026-04-22T16-37-13.pdf