Challenge for Vision-Language Modeling in 3D Medical Imaging (VLM3D)

Hamamci, Ibrahim Ethem; Shit, Suprosanna; Sekuboyina, Anjany; Xu, Murong; Prabhakar, Chinmay; Menze, Bjoern; Bluethgen, Christian; Er, Sezgin; Simsek, Ayse Gulnihan; Durugol, Omer Faruk; Esirgun, Sevval Nil; Dasdelen, Muhammed Furkan; Simsek, Neslihan; Akan, Gulhan Ertan; Wang, Chenyu; Dai, Weicheng; Batmanghelich, Kayhan; Zhang, Xiaoman; Rajpurkar, Pranav; Bassi, Pedro R. A. S.; Li, Wenxuan; Yuille, Alan; Zhou, Zongwei; Reynaud, Hadrien; Kainz, Bernhard; Wu, Chaoyi; Xie, Weidi; Hou, Benjamin; Lu, Zhiyong; Xu, Daguang; Yang, Dong; Guo, Pengfei

doi:10.5281/zenodo.15052708

Published March 19, 2025 | Version v1

Resource type: Other | Access: Open

1. University of Zurich, Switzerland
2. University Hospital Zurich, Switzerland
3. Istanbul Medipol University, Turkey
4. Boston University, USA
5. Harvard University, USA
6. Johns Hopkins University, USA
7. Imperial College London, UK
8. Shanghai Jiao Tong University, China
9. National Institutes of Health (NIH), USA
10. NVIDIA, USA

Three-dimensional (3D) medical imaging, particularly chest computed tomography (CT), plays a vital role in diagnosing thoracic abnormalities by offering detailed insights into complex anatomical structures. However, interpreting 3D CT data is a time-consuming and challenging process, especially given the increasing global demand for CT scans. While artificial intelligence (AI) has demonstrated significant potential in automating radiological tasks such as report generation and abnormality detection, its application to 3D medical imaging remains limited due to the lack of large-scale, paired datasets and the computational challenges associated with processing 3D data.

To address these challenges, we introduce the Challenge for Vision-Language Modeling in 3D Medical Imaging (VLM3D), built around the open-source CT-RATE dataset. CT-RATE pairs over 50,000 3D chest CT volumes with corresponding radiology reports, annotated for 18 clinically significant abnormalities. The VLM3D Challenge presents participants with four tasks: (1) radiology report generation from chest CT volumes, (2) multi-abnormality classification, (3) self-supervised multi-abnormality localization, and (4) text-conditional 3D chest CT generation.
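As a rough illustration of the pairing described above, one training example can be pictured as a volume, a report, and an 18-element multi-label vector. This is a minimal sketch, not the CT-RATE file format: the class name `CTReportSample`, the helper `encode_labels`, the placeholder abnormality names, and the example volume shape are all hypothetical, since this record does not list the 18 classes or the storage layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The record states there are 18 abnormality classes but does not name them,
# so generic placeholder names are used here (assumption).
NUM_ABNORMALITIES = 18
ABNORMALITY_NAMES = [f"abnormality_{i}" for i in range(NUM_ABNORMALITIES)]


def encode_labels(present_names: List[str]) -> List[int]:
    """Turn a list of present abnormalities into a fixed-length 0/1 vector."""
    wanted = set(present_names)
    unknown = wanted - set(ABNORMALITY_NAMES)
    if unknown:
        raise ValueError(f"unknown abnormality names: {sorted(unknown)}")
    return [1 if name in wanted else 0 for name in ABNORMALITY_NAMES]


@dataclass
class CTReportSample:
    """One paired example: a 3D CT volume, its report, and multi-label targets."""
    volume_shape: Tuple[int, int, int]  # (depth, height, width); illustrative only
    report: str                         # free-text radiology report
    labels: List[int] = field(default_factory=lambda: [0] * NUM_ABNORMALITIES)

    def __post_init__(self) -> None:
        # Guard against vectors of the wrong length or with non-binary entries.
        if len(self.labels) != NUM_ABNORMALITIES:
            raise ValueError(f"expected {NUM_ABNORMALITIES} labels")
        if any(value not in (0, 1) for value in self.labels):
            raise ValueError("labels must be 0 or 1")

    def present(self) -> List[str]:
        """Names of the abnormalities flagged as present."""
        return [n for n, flag in zip(ABNORMALITY_NAMES, self.labels) if flag]


sample = CTReportSample(
    volume_shape=(240, 512, 512),
    report="Example report text.",
    labels=encode_labels(["abnormality_3", "abnormality_7"]),
)
print(sample.present())
```

Several abnormalities can be present in one scan, which is why the target is a multi-label vector rather than a single class index.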

These tasks address key aspects of radiological diagnostics and treatment planning. Radiology report generation automates the creation of accurate and detailed reports from CT volumes, reducing radiologists' workloads. Multi-abnormality classification enables quicker and more precise detection of pathologies in CT scans. The self-supervised multi-abnormality localization task focuses on identifying specific pathological regions, including pericardial effusion, pleural effusion, consolidation, lung opacity, and lung nodules, without the use of ground-truth labels during training. This task will be evaluated using a manually labeled dataset to assess how effectively models can localize abnormalities in a self-supervised manner. Text-conditional 3D chest CT generation supports the creation of realistic CT volumes from textual descriptions, with applications in data augmentation, education, and generative modeling research.
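To make the localization evaluation more concrete, the sketch below compares a predicted region with a manually labeled region using the Dice overlap score. The choice of Dice is an assumption made for illustration: this record does not state which metric the challenge uses, and the function name `dice_score` and the toy voxel sets are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index of one voxel


def dice_score(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice overlap between two voxel sets: 2|A ∩ B| / (|A| + |B|).

    Assumed metric for illustration; the challenge's official metric
    is not given in this record.
    """
    if not pred and not truth:
        # Both masks empty: treat as perfect agreement by convention.
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


# Toy example: two 2x2x2 boxes that overlap in half of their voxels.
truth = {(z, y, x) for z in range(2) for y in range(2) for x in range(0, 2)}
pred = {(z, y, x) for z in range(2) for y in range(2) for x in range(1, 3)}

print(dice_score(pred, truth))  # 0.5
```

Each box holds 8 voxels and they share 4, so the score is 2 * 4 / (8 + 8) = 0.5. In a self-supervised setting the manual masks would be used only at evaluation time, never during training.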

The challenge evaluation will use two datasets: an internal closed test set of 2,000 cases and an external closed test set of 1,024 cases from Boston University Hospital. This dual evaluation framework is intended to assess models on diverse, clinically relevant data and to support their real-world applicability. The training and validation datasets are derived from the CT volumes in the CT-RATE dataset.

The VLM3D Challenge aims to advance AI in 3D medical imaging by providing a benchmark for vision-language models. Biomedically, it seeks to improve diagnostic accuracy, streamline radiological workflows, and enhance patient care. Technically, it encourages the development of scalable and generalizable AI systems that integrate visual and textual modalities, fostering innovation and collaboration in research and clinical applications.

Files

Name	Size	Checksum
174-Challenge_for_Vision-Language_Modeling_in_3D_Medical_Imaging_2025-03-17T09-18-37.pdf	188.8 kB	md5:582f9a98d60024a2bbf988b4d05366a3
