Challenge for Vision-Language Modeling in 3D Medical Imaging (VLM3D)
Creators
- Hamamci, Ibrahim Ethem (1)
- Shit, Suprosanna (1)
- Sekuboyina, Anjany (1)
- Xu, Murong (1)
- Prabhakar, Chinmay (1)
- Menze, Bjoern (1)
- Bluethgen, Christian (2)
- Er, Sezgin (3)
- Simsek, Ayse Gulnihan (3)
- Durugol, Omer Faruk (3)
- Esirgun, Sevval Nil (3)
- Dasdelen, Muhammed Furkan (3)
- Simsek, Neslihan (3)
- Akan, Gulhan Ertan (3)
- Wang, Chenyu (4)
- Dai, Weicheng (4)
- Batmanghelich, Kayhan (4)
- Zhang, Xiaoman (5)
- Rajpurkar, Pranav (5)
- Bassi, Pedro R. A. S. (6)
- Li, Wenxuan (6)
- Yuille, Alan (6)
- Zhou, Zongwei (6)
- Reynaud, Hadrien (7)
- Kainz, Bernhard (7)
- Wu, Chaoyi (8)
- Xie, Weidi (8)
- Hou, Benjamin (9)
- Lu, Zhiyong (9)
- Xu, Daguang (10)
- Yang, Dong (10)
- Guo, Pengfei (10)
- 1. University of Zurich, Switzerland
- 2. University Hospital Zurich, Switzerland
- 3. Istanbul Medipol University, Turkey
- 4. Boston University, USA
- 5. Harvard University, USA
- 6. Johns Hopkins University, USA
- 7. Imperial College London, UK
- 8. Shanghai Jiao Tong University, China
- 9. National Institutes of Health (NIH), USA
- 10. NVIDIA, USA
Description
Three-dimensional (3D) medical imaging, particularly chest computed tomography (CT), plays a vital role in diagnosing thoracic abnormalities by offering detailed insights into complex anatomical structures. However, interpreting 3D CT data is a time-consuming and challenging process, especially given the increasing global demand for CT scans. While artificial intelligence (AI) has demonstrated significant potential in automating radiological tasks such as report generation and abnormality detection, its application to 3D medical imaging remains limited due to the lack of large-scale, paired datasets and the computational challenges associated with processing 3D data.
To address these challenges, we introduce the Challenge for Vision-Language Modeling in 3D Medical Imaging (VLM3D), built around the open-source CT-RATE dataset. CT-RATE pairs over 50,000 3D chest CT volumes with corresponding radiology reports, annotated for 18 clinically significant abnormalities. The VLM3D Challenge presents participants with four tasks: (1) radiology report generation from chest CT volumes, (2) multi-abnormality classification, (3) self-supervised multi-abnormality localization, and (4) text-conditional 3D chest CT generation.
These tasks address key aspects of radiological diagnostics and treatment planning. Radiology report generation automates the creation of accurate and detailed reports from CT volumes, reducing radiologists' workloads. Multi-abnormality classification enables quicker and more precise detection of pathologies in CT scans. The self-supervised multi-abnormality localization task focuses on identifying specific pathological regions, including pericardial effusion, pleural effusion, consolidation, lung opacity, and lung nodules, without the use of ground-truth labels during training. This task will be evaluated using a manually labeled dataset to assess how effectively models can localize abnormalities in a self-supervised manner. Text-conditional 3D chest CT generation supports the creation of realistic CT volumes from textual descriptions, with applications in data augmentation, education, and generative modeling research.
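Because a single scan can show several findings at once, the multi-abnormality tasks are multi-label rather than single-label problems. The following is a minimal sketch of one way to encode that, not the challenge's actual data format: the `FINDINGS` list is only the five findings named above for the localization task (an illustrative subset, not the full set of 18 annotated abnormalities), and `encode_findings` and `decode_findings` are hypothetical helper names.

```python
# Illustrative subset only: the five findings named for the localization task.
# The real label set and its ordering are assumptions here.
FINDINGS = [
    "pericardial effusion",
    "pleural effusion",
    "consolidation",
    "lung opacity",
    "lung nodules",
]


def encode_findings(present):
    """Return a 0/1 vector aligned with FINDINGS (hypothetical helper)."""
    unknown = set(present) - set(FINDINGS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return [1 if name in present else 0 for name in FINDINGS]


def decode_findings(vector):
    """Map a 0/1 vector back to finding names, in FINDINGS order."""
    if len(vector) != len(FINDINGS):
        raise ValueError("vector length does not match FINDINGS")
    return [name for name, flag in zip(FINDINGS, vector) if flag]


if __name__ == "__main__":
    # A scan with two co-occurring findings.
    labels = encode_findings({"pleural effusion", "consolidation"})
    print(labels)                   # [0, 1, 1, 0, 0]
    print(decode_findings(labels))  # ['pleural effusion', 'consolidation']
```

A fixed label order keeps predictions and reference annotations aligned position by position, so each finding can be scored independently.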
The challenge evaluation will use two closed test sets: an internal set of 2,000 cases and an external set of 1,024 cases from Boston University Hospital. Evaluating on both is intended to assess models on diverse, clinically relevant data and to test how well they generalize beyond the source of the training data. The training and validation datasets are derived from the CT volumes in the CT-RATE dataset.
The VLM3D Challenge aims to advance AI in 3D medical imaging by providing a benchmark for vision-language models. Biomedically, it seeks to improve diagnostic accuracy, streamline radiological workflows, and enhance patient care. Technically, it encourages the development of scalable and generalizable AI systems that integrate visual and textual modalities, fostering innovation and collaboration in research and clinical applications.
Files
| Name | Size |
|---|---|
| 174-Challenge_for_Vision-Language_Modeling_in_3D_Medical_Imaging_2025-03-17T09-18-37.pdf (md5:582f9a98d60024a2bbf988b4d05366a3) | 188.8 kB |