Cross-Domain Robustness of Vision-Language Models on Perturbed Medical and Autonomous Driving Benchmarks
Description
Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advancements in foundation Vision-Language Models (VLMs) trained on natural image-text pairs, several studies have proposed adapting them to Vision-Language Segmentation Models (VLSMs), which accept text as an additional input to the segmentation model. Introducing auxiliary information via text with human-in-the-loop prompting during inference opens up unique opportunities, such as open vocabulary segmentation and po
Research goal: How robust are vision-language models across domains when evaluated on perturbed multimodal benchmarks from domains such as medical imaging or autonomous driving, using metrics such as BLEU score for captioning and AUC-ROC for authenticity detection?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.
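The research goal above names AUC-ROC as one metric for authenticity detection. As a minimal sketch of how a robustness drop under perturbation could be measured with that metric, the following computes AUC-ROC from raw scores using the pairwise (Mann-Whitney) formulation and compares clean against perturbed inputs. The function names `auc_roc` and `robustness_drop`, and the toy labels and scores, are hypothetical illustrations and are not taken from the report.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def robustness_drop(labels, clean_scores, perturbed_scores):
    """Difference in AUC-ROC between clean and perturbed inputs (assumed metric)."""
    return auc_roc(labels, clean_scores) - auc_roc(labels, perturbed_scores)


# Hypothetical toy data: 1 = authentic, 0 = manipulated.
labels = [1, 1, 1, 0, 0, 0]
clean = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]      # detector scores on clean inputs
perturbed = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # scores after a perturbation

print(auc_roc(labels, clean))
print(auc_roc(labels, perturbed))
print(robustness_drop(labels, clean, perturbed))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a small illustration; a rank-based implementation would be preferable on full benchmarks.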
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:37f1eef0d0fb04b8ad5fe9fd7a8a6061) | 85.5 kB |