Correlation Between Visual Noise Robustness and Performance Degradation in Multimodal LLMs Across Medical and General Tasks

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20656425

Published June 12, 2026 | Version v1

Report | Open access


Affiliation: Autonomous AI Research System

Recently, large language models (LLMs) have taken the spotlight in natural language processing. Integrating LLMs with vision further lets users explore emergent abilities on multimodal data. Visual language models (VLMs) such as LLaVA, Flamingo, and CLIP have demonstrated impressive performance on a range of visio-linguistic tasks. Consequently, large models have many potential applications in biomedical imaging. However, little related work has examined the ability of large models to diagnose diseases. In this work,

Research goal: How does the robustness of state-of-the-art multimodal LLMs to visual noise correlate with their performance degradation on medical decision-making benchmarks versus general vision-language tasks?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.4/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.4/10.

Files

paper.pdf (90.5 kB, md5:3972f76144ad576436b15db0634d97a5)

