Artifact VisionBreaker: Fuzzing VLM Machines
Description
Abstract. Vision Language Models (VLMs) are deep learning architectures that integrate vision and language by mapping images and text into a shared latent space, typically employing dual encoders and contrastive learning. These models power applications such as image captioning, visual question answering, cross-modal retrieval, medical imaging analysis, and robotics. However, their robustness remains a critical concern, particularly in adversarial settings where input manipulations can degrade performance or expose vulnerabilities.
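The shared-latent-space matching described above can be sketched in a few lines. This is a toy illustration only: the hand-written vectors below stand in for the outputs of a real dual-encoder VLM's image and text encoders, and the caption strings are made up for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity, the standard matching score in contrastive VLMs.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for dual-encoder outputs: one image
# embedding and two candidate caption embeddings, all living in the
# same (here 4-dimensional) shared latent space.
image_emb = np.array([0.9, 0.1, 0.0, 0.2])
caption_embs = {
    "a photo of a dog": np.array([0.8, 0.2, 0.1, 0.1]),
    "a photo of a car": np.array([0.1, 0.9, 0.3, 0.0]),
}

# Cross-modal retrieval: rank captions by similarity to the image.
best = max(caption_embs, key=lambda c: cosine_similarity(image_emb, caption_embs[c]))
print(best)  # the dog caption lies closest to the image in the shared space
```

A real pipeline would replace the toy vectors with encoder outputs (e.g. from a CLIP-style model) and normalize embeddings once up front, but the retrieval logic is the same argmax over similarities.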
In this research, we systematically evaluate the robustness of VLMs against adversarial attacks by extending existing attacks to apply perturbations along multiple directions. We analyze prior robustness studies that exploit corrupted images in order to pinpoint how combinations of attacks push the models beyond their decision boundaries.
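To make the notion of perturbing an input along a loss-increasing direction concrete, here is a minimal FGSM-style sketch. The squared-error loss against a fixed target vector is an assumption chosen so the gradient is analytic; an actual VLM attack would differentiate a cross-modal loss through the image encoder instead.

```python
import numpy as np

# Toy stand-in for a model loss: squared distance between an "image"
# feature vector and a fixed target embedding (assumption; a real
# attack would use the VLM's encoder and a cross-modal loss).
target = np.array([1.0, -1.0, 0.5])

def loss(x):
    return float(np.sum((x - target) ** 2))

def grad(x):
    # Analytic gradient of the squared-error loss above.
    return 2.0 * (x - target)

def fgsm_step(x, eps):
    # Fast Gradient Sign Method: shift every feature by eps in the
    # signed-gradient direction, i.e. the direction that increases the loss.
    return x + eps * np.sign(grad(x))

x = np.array([0.2, 0.3, 0.1])
x_adv = fgsm_step(x, eps=0.05)
print(loss(x), loss(x_adv))  # the perturbed input incurs a higher loss
```

Attacks that combine perturbations in different directions generalize this single signed-gradient step, e.g. by iterating it (PGD-style) or mixing gradient directions from several loss terms.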
Leveraging these insights, we propose novel attack strategies that target specific vulnerabilities in VLMs. By benchmarking these attacks against state-of-the-art models, we aim to deepen the understanding of VLM robustness and to propose mitigation strategies that strengthen VLM security in high-stakes applications.
TODO: Add all instructions. Later.