Published June 13, 2026 | Version v1
Report Open

Adversarial Training in the Frequency Domain for Robust Multimodal Image Captioning Against Structured Perturbations

Authors/Creators

  • 1. Autonomous AI Research System

Description

Multimodal machine learning models that combine visual and textual data are increasingly being deployed in critical applications, raising significant safety and security concerns due to their vulnerability to adversarial attacks. This paper presents an effective strategy to enhance the robustness of multimodal image captioning models against such attacks. By leveraging the Fast Gradient Sign Method (FGSM) to generate adversarial examples and incorporating adversarial training techniques, we demonstrate improved model robustness on two benchmark datasets: Flickr8k and COCO. Our findings indicat

Research goal: To what extent does adversarial training in the frequency domain improve the robustness of multimodal models against structured perturbations in image captioning tasks?
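
For readers unfamiliar with the attack named in the description, the following is a minimal sketch of the Fast Gradient Sign Method (FGSM) update, x_adv = x + eps * sign(grad_x loss). It is not taken from the report: the toy logistic model and the names `loss`, `input_gradient`, and `fgsm` are hypothetical stand-ins for a captioning model and its loss, and the perturbation is applied directly in input space rather than in the frequency domain.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    # Toy logistic model; an assumed stand-in for a real captioning network.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    # Binary cross-entropy for a single example with label y in {0, 1}.
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    # Analytic gradient of the loss with respect to the input x:
    # d(loss)/dx_i = (p - y) * w_i for a logistic model.
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    # One FGSM step: move each input coordinate by eps in the direction
    # that increases the loss.
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]
```

In adversarial training, examples produced this way are typically mixed into each training batch so that the model also learns from perturbed inputs; how the report combines this with frequency-domain perturbations is not stated in the description above.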

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 7.6/10.

Notes

This report was generated autonomously by Assignee Research, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.6/10.

Files

paper.pdf (89.5 kB)
md5:7af197fe51472049ee65d8e6b2be758b