Adversarial Training in the Frequency Domain for Robust Multimodal Image Captioning Against Structured Perturbations
Description
Multimodal machine learning models that combine visual and textual data are increasingly being deployed in critical applications, raising significant safety and security concerns due to their vulnerability to adversarial attacks. This paper presents an effective strategy to enhance the robustness of multimodal image captioning models against such attacks. By leveraging the Fast Gradient Sign Method (FGSM) to generate adversarial examples and incorporating adversarial training techniques, we demonstrate improved model robustness on two benchmark datasets: Flickr8k and COCO. Our findings indicat
Research goal: To what extent does adversarial training in the frequency domain improve the robustness of multimodal models against structured perturbations in image captioning tasks?
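For readers unfamiliar with the attack named in the description, the following is a minimal sketch of FGSM, which perturbs an input by a step of size eps along the sign of the loss gradient. It is not the paper's implementation. The logistic loss on a linear score (`toy_loss_and_grad`) is a toy stand-in for a captioning loss, and `low_pass` is a hypothetical illustration of one way a perturbation could be restricted to low spatial frequencies; the record does not state how the frequency domain is actually used. All function names, the 8x8 input size, and the value of `eps` are assumptions chosen for illustration.

```python
import numpy as np

def toy_loss_and_grad(x, w, y):
    """Logistic loss of a linear score (toy stand-in for a captioning loss)."""
    z = float(np.sum(w * x))
    loss = float(np.logaddexp(0.0, z) - y * z)  # stable binary cross-entropy on logit z
    p = 1.0 / (1.0 + np.exp(-z))
    return loss, (p - y) * w                     # gradient with respect to the input x

def fgsm(x, grad, eps):
    """FGSM step: x_adv = clip(x + eps * sign(grad), 0, 1)."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def low_pass(delta, keep):
    """Hypothetical band limit: zero all but the lowest `keep` frequencies per axis."""
    h, w = delta.shape
    spectrum = np.fft.fftshift(np.fft.fft2(delta))  # move the DC term to the centre
    mask = np.zeros((h, w))
    cy, cx = h // 2, w // 2
    mask[cy - keep:cy + keep + 1, cx - keep:cx + keep + 1] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

# Assumed toy data: an 8x8 "image" in [0.2, 0.8] so the clip to [0, 1] is inactive.
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=(8, 8))
w = 0.1 * rng.normal(size=(8, 8))
y = 1.0
eps = 0.05

clean_loss, grad = toy_loss_and_grad(x, w, y)
x_adv = fgsm(x, grad, eps)
adv_loss, _ = toy_loss_and_grad(x_adv, w, y)
delta_low = low_pass(x_adv - x, keep=2)  # band-limited version of the perturbation

print("clean loss:", clean_loss)
print("adversarial loss:", adv_loss)
print("max |delta|:", np.max(np.abs(x_adv - x)))
```

In adversarial training, examples such as `x_adv` would typically be added to each training batch alongside the clean inputs; whether the paper mixes them in this way is not stated in this record.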
Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 7.6/10.
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:7af197fe51472049ee65d8e6b2be758b) | 89.5 kB |