Published January 31, 2024 | Version v1
Conference paper | Open

Exploring Multi-Modal Fusion for Image Manipulation Detection and Localization

  • 1. CERTH-ITI

Description

Recent image manipulation localization and detection (IMLD) techniques usually leverage forensic artifacts and traces produced by a noise-sensitive filter, such as SRM or Bayar convolution. In this paper, we show that the different filters commonly used in such approaches excel at unveiling different types of manipulations and provide complementary forensic traces. We therefore explore ways of merging the outputs of such filters, aiming to exploit the complementary nature of the artifacts they produce to perform IMLD. We propose two distinct methods: one that produces independent features from each forensic filter and then fuses them (late fusion), and one that mixes the outputs of the different modalities early and produces combined features from the start (early fusion). We demonstrate that both approaches achieve competitive performance for both image manipulation localization and detection, outperforming state-of-the-art models across several datasets.
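
The distinction between the two fusion strategies can be illustrated with a toy, pure-Python sketch. It is not the paper's architecture: the function names, the 7x7 test image, the fusion weights, and the use of an absolute value as a stand-in "encoder" are all assumptions made for illustration. The only elements taken from the forensic literature are a fixed second-order high-pass kernel of the kind used in SRM-style filter banks and the Bayar constraint (center weight fixed to -1, remaining weights summing to 1), which in a real network would be applied to learned weights.

```python
# Toy, pure-Python sketch of early vs. late fusion of forensic filter outputs.
# All names are hypothetical; the "encoders" are stand-ins, not the paper's networks.

def conv2d_valid(img, kernel):
    """Valid-mode 2D cross-correlation of a grayscale image with a square kernel."""
    n = len(kernel)
    rows, cols = len(img) - n + 1, len(img[0]) - n + 1
    return [[sum(kernel[a][b] * img[i + a][j + b]
                 for a in range(n) for b in range(n))
             for j in range(cols)]
            for i in range(rows)]

# Fixed second-order high-pass kernel of the kind used in SRM-style filter banks.
SRM_3X3 = [[-0.25, 0.5, -0.25],
           [0.5, -1.0, 0.5],
           [-0.25, 0.5, -0.25]]

def bayar_constrain(raw):
    """Project a raw kernel onto the Bayar constraint:
    center weight = -1, all other weights sum to 1."""
    n, c = len(raw), len(raw) // 2
    off_center = sum(raw[a][b] for a in range(n) for b in range(n)
                     if (a, b) != (c, c))
    out = [[raw[a][b] / off_center for b in range(n)] for a in range(n)]
    out[c][c] = -1.0
    return out

def early_fusion(maps, weights):
    """Mix the modalities per pixel first, then apply ONE shared encoder (here: abs)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[abs(sum(w * m[i][j] for w, m in zip(weights, maps)))
             for j in range(cols)]
            for i in range(rows)]

def late_fusion(maps, weights):
    """Encode each modality on its own (here: abs), then fuse by weighted sum."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(w * abs(m[i][j]) for w, m in zip(weights, maps))
             for j in range(cols)]
            for i in range(rows)]

# Assumed test input: a flat 7x7 image with one altered pixel in the middle.
img = [[10.0] * 7 for _ in range(7)]
img[3][3] = 50.0

srm = conv2d_valid(img, SRM_3X3)
bayar = conv2d_valid(img, bayar_constrain([[1.0] * 3 for _ in range(3)]))

early = early_fusion([srm, bayar], [0.5, 0.5])
late = late_fusion([srm, bayar], [0.5, 0.5])
```

In this sketch both filters return zero on flat regions and respond around the altered pixel, but with different signs and magnitudes at neighbouring positions. Because early fusion mixes the raw responses before the nonlinearity while late fusion mixes them after, the two fused maps can differ at those positions, which is the structural difference the abstract describes.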

Files

208_mmm2024_triaridis.pdf (1.5 MB)
md5:9411a8999279feff73b71accf9c075be

Additional details

Funding

European Commission
CRiTERIA – Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks (grant 101021866)