Published October 2025 | Version v1
Conference paper Open

Disturbing Image Detection Using LMM-Elicited Emotion Embeddings

  • 1. Centre for Research and Technology Hellas

Description

This paper addresses the task of Disturbing Image Detection (DID) by exploiting knowledge encoded in Large Multimodal Models (LMMs). Specifically, we exploit LMM knowledge in two ways: first by extracting generic semantic descriptions, and second by extracting elicited emotions. We then use CLIP's text encoder to obtain text embeddings of both the generic semantic descriptions and the LMM-elicited emotions. Finally, we use these text embeddings together with the corresponding CLIP image embeddings to perform the DID task. The proposed method significantly improves the baseline classification accuracy, achieving state-of-the-art performance on the augmented Disturbing Image Detection dataset.
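The description above does not say how the three embeddings are combined or which classifier is used. The following is a minimal sketch, assuming L2-normalized concatenation followed by a linear classifier with a sigmoid output. The function names, the toy 2-dimensional vectors, and the zero weights are all hypothetical stand-ins; in practice the inputs would come from CLIP's image and text encoders and the weights would be learned on the DID dataset.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_embeddings(
    image_emb: Sequence[float],
    description_emb: Sequence[float],
    emotion_emb: Sequence[float],
) -> List[float]:
    """Concatenate the three normalized embeddings into one feature vector.

    Concatenation is an assumption made for illustration only.
    """
    fused: List[float] = []
    for emb in (image_emb, description_emb, emotion_emb):
        fused.extend(l2_normalize(emb))
    return fused


def disturbing_probability(
    features: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Hypothetical linear classifier head with a sigmoid output."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for the CLIP image embedding and the CLIP text embeddings of
# the LMM-generated description and the LMM-elicited emotions.
image_emb = [3.0, 4.0]
description_emb = [1.0, 0.0]
emotion_emb = [0.0, 2.0]

features = fuse_embeddings(image_emb, description_emb, emotion_emb)
weights = [0.0] * len(features)  # untrained placeholder weights
prob = disturbing_probability(features, weights)
```

With untrained zero weights the classifier is uninformative; the sketch only shows how image, description, and emotion embeddings could share one feature vector.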

Files

ICIPW2024.pdf (355.8 kB)
md5:362a01389a0bd74d6fe46fc1a1ef4db7

Additional details

Funding

European Commission
TransMIXR - Ignite the Immersive Media Sector by Enabling New Narrative Visions (grant 101070109)
European Commission
AI4TRUST - AI-based-technologies for trustworthy solutions against disinformation (grant 101070190)