AIMH at SemEval-2021 - Task 6: multimodal classification using an ensemble of transformer models

doi:10.18653/v1/2021.semeval-1.140

Published August 5, 2021 | Version v1

Conference paper Open

1. CNR-ISTI

This paper describes the system used by the AIMH Team to approach SemEval-2021 Task 6. We propose an approach that relies on a transformer-based architecture to process the multimodal content (text and images) of memes. Our architecture, called DVTT (Double Visual Textual Transformer), treats Subtasks 1 and 3 of Task 6 as multi-label classification problems: the text and/or image of a meme is processed, and the probability that each possible persuasion technique is present is returned. DVTT uses two complete transformer networks that operate on text and images and are mutually conditioned. In each network, one of the two modalities acts as the main one and the other intervenes to enrich it, yielding two distinct modes of operation. The two transformers' outputs are merged by averaging the inferred probabilities for each possible label, and the overall network is trained end-to-end with a binary cross-entropy loss.
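The merging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the logits, the three-label set, the 0.5 decision threshold, and the choice to compute the loss on the averaged probabilities are all assumptions made for illustration, and the two transformer branches are stood in for by hard-coded scores.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def average_probabilities(text_logits, image_logits):
    """Per-label mean of the two branches' sigmoid probabilities."""
    if len(text_logits) != len(image_logits):
        raise ValueError("both branches must score the same label set")
    return [
        (sigmoid(t) + sigmoid(v)) / 2.0
        for t, v in zip(text_logits, image_logits)
    ]


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy, treating every label independently."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical scores for three persuasion-technique labels (assumed values).
text_branch_logits = [2.0, -1.0, 0.0]   # branch where text is the main modality
image_branch_logits = [0.0, -1.0, 2.0]  # branch where the image is the main modality
targets = [1, 0, 1]                     # multi-label ground truth

probs = average_probabilities(text_branch_logits, image_branch_logits)
loss = binary_cross_entropy(probs, targets)

# Assumed decision rule: a label is predicted when its probability is >= 0.5.
predicted = [int(p >= 0.5) for p in probs]
```

Because each label gets its own sigmoid rather than sharing a softmax, several persuasion techniques can be predicted for the same meme, which is what the multi-label formulation requires.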

Files

Name	Size
2021_457536_published.pdf md5:4985a6dab2ad6c2044d59109390ebfb7	743.4 kB

Additional details

AI4Media – A European Excellence Centre for Media, Society and Democracy 951911: European Commission
AI4EU – A European AI On Demand Platform and Ecosystem 825619: European Commission

