Published February 2, 2024 | Version v1
Conference paper | Open access

Cross-modal Networks, Fine-Tuning, Data Augmentation and Dual Softmax Operation for MediaEval NewsImages 2023

  • 1. Centre for Research and Technology Hellas

Description

Matching images to articles is challenging and can be considered a special case of the cross-media
retrieval problem. This notebook paper presents our solution for the MediaEval NewsImages 2023
benchmarking task. We investigate the performance of pre-trained cross-modal networks; specifically, we
evaluate two pre-trained CLIP model variants and fine-tune one of them for domain adaptation. Additionally,
we use a data augmentation technique and a dual softmax operation, which revises the similarities produced
by either network, to improve our solutions’ performance. We report the official results for our submitted
runs, as well as additional experiments we conducted to evaluate our runs internally. We conclude that
fine-tuning improves performance and that the nature of the data should be considered when selecting the
appropriate pre-trained CLIP model.
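The retrieval step and the dual softmax revision mentioned in the description can be illustrated with a small, dependency-free sketch. It is not the authors' code and rests on several assumptions: article and image embeddings are taken to come from a CLIP text encoder and image encoder (not shown), articles are the queries (rows) and images the candidates (columns), and the dual softmax follows one common formulation in which each similarity is multiplied by a softmax prior computed over all articles for that image. The description does not specify the exact variant or temperature, so the function names and the `temperature=0.01` default are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity_matrix(text_emb: Matrix, image_emb: Matrix) -> Matrix:
    """Cosine similarity between every article embedding (row) and image embedding (column).

    Assumes all embedding vectors are non-zero.
    """
    def normalize(v: List[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    texts = [normalize(t) for t in text_emb]
    images = [normalize(im) for im in image_emb]
    return [[sum(a * b for a, b in zip(t, im)) for im in images] for t in texts]


def dual_softmax(sim: Matrix, temperature: float = 0.01) -> Matrix:
    """Revise each score with a softmax prior computed over the other modality.

    Hypothetical variant: for image j, the prior is a softmax over all articles i
    of sim[i][j] / temperature; the revised score is sim[i][j] * prior[i][j].
    An image that scores highly against many articles spreads its prior mass
    across them, so each of those scores is down-weighted.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    revised = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [sim[i][j] / temperature for i in range(n_rows)]
        m = max(column)  # subtract the column max for numerical stability
        exps = [math.exp(c - m) for c in column]
        total = sum(exps)
        for i in range(n_rows):
            revised[i][j] = sim[i][j] * exps[i] / total
    return revised


def rank_images(scores: Matrix) -> List[List[int]]:
    """For each article (row), return image indices sorted from best to worst score."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True) for row in scores]


if __name__ == "__main__":
    # Toy similarities: image 0 is a "hub" that is close to both articles.
    sim = [[0.30, 0.28],
           [0.31, 0.20]]
    print(rank_images(sim))                # raw ranking per article
    print(rank_images(dual_softmax(sim)))  # ranking after the dual softmax revision
```

With this toy matrix, raw similarity ranks image 0 first for both articles (`[[0, 1], [0, 1]]`). After the revision, article 0 is matched to image 1 and article 1 keeps image 0 (`[[1, 0], [0, 1]]`), because image 0's prior is split between the two articles while image 1's prior goes almost entirely to article 0. This pushes the assignment toward a one-to-one matching, which is the intuition usually given for dual softmax in cross-modal retrieval.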

Files

MediaEval_2023.pdf (629.9 kB)
md5:abf8229cf48bf9d391867301b65c1349

Additional details

Funding

European Commission
AI4TRUST – AI-based-technologies for trustworthy solutions against disinformation (grant agreement 101070190)
European Commission
CRiTERIA – Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks (grant agreement 101021866)