Cross-modal Networks, Fine-Tuning, Data Augmentation and Dual Softmax Operation for MediaEval NewsImages 2023
Creators
Description
Matching images to articles is challenging and can be considered a special case of the cross-media
retrieval problem. This notebook paper presents our solution for the MediaEval NewsImages 2023
benchmarking task. We investigate the performance of pre-trained cross-modal networks. Specifically, we
evaluate two pre-trained CLIP model variants and fine-tune one of them for domain adaptation. Additionally,
we use a data augmentation technique and a dual softmax operation, i.e., a method for revising the
similarities produced by either network, to improve our solutions' performance. We report
the official results for our submitted runs, along with additional experiments we conducted to evaluate
those runs internally. We conclude that fine-tuning improves performance and that it is important to
consider the nature of the data when selecting the appropriate pre-trained CLIP model.
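The description does not give the exact formulation of the dual softmax operation, so the following is only a minimal sketch of one common variant: each article-image similarity is replaced by the product of a softmax over the article's row and a softmax over the image's column. The matrix layout (rows are articles, columns are images), the function names, the example scores, and the `temperature` value are all assumptions made for illustration, not details taken from the paper.

```python
import math


def softmax(values, temperature):
    """Numerically stable softmax over a list of floats."""
    scaled = [v * temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(sim, temperature=10.0):
    """Revise a similarity matrix with a row-wise and a column-wise softmax.

    sim[i][j] is assumed to hold the similarity between article i and image j.
    The temperature is a hypothetical default, not a value from the paper.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    # Distribution over images for each article.
    row_probs = [softmax(row, temperature) for row in sim]
    # Distribution over articles for each image.
    col_probs = [
        softmax([sim[i][j] for i in range(n_rows)], temperature)
        for j in range(n_cols)
    ]
    # A pair scores highly only if it is a strong match in both directions.
    return [
        [row_probs[i][j] * col_probs[j][i] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Illustrative similarities only (e.g. cosine scores from a CLIP-like model).
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.2],
    [0.1, 0.3, 0.7],
]
revised = dual_softmax(sim)
# Index of the best-scoring image for each article after revision.
ranking = [max(range(len(row)), key=row.__getitem__) for row in revised]
```

Because the column-wise softmax penalizes an image that matches many articles equally well, this kind of revision tends to favor mutually consistent article-image pairs. It requires the similarities for the whole query set to be available at once.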
Files
MediaEval_2023.pdf (629.9 kB)
md5:abf8229cf48bf9d391867301b65c1349
Additional details
Funding
- European Commission
- AI4TRUST – AI-based-technologies for trustworthy solutions against disinformation 101070190
- European Commission
- CRiTERIA – Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks 101021866