Incorporating Textual Similarity in Video Captioning Schemes

Gkountakos, Konstantinos; Dimou, Anastasios; Papadopoulos, Georgios Th.; Daras, Petros

doi:10.1109/ICE.2019.8792602

Published August 12, 2019 | Version v1

Conference paper Open

1. CERTH

Video captioning has been heavily investigated by the research community in recent years, especially since the introduction of Recurrent Neural Networks (RNNs). Existing approaches are usually based on sequence-to-sequence models that aim to exploit visual information by detecting events and objects or by matching entities to words. However, the contextual information that can be extracted from the vocabulary has not yet been investigated, except by approaches that make use of parts of speech such as verbs, nouns, and adjectives. The proposed approach is based on the assumption that textually similar captions should represent similar visual content. Specifically, we propose a novel loss function that penalizes wrongly predicted words and rewards correctly predicted ones based on the semantic cluster to which they belong. The proposed method is evaluated on two widely known video captioning datasets, Microsoft Research - Video to Text (MSR-VTT) and Microsoft Research Video Description Corpus (MSVD). Experimental analysis shows that the proposed method outperforms the baseline approach in most cases.
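The abstract does not give the form of the loss, so the following is only a minimal sketch of one way a cluster-aware word loss could look, not the formulation used in the paper. It assumes a per-step cross-entropy that is scaled down when the predicted word falls in the same semantic cluster as the ground-truth word and scaled up otherwise. The names `cluster_weighted_loss`, `cluster_of`, `near_weight`, and `far_weight`, as well as the weight values, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_weighted_loss(logits, target, cluster_of,
                          near_weight=0.5, far_weight=1.5):
    """Cross-entropy for one decoding step, scaled by cluster agreement.

    logits     : unnormalized scores, one per vocabulary word
    target     : index of the ground-truth word
    cluster_of : list mapping each word index to a semantic cluster id
                 (assumed to be precomputed, e.g. by clustering word vectors)
    near_weight, far_weight : hypothetical scaling factors, not taken
                 from the paper
    """
    probs = softmax(logits)
    # Word the model would emit at this step (highest probability).
    predicted = max(range(len(probs)), key=probs.__getitem__)
    cross_entropy = -math.log(probs[target])
    # Reward predictions in the target's cluster, penalize the rest.
    if cluster_of[predicted] == cluster_of[target]:
        return near_weight * cross_entropy
    return far_weight * cross_entropy


# Toy vocabulary of four words in two clusters (illustrative only).
clusters = [0, 0, 1, 1]
loss_same = cluster_weighted_loss([1.0, 2.0, 0.0, 0.0], 0, clusters)
loss_other = cluster_weighted_loss([1.0, 0.0, 2.0, 0.0], 0, clusters)
print(loss_same < loss_other)
```

In both toy calls the ground-truth word receives the same probability, so the difference in loss comes only from whether the wrongly predicted word shares the target's cluster.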

Files

Name	Size
Incorporating Textual Similarity in Video Captioning Schemes.pdf (md5:1e2476051c9520268499e8c78c185b2c)	392.8 kB

Additional details

European Commission
ANITA - Advanced tools for fighting oNline Illegal TrAfficking (grant 787061)

	All versions	This version
Views	201	198
Downloads	261	261
Data volume	103.7 MB	103.7 MB
