A Dataset and Metric for Textual Video Content Description
Authors/Creators
Description
Obtaining textual descriptions of the visual content of images and videos is often required in multimedia analysis and retrieval. Traditional video captioning approaches are usually evaluated on very short captions using rather simple NLP metrics, while multimodal large language model (MLLM)-based approaches are mostly evaluated via question answering, which is query-specific. We provide a dataset (FM-V2T) of 258 video clips from a media archive, annotated with detailed, manually curated descriptions in English and German (long and short). We propose an LLM-based metric that assesses whether facts extracted from a description are entailed or contradicted by a reference, addressing shortcomings of existing metrics: insensitivity to small changes with semantic impact and difficulty comparing descriptions of substantially different lengths. We provide experimental results on the reliability of the metric and apply it to baseline results of three MLLM-based approaches on the FM-V2T dataset, comparing it with other metrics.
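The fact-level scoring idea behind such a metric can be illustrated with a minimal sketch. The function below is hypothetical (it is not the paper's actual formula): it assumes facts have already been extracted from a candidate description and each fact has been labelled against the reference by an NLI model as entailment, contradiction, or neutral, and it aggregates those labels into a single score.

```python
def fact_score(labels: list[str]) -> float:
    """Aggregate per-fact NLI labels into a description-level score.

    Hypothetical illustration: entailed facts count positively,
    contradicted facts negatively, neutral facts only dilute the
    score via the denominator. The real metric may differ.
    """
    if not labels:
        return 0.0
    entailed = labels.count("entailment")
    contradicted = labels.count("contradiction")
    return (entailed - contradicted) / len(labels)


# Example: 3 of 4 extracted facts are entailed, 1 is contradicted.
print(fact_score(["entailment", "entailment", "entailment", "contradiction"]))  # 0.5
```

Because the score is normalized by the number of extracted facts rather than by token overlap, descriptions of very different lengths remain comparable, which is one of the shortcomings of traditional captioning metrics noted above.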
Files

| Name | Size | MD5 |
|---|---|---|
| ACM_MM2025_Video2Text-5.pdf | 684.6 kB | 75342768a59b5a5de27888c07b07e489 |