Published July 21, 2025 | Version v1
Model Open

SD-VSum Pretrained Model, trained on the S‑VideoXum Dataset

  • 1. Centre for Research and Technology Hellas

Description

This Zenodo record provides the pretrained checkpoint of the SD‑VSum model, trained on the S‑VideoXum dataset. SD‑VSum is a script‑driven video summarization method: it aligns a user‑provided textual script with the video content via cross‑modal attention, producing a summary personalized to that script. This checkpoint achieves the performance reported in “SD‑VSum: A Method and Dataset for Script‑Driven Video Summarization” (2025).
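To make the idea of cross‑modal attention concrete, the following is a minimal, dependency‑free sketch and not the SD‑VSum implementation. It assumes that video frames act as queries and script sentences as keys and values, that there is a single attention head, and that there are no learned projections. The function names and toy embeddings are hypothetical. The actual architecture is defined in the GitHub repository listed under related resources.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(frame_feats, script_feats):
    """Attend from each video frame (query) to every script sentence (key/value).

    Assumption for illustration only: single head, no learned projections.
    Returns (attended, weights): one script-conditioned vector and one
    attention distribution over the script sentences per frame.
    """
    dim = len(script_feats[0])
    attended, weights = [], []
    for frame in frame_feats:
        # Scaled dot-product similarity between this frame and each sentence.
        scores = [
            sum(f * s for f, s in zip(frame, sentence)) / math.sqrt(dim)
            for sentence in script_feats
        ]
        w = softmax(scores)
        # Weighted sum of the sentence embeddings.
        attended.append([
            sum(w[j] * script_feats[j][k] for j in range(len(script_feats)))
            for k in range(dim)
        ])
        weights.append(w)
    return attended, weights


# Toy inputs (hypothetical): 3 frames and 2 script sentences, 2-D embeddings.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
script = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = cross_attention(frames, script)
```

In this toy setup, a frame whose embedding is closer to a given sentence receives a larger attention weight for that sentence. A summarizer could then score frames from the script‑conditioned vectors, though how SD‑VSum does so is not described in this record.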

Related resources:

- The SD-VSum model and the S-VideoXum dataset are proposed in our paper: M. Mylonas, E. Apostolidis, V. Mezaris, "SD-VSum: A Method and Dataset for Script-Driven Video Summarization", ACM Multimedia 2025, Dublin, Ireland. Preprint: https://arxiv.org/abs/2505.03319

- S-VideoXum dataset on Zenodo: https://zenodo.org/records/15349075

- Code & Model Architecture on GitHub: https://github.com/IDT-ITI/SD-VSum/tree/main

If you find these resources interesting or useful in your research, please cite the following paper: M. Mylonas, E. Apostolidis, V. Mezaris, "SD-VSum: A Method and Dataset for Script-Driven Video Summarization", ACM Multimedia 2025, Dublin, Ireland.

Files

README.md

Files (180.8 MB)

Checksum Size
md5:66dc003e8a25b38d70c83eec0b4553d9 2.0 kB
md5:af433ba769c436c6eec1c54dde5cf651 180.8 MB
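After downloading, the checksum published for the 180.8 MB file can be used to confirm that the checkpoint arrived intact. The sketch below uses only Python's standard `hashlib` module. The helper names are illustrative, and the local file path is left to the caller because the file name is not shown in this listing.

```python
import hashlib

# Checksum listed on this record for the 180.8 MB file.
CHECKPOINT_MD5 = "md5:af433ba769c436c6eec1c54dde5cf651"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Calling `matches_checksum(local_path, CHECKPOINT_MD5)` with the path of the downloaded checkpoint should return `True` for an uncorrupted download. MD5 is suitable here as an integrity check against transfer errors, not as protection against deliberate tampering.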

Additional details

Dates

Available
2025-07