SD-VSum Pretrained Model, trained on the S-VideoXum Dataset
Description
This Zenodo entry provides the pretrained checkpoint of the SD-VSum model, trained on the S-VideoXum dataset. SD-VSum is a script-driven video summarization method that uses cross-modal attention to align a user-provided textual script with the video content, producing a summary tailored to that script. This checkpoint achieves the performance reported in “SD-VSum: A Method and Dataset for Script-Driven Video Summarization” (2025).
Related resources:
- The SD-VSum model and the S-VideoXum dataset are proposed in our paper: M. Mylonas, E. Apostolidis, V. Mezaris, "SD-VSum: A Method and Dataset for Script-Driven Video Summarization", ACM Multimedia 2025, Dublin, Ireland. Preprint: https://arxiv.org/abs/2505.03319
- S-VideoXum dataset on Zenodo: https://zenodo.org/records/15349075
- Code & Model Architecture on GitHub: https://github.com/IDT-ITI/SD-VSum/tree/main
If you find these resources interesting or useful in your research, please cite the following paper: M. Mylonas, E. Apostolidis, V. Mezaris, "SD-VSum: A Method and Dataset for Script-Driven Video Summarization", ACM Multimedia 2025, Dublin, Ireland.
Files
README.md
Total size: 180.8 MB

| Checksum | Size |
|---|---|
| md5:66dc003e8a25b38d70c83eec0b4553d9 | 2.0 kB |
| md5:af433ba769c436c6eec1c54dde5cf651 | 180.8 MB |
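The MD5 values in the file listing can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listing` and the example path are hypothetical, since the record's file names are not shown in the listing.

```python
import hashlib

# MD5 of the 180.8 MB file, copied from the listing (without the "md5:" prefix).
EXPECTED_CHECKPOINT_MD5 = "af433ba769c436c6eec1c54dde5cf651"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_listing("path/to/downloaded_checkpoint", EXPECTED_CHECKPOINT_MD5)` should return `True` for an intact download, where the path is a placeholder for wherever the 180.8 MB file was saved. Reading in chunks keeps memory use low for a file of this size.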
Additional details
Dates
- Available: 2025-07