Published July 4, 2024 | Version 1.0
Dataset Open

Sound-VECaps

  • 1. University of Surrey

Description

This is the dataset for Sound-VECaps, a large-scale audio dataset with visual-enhanced captions. 

We also release AudioCaps-Enhanced, a visual-enhanced version of the AudioCaps test set, as a new benchmark.

Files

Sound-VECaps_full.csv

Files (746.7 MB)

Checksum                                  Size
md5:b0afe7ea79a5d82ab2a556f673807fe7      814.4 kB
md5:999cd376b980f7afca19219bd5bec2cf      1.2 MB
md5:a82fc2c600e1e332c816e07e71d82281      327.6 MB
md5:e975cb5bbf37538962a5c6434ba97c8f      417.1 MB
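
The MD5 values listed above can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical and not part of the dataset release, and the listing here does not say which checksum belongs to which file, so the expected value has to be taken from the record page entry for the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value, with or without the 'md5:' prefix."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("Sound-VECaps_full.csv", "md5:<value listed for that file>")` returns `True` only when the local copy matches the published checksum. Reading in 1 MiB chunks keeps memory use flat even for the files of several hundred megabytes.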