Text to audio grounding (TAG) dataset: AudioGrounding

Xuenan Xu

doi:10.5281/zenodo.10033508

Published April 19, 2023 | Version 2.2

Dataset Open

The AudioGrounding dataset, comprising the audio files and their timestamp annotations.

Changes in version 2: the train/validation/test sets were re-split, and the validation and test annotations were refined.

----------------------------------------------------------

References

[1] Xuenan Xu, Heinrich Dinkel, Mengyue Wu and Kai Yu. "Text-to-audio grounding: Building correspondence between captions and sound events." In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 606-610.

[2] Xuenan Xu, Mengyue Wu, and Kai Yu. "Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning." In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). IEEE, 2023, pp. 1-5.

Files (2.5 GB)

Name	MD5	Size
audio.zip	0634c9a8eb12a844f3772a1bc22ed1f4	2.5 GB
test.json	e67693e1ec7c8a040db1aeddf1b40118	475.0 kB
train.json	09fe45cfab1be904f41f2182f261b7ab	4.0 MB
val.json	917058633e744d08a7cf48d4161be960	486.9 kB
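
The MD5 checksums in the file listing above can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch, assuming the four files were saved under their listed names in a single directory. The helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on this record.
EXPECTED_MD5 = {
    "audio.zip": "0634c9a8eb12a844f3772a1bc22ed1f4",
    "test.json": "e67693e1ec7c8a040db1aeddf1b40118",
    "train.json": "09fe45cfab1be904f41f2182f261b7ab",
    "val.json": "917058633e744d08a7cf48d4161be960",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so the 2.5 GB archive is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Run from the download directory, the script prints one status line per file. A MISMATCH usually points to an interrupted or corrupted download, in which case the file should be fetched again.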
