TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Description
TACOS is a collection of 12,358 audio recordings, annotated with 47,748 temporally strong audio captions (i.e., textual descriptions of sound events and their corresponding temporal onsets and offsets). Each audio file is additionally paired with a weak caption, which was automatically generated from the strong captions using OpenAI’s gpt-4o-mini-2024-07-18.
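Each strong caption can be treated as one record per sound event: a caption text plus an onset and an offset. The minimal Python sketch below loads such records from a CSV, groups them per audio file, and looks up which events are active at a given time. It assumes hypothetical column names (`filename`, `caption`, `onset`, `offset`) and times in seconds. These are illustrative and not taken from the dataset, so check the actual header of annotations_strong.csv before relying on it.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongCaption:
    """One temporally strong caption: a text description plus onset/offset (seconds)."""
    filename: str
    text: str
    onset: float
    offset: float


def load_strong_captions(f) -> dict[str, list[StrongCaption]]:
    """Group strong captions by audio file, each list sorted by onset.

    NOTE: the column names "filename", "caption", "onset" and "offset" are
    hypothetical; adjust them to the real header of annotations_strong.csv.
    """
    by_file: dict[str, list[StrongCaption]] = defaultdict(list)
    for row in csv.DictReader(f):
        cap = StrongCaption(
            filename=row["filename"],
            text=row["caption"],
            onset=float(row["onset"]),
            offset=float(row["offset"]),
        )
        by_file[cap.filename].append(cap)
    for caps in by_file.values():
        caps.sort(key=lambda c: c.onset)
    return dict(by_file)


def active_at(captions: list[StrongCaption], t: float) -> list[str]:
    """Return the texts of captions whose [onset, offset] interval contains time t."""
    return [c.text for c in captions if c.onset <= t <= c.offset]


# Usage with a small in-memory CSV (made-up example rows, not dataset content).
sample = io.StringIO(
    "filename,caption,onset,offset\n"
    "a.wav,a dog barks,1.0,2.5\n"
    "a.wav,rain falls steadily,0.0,10.0\n"
)
caps = load_strong_captions(sample)
print(active_at(caps["a.wav"], 2.0))  # ['rain falls steadily', 'a dog barks']
```

Sorting by onset keeps each file's captions in temporal order, which is convenient when the events are later aligned with audio frames.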
If you use TACOS in your work, please cite our paper (preprint available on arXiv).
A usage example and more details are provided in our GitHub repository (available soon).
Licensing
The audio recordings contained in audio.zip and the associated metadata in metadata.csv were sourced from the Freesound platform [1]. Each recording and its metadata are governed by an individual license, which is specified, along with the creator's information, in the metadata.csv file.
The captions provided in annotations_strong.csv and annotations_weak.csv are licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International), which permits use on the condition of appropriate attribution.
References
[1] Frederic Font, Gerard Roma, and Xavier Serra. Freesound technical demo. In Proceedings of the 21st ACM Multimedia Conference (MM '13), Barcelona, Spain, October 21-25, 2013.
Files (1.6 GB)
MD5 checksum | Size |
---|---|
md5:882d5a5a28f59441f4c7b4ed17ebab05 | 4.1 MB |
md5:1445727f4d470273a7395ddcb9e34b93 | 1.3 MB |
md5:a1b11ed7a88d6b95109decb577c128fd | 1.6 GB |
md5:bee28197d13e859dbc36021c2a928f07 | 112.8 kB |
md5:17e8df44f2d217dc7cf4b75767ee5ab3 | 8.1 MB |
md5:16f5b5eab84753a571e9944739508854 | 21.8 kB |
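Each file is listed with an MD5 checksum, so a download can be checked for corruption before use. The Python sketch below uses only the standard library (`hashlib`) and reads files in 1 MiB chunks so that a large archive does not have to fit in memory. The helper names `md5sum` and `verify` are illustrative, not part of any TACOS tooling.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5sum(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks of chunk_size bytes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str | Path, expected: str) -> bool:
    """Compare a file's MD5 with a listed checksum (the 'md5:' prefix is optional)."""
    return md5sum(path) == expected.removeprefix("md5:").lower()
```

To use it, pass the path of a downloaded file together with the checksum listed next to its size; `verify` returns `True` only if the digests match.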
Additional details
Related works
- Is described by
  - Preprint: arXiv:2505.07609 (arXiv)