Published May 10, 2025 | Version v1
Dataset Open

TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining

  • 1. Johannes Kepler University of Linz

Description

TACOS is a collection of 12,358 audio recordings, annotated with 47,748 temporally strong audio captions (i.e., textual descriptions of sound events and their corresponding temporal onsets and offsets). Each audio file is additionally paired with a weak caption, which was automatically generated from the strong captions using OpenAI’s gpt-4o-mini-2024-07-18.
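As a rough illustration of what a temporally strong caption looks like in code, the sketch below groups captions by recording and sorts them by onset. The column names (filename, onset, offset, text), the assumption that times are in seconds, and the sample rows are all hypothetical; check the header of annotations_strong.csv before relying on them.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StrongCaption:
    onset: float   # start of the described sound event (seconds is an assumption)
    offset: float  # end of the described sound event (seconds is an assumption)
    text: str      # textual description of the sound event


def group_strong_captions(csv_file, file_col="filename", onset_col="onset",
                          offset_col="offset", text_col="text"):
    """Group strong captions by recording and sort each group by onset.

    The default column names are hypothetical; pass the real ones if they differ.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row[file_col]].append(
            StrongCaption(float(row[onset_col]), float(row[offset_col]), row[text_col])
        )
    for captions in grouped.values():
        captions.sort(key=lambda caption: caption.onset)
    return dict(grouped)


# Made-up rows for illustration only; they are not taken from TACOS.
SAMPLE = """filename,onset,offset,text
a.wav,2.5,4.0,A dog barks twice.
a.wav,0.0,2.0,Rain falls steadily.
b.wav,1.0,3.5,A door slams shut.
"""

captions_by_file = group_strong_captions(io.StringIO(SAMPLE))
```

With the downloaded file, the same function could be called on open("annotations_strong.csv", newline="", encoding="utf-8"), adjusting the column arguments to match the actual header.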

If you use TACOS in your work, please cite our paper (preprint available on arXiv).

A usage example and more details are provided in our GitHub repository (available soon).

Licensing

The audio recordings contained in audio.zip and the associated metadata in metadata.csv were sourced from the Freesound platform [1]. Each recording and its corresponding metadata are governed by their individual licenses, which are specified, along with the creator's information, in the metadata.csv file.

The captions provided in annotations_strong.csv and annotations_weak.csv are licensed under CC BY 4.0, which permits use provided that appropriate attribution is given.

References

[1] Frederic Font, Gerard Roma, and Xavier Serra. Freesound technical demo. In Proceedings of the 21st ACM Multimedia Conference (MM '13), Barcelona, Spain, October 21-25, 2013.

Files

Files (1.6 GB)

Checksum Size
md5:882d5a5a28f59441f4c7b4ed17ebab05 4.1 MB
md5:1445727f4d470273a7395ddcb9e34b93 1.3 MB
md5:a1b11ed7a88d6b95109decb577c128fd 1.6 GB
md5:bee28197d13e859dbc36021c2a928f07 112.8 kB
md5:17e8df44f2d217dc7cf4b75767ee5ab3 8.1 MB
md5:16f5b5eab84753a571e9944739508854 21.8 kB
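
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_listed_checksum are illustrative and not part of any TACOS tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:882d...'."""
    # Accept the value with or without the 'md5:' prefix.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

To use it, pass the path of a downloaded file together with the checksum string shown next to that file on the record page; a False result suggests the file should be downloaded again.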

Additional details

Related works

Is described by
Preprint: arXiv:2505.07609 (arXiv)