BAF: an audio fingerprinting dataset for broadcast monitoring
Creators
- 1. BMAT Licensing S.L.; Music Technology Group, Universitat Pompeu Fabra
- 2. BMAT Licensing S.L.
- 3. Music Technology Group, Universitat Pompeu Fabra
- 4. Epidemic Sound
- 5. IPEM, Ghent University
Description
Overview
The Broadcast Audio Fingerprinting (BAF) dataset is an open, annotated dataset, available upon request, for the task of music monitoring in broadcast. As references, it contains 2,000 tracks from Epidemic Sound's private catalogue, totalling 74 hours. As queries, it contains over 57 hours of TV broadcast audio from 23 countries and 203 channels, distributed as 3,425 one-minute audio excerpts.
It has been annotated by six annotators in total, and each query has been cross-annotated by three of them. The resulting high inter-annotator agreement validates the annotation methodology and supports the reliability of the annotations.
Purpose of the dataset
This dataset aims to become the standard dataset for evaluating audio fingerprinting algorithms, because it is built on real data without the use of any data-augmentation techniques. It is also the first dataset to address background music fingerprinting, which is a real problem in royalty distribution.
Dataset use
This dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis.
About the data
All audio files are monophonic, 8 kHz, 128 kb/s WAV files encoded as pcm_s16le. The annotations mark which reference tracks, if any, sound in each query (in the foreground or the background), along with the times at which each track starts and stops sounding in the query.
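The stated format is internally consistent: one channel at 8,000 samples per second and 16 bits per sample gives 128,000 bits per second. The following is a minimal sketch, using only Python's standard `wave` module, of how a downloaded file could be checked against that format. The helper names `describe_wav` and `matches_baf_format` are hypothetical and are not part of the dataset or its repository.

```python
import wave

# Format stated in the dataset description:
# mono, 8 kHz, 16-bit PCM, which implies 1 * 8000 * 16 = 128000 bits per second.
EXPECTED_FORMAT = (1, 8000, 16, 128_000)


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, bitrate_bps) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # getsampwidth() is in bytes
    return channels, sample_rate, bits, channels * sample_rate * bits


def matches_baf_format(path):
    """True if the file header matches the format described for the dataset."""
    return describe_wav(path) == EXPECTED_FORMAT
```

For example, `matches_baf_format("queries/query_0001.wav")` would be expected to return `True` for a file that follows the description above.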
Note that there are 88 queries that do not have any matches.
For more information, see the dedicated GitHub repository (https://github.com/guillemcortes/baf-dataset) and the dataset datasheet included in the files.
Dataset contents
The dataset is structured as follows:
baf-dataset/
├── baf_datasheet.pdf
├── annotations.csv
├── changelog.md
├── cross_annotations.csv
├── queries_info.csv
├── queries
│   ├── query_0001.wav
│   ├── query_0002.wav
│   ├── …
│   └── query_3425.wav
└── references
    ├── ref_0001.wav
    ├── ref_0002.wav
    ├── …
    └── ref_2000.wav
There are two folders named queries and references containing the wav files of TV broadcast recordings and the reference tracks, respectively.
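After downloading, the layout can be sanity-checked with a short script. The sketch below assumes that file names are numbered contiguously with four zero-padded digits (`query_0001.wav` to `query_3425.wav`, `ref_0001.wav` to `ref_2000.wav`). The listing elides the intermediate names, so contiguous numbering is an assumption, and the function name `missing_files` is hypothetical.

```python
from pathlib import Path


def missing_files(root, n_queries=3425, n_references=2000):
    """List expected audio files that are absent under `root`.

    Assumes contiguous, four-digit, zero-padded numbering in both folders.
    """
    root = Path(root)
    expected = [root / "queries" / f"query_{i:04d}.wav" for i in range(1, n_queries + 1)]
    expected += [root / "references" / f"ref_{i:04d}.wav" for i in range(1, n_references + 1)]
    # Report paths relative to the dataset root, with forward slashes.
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

Calling `missing_files("baf-dataset")` returns an empty list when every expected file is present.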
The annotations.csv file contains the annotations made by the six annotators, with the following fields:
query | reference | query_start | query_end | annotator |
---|---|---|---|---|
query_0692.wav | ref_1235.wav | 0.0 | 59.904 | annotator_6 |
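As a minimal sketch of how these rows could be loaded, the example below reads the file with Python's standard `csv` module and groups the matched references by query. It assumes the file is comma-separated with a header row using the column names shown above. The `Annotation` class and both function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    query: str
    reference: str
    start: float  # seconds from the start of the query
    end: float    # seconds from the start of the query
    annotator: str


def read_annotations(csv_file):
    """Parse rows from an open file object; assumes a comma-separated file with a header."""
    return [
        Annotation(
            row["query"],
            row["reference"],
            float(row["query_start"]),
            float(row["query_end"]),
            row["annotator"],
        )
        for row in csv.DictReader(csv_file)
    ]


def references_per_query(annotations):
    """Map each query file to the set of reference tracks annotated in it."""
    matches = defaultdict(set)
    for ann in annotations:
        matches[ann.query].add(ann.reference)
    return dict(matches)


# In-memory sample built from the example row above.
sample = io.StringIO(
    "query,reference,query_start,query_end,annotator\n"
    "query_0692.wav,ref_1235.wav,0.0,59.904,annotator_6\n"
)
print(references_per_query(read_annotations(sample)))
```

With the real file, the same functions could be applied to `open("annotations.csv", newline="")`.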
cross_annotations.csv contains the annotations that result from merging the overlapping annotations in annotations.csv. The x_tag field takes one of three values:
- single: the segment has been annotated by only one annotator.
- majority: the segment has been annotated by two annotators.
- unanimity: the segment has been annotated by all three annotators.
query | reference | query_start | query_end | annotators | x_tag |
---|---|---|---|---|---|
query_0693.wav | ref_1834.wav | 37.53 | 38.07 | ['annotator_3'] | single |
query_0693.wav | ref_1834.wav | 18.18 | 37.48 | ['annotator_3', 'annotator_5', 'annotator_3'] | unanimity |
query_0693.wav | ref_1834.wav | 37.48 | 37.53 | ['annotator_5', 'annotator_3'] | majority |
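One way to picture the merge is to cut the timeline at every annotation boundary and count how many annotators cover each resulting segment. The sketch below illustrates that idea for a single query/reference pair. It is not the authors' merging code, the function name `tag_segments` and the annotator names are hypothetical, and it assumes at most three annotators per query.

```python
# Number of annotators covering a segment -> x_tag value.
TAGS = {1: "single", 2: "majority", 3: "unanimity"}


def tag_segments(intervals):
    """Split overlapping annotations into segments tagged by agreement level.

    `intervals` is a list of (annotator, start, end) tuples for one
    query/reference pair. Returns (start, end, annotators, x_tag) tuples.
    """
    # Every start or end time is a potential segment boundary.
    boundaries = sorted({t for _, start, end in intervals for t in (start, end)})
    segments = []
    for seg_start, seg_end in zip(boundaries, boundaries[1:]):
        # An annotator is active if their interval fully covers the segment.
        active = sorted(
            {name for name, start, end in intervals if start <= seg_start and seg_end <= end}
        )
        if active:
            segments.append((seg_start, seg_end, active, TAGS[len(active)]))
    return segments


# Hypothetical per-annotator intervals chosen to mirror the boundaries in the table above.
example = [
    ("annotator_a", 18.18, 38.07),
    ("annotator_b", 18.18, 37.53),
    ("annotator_c", 18.18, 37.48),
]
for segment in tag_segments(example):
    print(segment)
```

With these inputs, the three segments come out tagged unanimity, majority, and single, in chronological order.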
queries_info.csv contains information about each query for citation purposes: the country, the channel, and the date and time of the broadcast.
filename | country | channel | datetime |
---|---|---|---|
query_0001.wav | Norway | Discovery Channel | 2021-02-26 14:45:26 |
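The following sketch shows how this metadata could be read and summarised per country. It assumes a comma-separated file with a header row and the `YYYY-MM-DD HH:MM:SS` timestamp layout seen in the example row. The function names are hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def read_queries_info(csv_file):
    """Parse rows from an open file object, converting `datetime` to a datetime object."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["datetime"] = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
        rows.append(row)
    return rows


def queries_per_country(rows):
    """Count how many query excerpts come from each country."""
    return Counter(row["country"] for row in rows)


# In-memory sample built from the example row above.
sample = io.StringIO(
    "filename,country,channel,datetime\n"
    "query_0001.wav,Norway,Discovery Channel,2021-02-26 14:45:26\n"
)
print(queries_per_country(read_queries_info(sample)))
```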
changelog.md contains a curated, chronologically ordered list of notable changes for each version of the dataset.
baf_datasheet.pdf is the dataset's datasheet, a standardized form of dataset documentation.
Ownership of the data
Next, we specify the ownership of all the data included in BAF: Broadcast Audio Fingerprinting dataset. For licensing information, please refer to the “License” section.
Reference tracks
The reference tracks are owned by Epidemic Sound AB, which has given a worldwide, revocable, non-exclusive, royalty-free licence to use and reproduce this data collection, consisting of 2,000 low-quality, monophonic audio recordings downsampled to 8 kHz.
Query tracks
The query tracks come from publicly available TV broadcasts, so the ownership of each recording belongs to the channel that aired the content. We publish them under the right of quotation provided by the Berne Convention.
Annotations
Guillem Cortès, together with Alex Ciurana and Emilio Molina from BMAT Music Licensing S.L., managed the annotation; therefore, the annotations belong to BMAT.
Accessing the dataset
The dataset is available upon request. Please include, in the justification field, your academic affiliation (if you have one) and a brief description of your research topics and why you would like to use this dataset. Bear in mind that this information is important for the evaluation of every access request.
License
This dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis. Given the different ownership of the elements of the dataset, the dataset is licensed under the following conditions:
- User’s access request
- Research only, non-commercial purposes
- No adaptations nor derivative works
- Attribution to Epidemic Sound and the authors as indicated in the “citation” section.
Acknowledgments
Supported by the Ministerio de Ciencia, Innovación y Universidades through the Retos-Colaboración call (reference: RTC2019-007248-7), and by the Industrial Doctorates Plan of the Secretariat of Universities and Research of the Department of Business and Knowledge of the Generalitat de Catalunya (reference: DI46-2020).