Google's Audioset: Reformatted

Bakhtin

doi:10.5281/zenodo.7096702

Published September 18, 2022 | Version v1.0.0

Dataset Open

Bakhtin, Alexander

Google's AudioSet consistently reformatted

While working with Google's AudioSet (https://research.google.com/audioset/index.html), I ran into problems because the Weak (https://research.google.com/audioset/download.html) and Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset use different CSV formatting. The labels used in the two versions also differ (https://github.com/audioset/ontology/issues/9) and are provided in files with different formatting.

This reformatting unifies the formats of the two versions so that they can be analysed
in the same pipelines, and makes the dataset files compatible with the
psds_eval, dcase_util and sed_eval Python packages used in audio processing.

For better-formatted documentation and the source code of the reformatting, refer to https://github.com/bakhtos/GoogleAudioSetReformatted

-Changes in dataset

All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t`
as a separator). All files have a header as the first line.
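Because every file is tab-separated with a header line, it can be read with the Python standard library alone. The following is a minimal sketch, not code from the repository: the helper name `read_tsv`, the sample row and the column order after the first two columns are assumptions made for illustration.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file with a header line into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical one-row sample; the values are illustrative, not real data.
sample = io.StringIO(
    "filename\tevent_label\tonset\toffset\n"
    "abc_30000\t/m/aaa\t0.0\t1.5\n"
)
rows = read_tsv(sample)
print(rows[0]["filename"], rows[0]["event_label"])
```

For a file on disk, the same helper can be called on a handle opened with `open(path, newline="")`.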

-New fields and filenames

Fields are renamed according to the following table, to be compatible with psds_eval:

Old field -> New field
YTID -> filename
segment_id -> filename
start_seconds -> onset
start_time_seconds -> onset
end_seconds -> offset
end_time_seconds -> offset
positive_labels -> event_label
label -> event_label
present -> present
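The renaming table above can be expressed as a plain dictionary. This is an illustrative sketch only; `FIELD_RENAMES` and `rename_header` are hypothetical names and are not taken from the repository's code.

```python
# Mapping of original field names to the unified names listed in the table.
FIELD_RENAMES = {
    "YTID": "filename",
    "segment_id": "filename",
    "start_seconds": "onset",
    "start_time_seconds": "onset",
    "end_seconds": "offset",
    "end_time_seconds": "offset",
    "positive_labels": "event_label",
    "label": "event_label",
    "present": "present",
}


def rename_header(header):
    """Rename known fields; leave any unknown field name unchanged."""
    return [FIELD_RENAMES.get(name, name) for name in header]


strong_header = ["segment_id", "start_time_seconds", "end_time_seconds", "label"]
print(rename_header(strong_header))
```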

For class label files, `id` is now the name of the field holding the `mid` label (e.g. `/m/09x0r`),
and `label` is the name of the field holding the human-readable label (e.g. `Speech`). The label index
given for the Weak dataset (the `index` field in `class_labels_indices.csv`) is not used.

Files are renamed according to the following table to ensure consistent naming
of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:

Old name -> New name
balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
eval_segments.csv -> audioset_weak_eval.tsv
audioset_train_strong.tsv -> audioset_strong_train.tsv
audioset_eval_strong.tsv -> audioset_strong_eval.tsv
audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)

-Strong dataset changes

The only changes to the Strong dataset are the renaming of fields and the reordering of columns,
so that both the Weak and Strong versions have `filename` and `event_label` as their first
two columns.
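Moving `filename` and `event_label` to the front can be sketched as follows. The helper `reorder_row` is hypothetical, and the order of the remaining columns (kept as in the input) is an assumption, since only the first two columns are specified here.

```python
def reorder_row(row, first=("filename", "event_label")):
    """Return a new dict with the `first` keys leading, others in original order."""
    ordered = {key: row[key] for key in first}
    ordered.update({k: v for k, v in row.items() if k not in first})
    return ordered


# Hypothetical Strong row after field renaming (illustrative values).
row = {"filename": "abc_30000", "onset": "0.0", "offset": "1.5", "event_label": "/m/aaa"}
print(list(reorder_row(row)))
```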

-Weak dataset changes

-- Labels are given one per line, instead of as a comma-separated, quoted list.

-- To make the `filename` format the same as in the Strong version, the value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and also makes `end_seconds` redundant.
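The two Weak dataset changes can be sketched together: one input row becomes one output row per label, and the start time in milliseconds is appended to the identifier. This is a sketch under stated assumptions, not the repository's implementation; `expand_weak_row` is a hypothetical name and the identifier and label values are made up.

```python
def expand_weak_row(ytid, start_seconds, positive_labels):
    """Split a Weak row into (filename, event_label) pairs, one per label."""
    # Convert the start time from seconds to whole milliseconds.
    start_ms = int(round(float(start_seconds) * 1000))
    filename = f"{ytid}_{start_ms}"
    # The original field is a comma-separated list of label ids.
    labels = [part.strip() for part in positive_labels.split(",") if part.strip()]
    return [(filename, label) for label in labels]


# Hypothetical input values, shaped like a row of the original Weak CSV.
pairs = expand_weak_row("abc", "30.000", "/m/aaa,/m/bbb")
print(pairs)
```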

-Class labels changes

Class labels from both datasets are merged into one file and listed in alphabetical order of their `id`s. Since the same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from the Strong dataset overwrite those from the Weak dataset. `class_labels.tsv` can be regenerated with priority given to the Weak version of the labels by calling `convert_labels(False)` from convert.py in the GitHub repository.
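The merge rule can be illustrated with two small dictionaries. This sketch is not the repository's `convert_labels`; the function `merge_labels`, its `strong_priority` parameter and the sample ids are hypothetical, and plain Python string sorting is assumed for the ordering.

```python
def merge_labels(weak, strong, strong_priority=True):
    """Merge two id -> label mappings and return (id, label) pairs sorted by id."""
    low, high = (weak, strong) if strong_priority else (strong, weak)
    # Entries from `high` overwrite entries from `low` that share an id.
    merged = {**low, **high}
    return sorted(merged.items())


# Hypothetical label sets with one conflicting id.
weak = {"/m/bbb": "Weak name", "/m/aaa": "Shared name"}
strong = {"/m/bbb": "Strong name"}
print(merge_labels(weak, strong))
print(merge_labels(weak, strong, strong_priority=False))
```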

-License

Google's AudioSet was published in two stages - first the Weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.)

Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).

Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.

Files (204.8 MB)

Name	Size	MD5
audioset_strong_eval.tsv	5.6 MB	938b33448072b32a80907cf55ae9d508
audioset_strong_eval_posneg.tsv	48.6 MB	7a011a4e17927605e302ee79160ab189
audioset_strong_train.tsv	37.7 MB	48e2bcd74136f486e35266319a05b92b
audioset_weak_eval.tsv	1.4 MB	437f82368720d0e8b7052d4f72d0a0d6
audioset_weak_train_balanced.tsv	1.5 MB	daadbe7f9ab5c0dae26236b4f7ff53ec
audioset_weak_train_unbalanced.tsv	109.9 MB	a115b4f1cd012cf763f864256567baaa
class_labels.tsv	13.6 kB	90acb7926a5fa532ccf15972181e5aca

Additional details

Is derived from: Dataset: https://research.google.com/audioset/index.html (URL)
Is identical to: Dataset: https://github.com/bakhtos/GoogleAudioSetReformatted (URL)

Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., ... & Ritter, M. (2017, March). Audio set: An ontology and human-labeled dataset for audio events. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 776-780). IEEE.
Hershey, S., Ellis, D. P., Fonseca, E., Jansen, A., Liu, C., Moore, R. C., & Plakal, M. (2021, June). The benefit of temporally-strong labels in audio event classification. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 366-370). IEEE.

