Published September 18, 2022 | Version v1.0.0
Dataset Open

Google's Audioset: Reformatted

Creators

  • 1. Alexander

Description

Google's AudioSet consistently reformatted

During my work with Google's AudioSet(https://research.google.com/audioset/index.html)
I encountered some problems due to the fact that Weak (https://research.google.com/audioset/download.html) and
 Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset used different csv formatting for the data, and that also labels used in the two datasets are different (https://github.com/audioset/ontology/issues/9) and also presented in files with different formatting.

This dataset reformatting aims to unify the formats of the datasets so that it is possible
to analyse them in the same pipelines, and also make the dataset files compatible
with  psds_eval, dcase_util and sed_eval Python packages used in Audio Processing.

For better formatted documentation and source code of reformatting refer to https://github.com/bakhtos/GoogleAudioSetReformatted 

-Changes in dataset

All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t`
as a separator). All files have a header as the first line.

-New fields and filenames

Fields are renamed according to the following table, to be compatible with psds_eval:

Old field -> New field
YTID -> filename
segment_id -> filename
start_seconds -> onset
start_time_seconds -> onset
end_seconds -> offset
end_time_seconds -> offset
positive_labels -> event_label
label -> event_label
present -> present

For class label files, `id` is now the name for the for `mid` label (e.g. `/m/09xor`)
and `label` for the human-readable label (e.g. `Speech`). Index of label indicated
for Weak dataset labels (`index` field in `class_labels_indices.csv`) is not used.

Files are renamed according to the following table to ensure consisted naming
of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:

Old name -> New name
balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
eval_segments.csv -> audioset_weak_eval.tsv
audioset_train_strong.tsv -> audioset_strong_train.tsv
audioset_eval_strong.tsv -> audioset_strong_eval.tsv
audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)

-Strong dataset changes

Only changes to the Strong dataset are renaming of fields and reordering of columns,
so that both Weak and Strong version have `filename` and `event_label` as first 
two columns.

-Weak dataset changes

-- Labels are given one per line, instead of comma-separated and quoted list

-- To make sure that `filename` format is the same as in Strong version, the following
format change is made:
The value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and makes `end_seconds` also redundant.

-Class labels changes

Class labels from both datasets are merged into one file and given in alphabetical order of `id`s. Since same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from Strong dataset overwrite those from Weak. It is possible to regenerate `class_labels.tsv` while giving priority to the Weak version of labels by calling `convert_labels(False)` from convert.py in the GitHub repository.

-License

Google's AudioSet was published in two stages - first the Weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.)

Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.

Files

Files (204.8 MB)

Name Size Download all
md5:938b33448072b32a80907cf55ae9d508
5.6 MB Download
md5:7a011a4e17927605e302ee79160ab189
48.6 MB Download
md5:48e2bcd74136f486e35266319a05b92b
37.7 MB Download
md5:437f82368720d0e8b7052d4f72d0a0d6
1.4 MB Download
md5:daadbe7f9ab5c0dae26236b4f7ff53ec
1.5 MB Download
md5:a115b4f1cd012cf763f864256567baaa
109.9 MB Download
md5:90acb7926a5fa532ccf15972181e5aca
13.6 kB Download

Additional details

Related works

References

  • Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., ... & Ritter, M. (2017, March). Audio set: An ontology and human-labeled dataset for audio events. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 776-780). IEEE.
  • Hershey, S., Ellis, D. P., Fonseca, E., Jansen, A., Liu, C., Moore, R. C., & Plakal, M. (2021, June). The benefit of temporally-strong labels in audio event classification. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 366-370). IEEE.