Published October 14, 2024 | Version v1
Dataset Open

Beat This! Spectrograms for Beat and Downbeat Tracking

  • 1. Johannes Kepler University of Linz

Description

This collection contains mel spectrograms and annotations of 16 datasets for beat and downbeat tracking. All datasets have been used in "Beat This! Accurate beat tracking without DBN postprocessing" (Foscarin/Schlüter/Widmer, ISMIR 2024) and prior publications by other authors, but for many of these datasets, audio data is not publicly available. By publishing the spectrograms, we invite other researchers to improve the state of the art in beat and downbeat tracking.

Datasets

Spectrograms for the following datasets are included in the collection:

  • asap: "ASAP: a dataset of aligned scores and performances for piano transcription" (Foscarin et al., ISMIR 2020)
  • ballroom: "An experimental comparison of audio tempo induction algorithms" (Gouyon et al., TASLP 2006) for the audio and "Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio" (Krebs/Böck/Widmer, ISMIR 2013) for the annotations
  • beatles: "Evaluation methods for musical audio beat tracking algorithms" (Davies/Degara/Plumbley, Tech. Rep., QMU, 2009)
  • candombe: "Beat and Downbeat Tracking Based on Rhythmic Patterns Applied to the Uruguayan Candombe Drumming" (Nunes et al., ISMIR 2015) 
  • filosax: "Filosax: A dataset of annotated jazz saxophone recordings" (Foster/Dixon, ISMIR 2021)
  • groove_midi: "Learning to groove with inverse sequence transformations" (Gillick et al., ICML 2019)
  • gtzan: "Musical genre classification of audio signals" (Tzanetakis/Cook, TSAP 2002) for the audio and "Swing ratio estimation" (Marchand/Peters, DAFx 2015) for the annotations
  • guitarset: "GuitarSet: A dataset for guitar transcription" (Xi et al., ISMIR 2018)
  • hainsworth: "Particle filtering applied to musical tempo tracking" (Hainsworth/Macleod, JASP 2004)
  • harmonix: "The Harmonix set: Beats, downbeats, and functional segment annotations of western popular music" (Nieto et al., ISMIR 2019) for the original and "Modeling Beats and Downbeats with a Time-Frequency Transformer" (Hung et al., ICASSP 2022) for the version included here
  • hjdb: "One in the jungle: Downbeat detection in hardcore, jungle, and drum and bass" (Hockman/Davies/Fujinaga, ISMIR 2012)
  • jaah: "Audio-aligned jazz harmony dataset for automatic chord transcription and corpus-based research" (Eremenko et al., ISMIR 2018)
  • rwc: "RWC music database: Popular, classical and jazz music databases" (Goto et al., ISMIR 2002) for the audio and "AIST annotation for the RWC music database" (Goto, ISMIR 2006) for the annotations
  • simac: "A computational approach to rhythm description — Audio features for the computation of rhythm periodicity functions and their use in tempo induction and music content processing" (Gouyon, PhD thesis, UPF, 2005)
  • smc: "Selective sampling for beat tracking evaluation" (Holzapfel et al., TASLP 2012)
  • tapcorrect: "Towards Automatically Correcting Tapped Beat Annotations for Music Recordings" (Driedger et al., ISMIR 2019)

If given, links in the above list point to locations for obtaining the original audio.

Annotations

The corresponding annotations are available at https://github.com/CPJKU/beat_this_annotations. A snapshot of v1.0 is included in this collection as beat_this_annotations.zip, but you may want to use a later release.

Spectrograms

Spectrograms are computed from monophonic audio at a sample rate of 22050 Hz with a window size of 1024 and hop size of 441 samples (yielding 50 frames per second), processed with a mel filterbank of 128 bands from 30 Hz to 11 kHz, and magnitudes scaled with ln(1+1000x). They are provided in half-precision floating-point format. Spectrograms can be reproduced with torchaudio 2.3.1 from a 22050 Hz waveform tensor (resampled with soxr.resample(), if needed) via:

import torchaudio

melspect = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=441,
    f_min=30, f_max=11000, n_mels=128,
    mel_scale='slaney', normalized='frame_length', power=1,
)(waveform).mul(1000).log1p()
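The parameters above fix the relation between time, frame index, and stored value. The following is a minimal sketch of that arithmetic using only the Python standard library. The function names are hypothetical, and the mapping of frame i to time i * 441 / 22050 seconds assumes torchaudio's default centered framing (center=True), which the call above does not override.

```python
import math

SAMPLE_RATE = 22050   # Hz, as stated above
HOP_LENGTH = 441      # samples between consecutive frames
N_MELS = 128          # mel bands per frame
BYTES_PER_VALUE = 2   # half-precision floating point

FRAMES_PER_SECOND = SAMPLE_RATE / HOP_LENGTH  # 22050 / 441 = 50.0


def time_to_frame(seconds):
    """Nearest frame index for an annotation time in seconds.

    Assumes frame i is centered at i * HOP_LENGTH / SAMPLE_RATE seconds.
    """
    return round(seconds * FRAMES_PER_SECOND)


def frame_to_time(frame):
    """Center time in seconds of a frame index (same assumption)."""
    return frame / FRAMES_PER_SECOND


def undo_log_scaling(value):
    """Invert y = ln(1 + 1000 x) to recover the mel magnitude x."""
    return math.expm1(value) / 1000.0


def bytes_per_second_of_audio():
    """Uncompressed storage needed for one second of spectrogram."""
    return int(FRAMES_PER_SECOND) * N_MELS * BYTES_PER_VALUE
```

With these definitions, a beat annotated at 1.0 s falls on frame 50, and one second of audio occupies 12800 bytes before any augmentation.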

Format

For each dataset, a compressed .zip file is provided, which in turn holds an uncompressed .npz file. The .npz file holds a set of numpy arrays in subdirectories named after the annotations. Each subdirectory contains a spectrogram of the original audio file ("track.npy"), 11 pitch-shifted versions from -5 to +6 semitones ("track_ps-5.npy" to "track_ps6.npy") and 10 time-stretched versions from -20% to +20% ("track_ts-20.npy" to "track_ts20.npy"), except for gtzan.npz, which is designated for testing and only holds the original audio files. The .npz files can be loaded in numpy via np.load(), or unzipped into a set of .npy files that can again be loaded via np.load(). We also provide code to load .npz files as memory maps for more efficiency.
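Because an .npz file is an ordinary zip archive, its members can be listed without loading any arrays. The following is a minimal sketch, using only the standard library, that groups members by subdirectory and classifies them by the naming patterns quoted above. The function name and the returned structure are hypothetical, and the sketch assumes that members are stored as "<subdirectory>/<file name>".

```python
import re
import zipfile
from collections import defaultdict

# Patterns taken from the file names quoted above:
# "track.npy", "track_ps<semitones>.npy", "track_ts<percent>.npy".
_VARIANT = re.compile(r"^track(?:_(ps|ts)(-?\d+))?\.npy$")


def group_members(npz_path):
    """Map each subdirectory of an .npz file to its spectrogram variants.

    Returns {subdirectory: {(kind, amount): member_name}}, where kind is
    "original", "ps" (pitch shift in semitones) or "ts" (time stretch
    in percent). Members that match none of the patterns are skipped.
    """
    pieces = defaultdict(dict)
    with zipfile.ZipFile(npz_path) as archive:
        for name in archive.namelist():
            piece, _, filename = name.rpartition("/")
            match = _VARIANT.match(filename)
            if match is None:
                continue
            kind, amount = match.groups()
            label = ("original", 0) if kind is None else (kind, int(amount))
            pieces[piece][label] = name
    return dict(pieces)
```

A member name found this way, with the ".npy" suffix dropped, is the key under which np.load() exposes the corresponding array.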

Files


Files (141.2 GB)

MD5 checksum                        Size
e57297ed45f08936b49e5982ff7188f9    32.9 GB
8c2bc5363dd505d9122cbc65af0a58a1    4.8 GB
a11e22e5f9aec7b14d8dd92b6251117c    7.2 MB
f9e04136199887a0f6a7db71fd71c09e    6.5 GB
6c30e2114f358e543a7decf955b28c0c    2.1 GB
839deac5549836974ce1484217931cc8    4.0 GB
27838e54859ed41d596dc9dcdeeb483c    8.8 GB
39a7dfe6a6b0a5279a94d770506db879    306.9 MB
2bd210bf3e994065641410f2c0bb00fe    1.4 GB
78b4736564cb03faeb00544571cbf807    2.7 GB
16822b45ecb82caa94163e1608d9990a    45.1 GB
8d1b27d6dd45e095f29ddd5c66f1520c    2.7 GB
11ad4cb5f8bd5f9ece09812cfcd17a92    5.7 GB
2584073979b766a9537a74c0bf98d8b9    13.7 GB
4d760af6b8b576334cbb6faa711e7888    2.7 GB
32c2640f854ba29fb86be9ac6b84532f    2.1 GB
60cf19ed281eabf6cca96f1412cf3fbb    5.7 GB
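
Given the size of the downloads, it can be worth checking each file against the MD5 checksums listed above. The following is a minimal sketch using Python's hashlib. The function names are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file against a checksum, with or without an 'md5:' prefix."""
    expected = expected_md5.lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

For example, verify_download("gtzan.zip", "md5:...") with the checksum copied from the listing returns True only if the download is intact.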

Additional details

Related works

Is published in
Conference paper: arXiv:2407.21658 (arXiv)

Software

Repository URL: https://github.com/CPJKU/beat_this
Programming language: Python
Development status: Active