Beat This! Spectrograms for Beat and Downbeat Tracking

Schlüter, Jan; Foscarin, Francesco

doi:10.5281/zenodo.13922116

Published October 14, 2024 | Version v1

Dataset Open

1. Johannes Kepler University of Linz

This collection contains mel spectrograms and annotations of 16 datasets for beat and downbeat tracking. All datasets have been used in "Beat This! Accurate beat tracking without DBN postprocessing" (Foscarin/Schlüter/Widmer, ISMIR 2024) and prior publications by other authors, but for many of these datasets, audio data is not publicly available. By publishing the spectrograms, we invite other researchers to improve the state of the art in beat and downbeat tracking.

Datasets

Spectrograms for the following datasets are included in the collection:

asap: "ASAP: a dataset of aligned scores and performances for piano transcription" (Foscarin et al., ISMIR 2020)
ballroom: "An experimental comparison of audio tempo induction algorithms" (Gouyon et al., TASLP 2006) for the audio and "Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio" (Krebs/Böck/Widmer, ISMIR 2013) for the annotations
beatles: "Evaluation methods for musical audio beat tracking algorithms" (Davies/Degara/Plumbley, Tech. Rep., QMU, 2009)
candombe: "Beat and Downbeat Tracking Based on Rhythmic Patterns Applied to the Uruguayan Candombe Drumming" (Nunes et al., ISMIR 2015)
filosax: "Filosax: A dataset of annotated jazz saxophone recordings" (Foster/Dixon, ISMIR 2021)
groove_midi: "Learning to groove with inverse sequence transformations" (Gillick et al., ICML 2019)
gtzan: "Musical genre classification of audio signals" (Tzanetakis/Cook, TSAP 2002) for the audio and "Swing ratio estimation" (Marchand/Peters, DAFx 2015) for the annotations
guitarset: "GuitarSet: A dataset for guitar transcription" (Xi et al., ISMIR 2018)
hainsworth: "Particle filtering applied to musical tempo tracking" (Hainsworth/Macleod, JASP 2004)
harmonix: "The Harmonix set: Beats, downbeats, and functional segment annotations of western popular music" (Nieto et al., ISMIR 2019) for the original and "Modeling Beats and Downbeats with a Time-Frequency Transformer" (Hung et al., ICASSP 2022) for the version included here
hjdb: "One in the jungle: Downbeat detection in hardcore, jungle, and drum and bass" (Hockman/Davies/Fujinaga, ISMIR 2012)
jaah: "Audio-aligned jazz harmony dataset for automatic chord transcription and corpus-based research" (Eremenko et al., ISMIR 2018)
rwc: "RWC music database: Popular, classical and jazz music databases" (Goto et al., ISMIR 2002) for the audio and "AIST annotation for the RWC music database" (Goto, ISMIR 2006) for the annotations
simac: "A computational approach to rhythm description — Audio features for the computation of rhythm periodicity functions and their use in tempo induction and music content processing" (Gouyon, PhD thesis, UPF, 2005)
smc: "Selective sampling for beat tracking evaluation" (Holzapfel et al., TASLP 2012)
tapcorrect: "Towards Automatically Correcting Tapped Beat Annotations for Music Recordings" (Driedger et al., ISMIR 2019)

If given, links in the above list point to locations for obtaining the original audio.

Annotations

The corresponding annotations are available at https://github.com/CPJKU/beat_this_annotations. A snapshot of v1.0 is included in this collection as beat_this_annotations.zip, but you may want to use a later release.

Spectrograms

Spectrograms are computed from monophonic audio at a sample rate of 22050 Hz with a window size of 1024 and hop size of 441 samples (yielding 50 frames per second), processed with a mel filterbank of 128 bands from 30 Hz to 11 kHz, and magnitudes scaled with ln(1+1000x). They are provided in half-precision floating-point format. Spectrograms can be reproduced with torchaudio 2.3.1 from a 22050 Hz waveform tensor (resampled with soxr.resample(), if needed) via:

import torchaudio
melspect = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=441,
    f_min=30, f_max=11000, n_mels=128,
    mel_scale='slaney', normalized='frame_length', power=1,
)(waveform).mul(1000).log1p()
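
As a quick check of the numbers above: a hop of 441 samples at 22050 Hz gives exactly 50 frames per second, so a time stamp in seconds maps to a frame index by multiplying by 50. The sketch below works through that arithmetic and the inverse of the ln(1+1000x) scaling. The helper names are illustrative and not part of the beat_this code, and rounding to the nearest frame is only one plausible way to align annotations to frames.

```python
import math

SAMPLE_RATE = 22050  # Hz, as stated in the description
HOP_LENGTH = 441     # samples, as stated in the description
FPS = SAMPLE_RATE / HOP_LENGTH  # 22050 / 441 = 50 frames per second


def time_to_frame(seconds):
    """Map a time stamp in seconds to the nearest frame index (assumed rounding)."""
    return round(seconds * FPS)


def frame_to_time(frame):
    """Map a frame index back to its time stamp in seconds."""
    return frame / FPS


def compress(x):
    """Magnitude scaling used for the spectrograms: ln(1 + 1000 x)."""
    return math.log1p(1000.0 * x)


def expand(y):
    """Inverse of compress(): recover the linear magnitude."""
    return math.expm1(y) / 1000.0
```

For instance, a beat annotated at 2.5 seconds would fall on frame 125 under this convention.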

Format

For each dataset, a compressed .zip file is provided, which in turn holds an uncompressed .npz file. The .npz file holds a set of numpy arrays in subdirectories named after the annotations. Each subdirectory contains a spectrogram of the original audio file ("track.npy"), 11 pitch-shifted versions from -5 to +6 semitones ("track_ps-5.npy" to "track_ps6.npy") and 10 time-stretched versions from -20% to +20% ("track_ts-20.npy" to "track_ts20.npy"), except for gtzan.npz, which is designated for testing and only holds the original audio files. The .npz files can be loaded in numpy via np.load(), or unzipped into a set of .npy files that can again be loaded via np.load(). We also provide code to load .npz files as memory maps for more efficiency.
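
The naming scheme above can be sketched as follows. The snippet builds a tiny stand-in .npz in memory (not the real data) and lists the keys one would expect for a single piece. The 4% step for time stretching is an assumption inferred from "10 versions from -20% to +20%"; the piece name, the array shape (frames by 128 mel bands), and the helper name are hypothetical.

```python
import io

import numpy as np


def expected_variants():
    """Names (without .npy) of the spectrograms expected for one piece."""
    names = ["track"]
    # 11 pitch shifts: -5..+6 semitones, skipping 0 (the original)
    names += [f"track_ps{s}" for s in range(-5, 7) if s != 0]
    # 10 time stretches: -20..+20 percent; the step of 4 is an assumption
    names += [f"track_ts{p}" for p in range(-20, 21, 4) if p != 0]
    return names


# Build a small synthetic archive: one hypothetical piece, tiny float16 arrays.
piece = "example_piece"  # hypothetical subdirectory name
arrays = {
    f"{piece}/{name}": np.zeros((5, 128), dtype=np.float16)  # assumed layout
    for name in expected_variants()
}
buffer = io.BytesIO()
np.savez(buffer, **arrays)  # uncompressed, like the published .npz files
buffer.seek(0)

# np.load() gives lazy, dict-like access; keys drop the ".npy" suffix.
with np.load(buffer) as npz:
    keys = sorted(npz.files)
    spect = npz[f"{piece}/track"]
```

With a real download, the same np.load() call would take the path of the extracted .npz file instead of the in-memory buffer.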

Files

Files (141.2 GB)

Name	MD5	Size
asap.zip	e57297ed45f08936b49e5982ff7188f9	32.9 GB
ballroom.zip	8c2bc5363dd505d9122cbc65af0a58a1	4.8 GB
beat_this_annotations.zip	a11e22e5f9aec7b14d8dd92b6251117c	7.2 MB
beatles.zip	f9e04136199887a0f6a7db71fd71c09e	6.5 GB
candombe.zip	6c30e2114f358e543a7decf955b28c0c	2.1 GB
filosax.zip	839deac5549836974ce1484217931cc8	4.0 GB
groove_midi.zip	27838e54859ed41d596dc9dcdeeb483c	8.8 GB
gtzan.zip	39a7dfe6a6b0a5279a94d770506db879	306.9 MB
guitarset.zip	2bd210bf3e994065641410f2c0bb00fe	1.4 GB
hainsworth.zip	78b4736564cb03faeb00544571cbf807	2.7 GB
harmonix.zip	16822b45ecb82caa94163e1608d9990a	45.1 GB
hjdb.zip	8d1b27d6dd45e095f29ddd5c66f1520c	2.7 GB
jaah.zip	11ad4cb5f8bd5f9ece09812cfcd17a92	5.7 GB
rwc.zip	2584073979b766a9537a74c0bf98d8b9	13.7 GB
simac.zip	4d760af6b8b576334cbb6faa711e7888	2.7 GB
smc.zip	32c2640f854ba29fb86be9ac6b84532f	2.1 GB
tapcorrect.zip	60cf19ed281eabf6cca96f1412cf3fbb	5.7 GB
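
Since the archives are large, it can be worth checking a download against the MD5 sums listed above. The following is a minimal sketch using only the Python standard library; the function name and chunk size are arbitrary choices, and the path in the usage comment assumes the file was saved to the current directory.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (the path is an assumption about where the file was saved):
# assert md5sum("gtzan.zip") == "39a7dfe6a6b0a5279a94d770506db879"
```

Reading in chunks keeps memory use flat even for the 45.1 GB archive.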

Additional details

Is published in: Conference paper: arXiv:2407.21658 (arXiv)

Repository URL: https://github.com/CPJKU/beat_this
Programming language: Python
Development Status: Active

	All versions	This version
Views	678	678
Downloads	1,841	1,841
Data volume	61.4 TB	61.4 TB
