There is a newer version of the record available.

Published September 21, 2025 | Version v1
Conference paper | Open access

Simple and Effective Semantic Song Segmentation

Description

We propose a simple yet effective approach to semantic song segmentation. Our model is a convolutional neural network trained to jointly predict frame-wise boundary activation functions and segment label probabilities. The input features consist of a log-magnitude log-frequency spectrogram and self-similarity lag matrices, combining modern deep learning approaches with hand-crafted features. To evaluate our approach, we first examine commonly used datasets and find substantial overlap (up to 22%) between training and testing sets (SALAMI vs. RWC-Pop). Because this overlap invalidates meaningful comparisons, we propose using the previously unexplored McGill Billboard dataset for testing. We carefully eliminate duplicate entries between McGill Billboard and other datasets through both audio fingerprinting and string matching of song titles and artist names. Using the resulting set of 719 tracks, we demonstrate the effectiveness of our approach.
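
The string-matching half of the deduplication step described above could look roughly like the following. This is a minimal sketch, not the authors' procedure: the normalization rules (accent stripping, dropping bracketed suffixes, removing punctuation), the function names, and the toy track entries are all assumptions made for illustration, and the audio-fingerprinting half is not shown.

```python
import re
import unicodedata


def normalize(text):
    """Reduce a title or artist name to a comparable form (assumed rules)."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Drop bracketed suffixes such as "(Live)" or "[Remastered]".
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text)
    # Replace every run of non-alphanumeric characters with a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def track_key(artist, title):
    """Build the comparison key for one track."""
    return (normalize(artist), normalize(title))


def find_overlap(test_tracks, other_tracks):
    """Return the test tracks whose normalized key also occurs in other_tracks."""
    seen = {track_key(artist, title) for artist, title in other_tracks}
    return [(a, t) for a, t in test_tracks if track_key(a, t) in seen]


# Hypothetical entries, invented for this example (not taken from any dataset).
test_set = [
    ("The Examples", "Café Nights (Live)"),
    ("Placeholder Band", "Song One"),
    ("Nobody Real", "Unique Tune"),
]
training_set = [
    ("the examples", "Cafe Nights"),
    ("Placeholder Band", "Song One!"),
]

duplicates = find_overlap(test_set, training_set)
```

Exact matching on normalized keys misses misspellings and alternative artist credits, which is one reason a complementary audio-based check, as the abstract describes, is useful.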
