Published September 21, 2025
Version v1
Conference paper
Open access
Simple and Effective Semantic Song Segmentation
Authors/Creators
Description
We propose a simple yet effective approach to semantic song segmentation. Our model is a convolutional neural network trained to jointly predict frame-wise boundary activation functions and segment label probabilities. The input features consist of a log-magnitude, log-frequency spectrogram and self-similarity lag matrices, combining modern deep learning with hand-crafted features.
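A self-similarity lag matrix re-indexes a frame-by-frame self-similarity matrix by time lag, so that a repeated section appears as a run of high similarity along the time axis at a constant lag. The sketch below is a minimal illustration only. It assumes cosine similarity on generic per-frame features and a lag axis that looks back in time. The function name `self_similarity_lag` and the toy input are hypothetical, and the paper's exact features, lag range, and preprocessing are not reproduced here.

```python
import numpy as np


def self_similarity_lag(features: np.ndarray, max_lag: int) -> np.ndarray:
    """Hypothetical helper: return a (T, max_lag + 1) lag matrix where
    entry [t, l] is the cosine similarity between frame t and frame t - l.
    Entries with t - l < 0 are left at 0 (assumed zero padding)."""
    # L2-normalise each frame so that a dot product equals cosine similarity
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    n_frames = unit.shape[0]
    lag = np.zeros((n_frames, max_lag + 1))
    for l in range(min(max_lag, n_frames - 1) + 1):
        # similarity of every frame t >= l with the frame l steps earlier
        lag[l:, l] = np.sum(unit[l:] * unit[: n_frames - l], axis=1)
    return lag


# Toy input (assumed): 4 distinct 12-dim frames repeated 5 times,
# i.e. a 20-frame sequence that repeats every 4 frames.
rng = np.random.default_rng(0)
pattern = rng.random((4, 12))
feats = np.tile(pattern, (5, 1))
lag_mat = self_similarity_lag(feats, max_lag=8)
print(lag_mat.shape)                     # (20, 9)
print(np.allclose(lag_mat[4:, 4], 1.0))  # True: the repetition shows up at lag 4
```

Indexing by lag rather than by absolute time keeps the representation aligned with the spectrogram frames, which is one plausible reason such matrices combine easily with a convolutional front end; the paper itself should be consulted for how the two inputs are actually fused.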
To evaluate our approach, we first examine commonly used datasets and find substantial overlap (up to 22%) between training and testing sets (SALAMI vs. RWC-Pop). As this overlap invalidates meaningful comparisons, we propose using the previously unexplored McGill Billboard dataset for testing. We carefully eliminate duplicate entries between McGill Billboard and other datasets through both audio fingerprinting and string-matching of song titles and artist names. Using the resulting set of 719 tracks, we demonstrate the effectiveness of our approach.
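The title and artist string-matching step can be sketched roughly as follows. Everything specific here is an assumption for illustration, not a detail from the paper: the normalization rules, the use of Python's `difflib.SequenceMatcher`, the 0.9 similarity threshold, the hypothetical helper names, the toy track lists, and defining overlap as the fraction of test tracks that match some training track. The audio-fingerprinting pass is not shown.

```python
import re
import unicodedata
from difflib import SequenceMatcher

Track = tuple[str, str]  # (title, artist)


def normalize(text: str) -> str:
    """Hypothetical normaliser: strip accents, lower-case, drop bracketed
    suffixes such as "(Single Version)", and collapse punctuation."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text)  # remove "(...)" and "[...]"
    text = re.sub(r"[^a-z0-9]+", " ", text)       # punctuation -> spaces
    return " ".join(text.split())


def same_song(a: Track, b: Track, threshold: float = 0.9) -> bool:
    """Assumed rule: both title and artist must be near-identical."""
    title_sim = SequenceMatcher(None, normalize(a[0]), normalize(b[0])).ratio()
    artist_sim = SequenceMatcher(None, normalize(a[1]), normalize(b[1])).ratio()
    return title_sim >= threshold and artist_sim >= threshold


def overlap_fraction(test: list[Track], train: list[Track],
                     threshold: float = 0.9) -> float:
    """Fraction of test tracks with at least one match in the training set."""
    if not test:
        return 0.0
    duplicates = sum(any(same_song(t, r, threshold) for r in train) for t in test)
    return duplicates / len(test)


# Toy example (hypothetical track lists)
train_set = [("Like a Prayer", "Madonna"), ("Billie Jean", "Michael Jackson")]
test_set = [
    ("Billie Jean (Single Version)", "Michael Jackson"),
    ("Hey Jude", "The Beatles"),
    ("Like A Prayer", "Madonna"),
    ("Let It Be", "The Beatles"),
]
print(overlap_fraction(test_set, train_set))  # 0.5
```

Fuzzy matching on normalized strings catches trivial variants such as capitalization or "(Single Version)" suffixes, but it can still miss covers or retitled releases, which is one reason to pair it with audio fingerprinting as described above.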
Files

| Name | Size | MD5 |
|---|---|---|
| 000084.pdf | 162.5 kB | 6cc8647c29731ab0d5d6a32005135791 |