Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

Keren Shao; Ke Chen; Taylor Berg-Kirkpatrick; Shlomo Dubnov

doi:10.5281/zenodo.10265373

Published November 4, 2023 | Version v1

Conference paper | Open access

In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we instead propose a modification to the input feature and a modification to the training objective, based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity to the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using the discrete z-transform. Second, vocal and non-vocal segments of extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that discourages the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.
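
The second assumption can be made concrete with a toy penalty. The sketch below is not the loss function proposed in the paper (see the PDF for the actual formulation); it is a minimal, hypothetical illustration of how very short voiced or unvoiced segments could be penalized by a smooth function of per-frame voicing probabilities. The function name `short_segment_penalty` and the choice to target only one-frame segments are assumptions made for this example.

```python
def short_segment_penalty(p):
    """Mean soft count of one-frame voiced or unvoiced segments.

    p: sequence of per-frame voicing probabilities in [0, 1].
    Hypothetical helper for illustration only; not the loss from the paper.
    """
    if len(p) < 3:
        return 0.0
    total = 0.0
    for prev, cur, nxt in zip(p, p[1:], p[2:]):
        # Soft indicator of an isolated voiced frame: unvoiced, voiced, unvoiced.
        total += (1.0 - prev) * cur * (1.0 - nxt)
        # Soft indicator of an isolated unvoiced frame: voiced, unvoiced, voiced.
        total += prev * (1.0 - cur) * nxt
    # Average over the number of three-frame windows.
    return total / (len(p) - 2)


# A contour with one sustained voiced segment versus one that flickers.
stable = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
flicker = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(short_segment_penalty(stable), short_segment_penalty(flicker))
```

With these inputs the sustained contour scores 0 and the flickering contour scores 0.6. Because the penalty is a polynomial in the probabilities, it has well-defined gradients and could in principle be added to a frame-wise training loss; longer minimum durations would need wider windows.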

Files

Name	Size
000078.pdf md5:8d45d520d17a8f017cc0abe3a7ab7fdd	1.6 MB

	All versions	This version
Views	164	164
Downloads	170	170
Data volume	313.2 MB	313.2 MB

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 24th International Society for Music Information Retrieval Conference, 657-663. Milan, Italy.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy, November 5-9, 2023

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: December 5, 2023
Modified: July 10, 2024
