Published September 23, 2018 | Version v1
Conference paper Open

Improving Peak-picking Using Multiple Time-step Loss Functions

Description

The majority of state-of-the-art methods for music information retrieval (MIR) tasks now utilise deep learning methods reliant on minimisation of loss functions such as cross entropy. For tasks that include framewise binary classification (e.g., onset detection, music transcription), classes are derived from output activation functions by identifying points of local maxima, or peaks. However, the operating principles behind peak-picking differ from those of the cross entropy loss function, which minimises the absolute difference between the output and target values for a single frame. To generate activation functions more suited to peak-picking, we propose two versions of a new loss function that incorporates information from multiple time-steps: 1) multi-individual, which uses multiple individual time-step cross entropies; and 2) multi-difference, which directly compares the difference between sequential time-step outputs. We evaluate the newly proposed loss functions alongside standard cross entropy in the popular MIR tasks of onset detection and automatic drum transcription. The results highlight the effectiveness of these loss functions in improving overall system accuracy for both MIR tasks. Additionally, directly comparing the output from sequential time-steps in the multi-difference approach achieves the highest performance.
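The mismatch the description points to can be illustrated with a minimal sketch. It is not the formulation from the paper: the function names, the 0.5 threshold, and the form of `difference_term` (a mean absolute error between first differences of output and target) are assumptions chosen only to show why a single-frame loss and peak-picking can disagree, and how a term spanning adjacent time-steps rewards sharper peaks.

```python
import numpy as np

def pick_peaks(activation, threshold=0.5):
    """Return frame indices that are local maxima at or above a threshold.

    The threshold value and the tie-breaking rule are illustrative assumptions.
    """
    a = np.asarray(activation, dtype=float)
    peaks = []
    for t in range(1, len(a) - 1):
        if a[t] > a[t - 1] and a[t] >= a[t + 1] and a[t] >= threshold:
            peaks.append(t)
    return peaks

def cross_entropy(y_pred, y_true, eps=1e-7):
    """Standard framewise binary cross entropy, averaged over frames."""
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def difference_term(y_pred, y_true):
    """Hypothetical multi-time-step term (an assumption, not the paper's loss):
    compares the change between sequential frames in output and target."""
    dp = np.diff(np.asarray(y_pred, dtype=float))
    dt = np.diff(np.asarray(y_true, dtype=float))
    return float(np.mean(np.abs(dp - dt)))

# Toy target with onsets at frames 2 and 6, plus two made-up activations.
target = np.array([0, 0, 1, 0, 0, 0, 1, 0], dtype=float)
sharp = np.array([0.1, 0.2, 0.6, 0.2, 0.1, 0.2, 0.6, 0.1])
flat = np.array([0.1, 0.55, 0.6, 0.55, 0.1, 0.55, 0.6, 0.55])

sharp_peaks = pick_peaks(sharp)
flat_peaks = pick_peaks(flat)
sharp_diff = difference_term(sharp, target)
flat_diff = difference_term(flat, target)
```

In this toy case both activations produce the same picked peaks, but the flatter one leaves far less margin around each peak; the difference-based term penalises it more heavily than the sharper activation, which is the kind of behaviour a multiple time-step loss is meant to encourage.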

Files

25_Paper.pdf

Size: 4.7 MB
md5:612bbdec851966ab6769b5dce0ea445f