Published September 23, 2018 | Version v1
Conference paper Open

OpenMIC-2018: An Open Data-set for Multiple Instrument Recognition

Description

Identification of instruments in polyphonic recordings is a challenging, but fundamental problem in music information retrieval. While there has been significant progress in developing predictive models for this and related classification tasks, we as a community lack a common data-set which is large, freely available, diverse, and representative of naturally occurring recordings. This limits our ability to measure the efficacy of computational models. This article describes the construction of a new, open data-set for multi-instrument recognition. The dataset contains 20,000 examples of Creative Commons-licensed music available on the Free Music Archive. Each example is a 10-second excerpt which has been partially labeled for the presence or absence of 20 instrument classes by annotators on a crowd-sourcing platform. We describe in detail how the instrument taxonomy was constructed, how the dataset was sampled and annotated, and compare its characteristics to similar, previous data-sets. Finally, we present experimental results and baseline model performance to motivate future work.

Files

248_Paper.pdf

Files (752.8 kB)

Name Size Download all
md5:f0973cb0d321c2b2df09ab95642896f7
752.8 kB Preview Download