Dataset Open Access


Humphrey, Eric J.; Durand, Simon; McFee, Brian

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
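<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">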
  <dc:creator>Humphrey, Eric J.</dc:creator>
  <dc:creator>Durand, Simon</dc:creator>
  <dc:creator>McFee, Brian</dc:creator>
  <dc:description>The OpenMIC-2018 dataset is made available through a collaboration between Spotify and MARL@NYU. Additionally, the cost of annotation was sponsored by Spotify, whose contributions to open-source research can be found online at the developer site, engineering blog, and public GitHub.

If you use this dataset, please cite the following work:

Humphrey, Eric J., Durand, Simon, and McFee, Brian. "OpenMIC-2018: An Open Dataset for Multiple Instrument Recognition." In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), 2018.

The dataset is made available by Spotify AB under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The full terms of this license are included alongside this dataset.

This dataset contains the following:

	10-second snippets of audio, in a directory format like 'audio/{0:.3}/{0}.ogg'.format(sample_key)
	VGGish features as JSON objects, in a directory format like 'vggish/{0:.3}/{0}.json'.format(sample_key)
	MD5 checksums for each OGG and JSON file
	Anonymized individual responses, in 'openmic-2018-individual-responses.csv'
	Aggregated labels, in 'openmic-2018-aggregated-labels.csv'
	Track metadata, with licenses for each audio recording, in 'openmic-2018-metadata.csv'
	A Python-friendly NPZ file of features and labels, 'openmic-2018.npz'
	Sample partitions for train and test, in 'partitions/*.txt'
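
The layout above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's own tooling. It assumes that each file sits in a subdirectory named after the first three characters of its sample key, and that the archive was extracted to a directory called 'openmic-2018'. The helper names sample_paths and md5_of_file are hypothetical. The format of the checksum files is not described here, so the sketch only computes a digest to compare against a known value.

```python
import hashlib
import os


def sample_paths(sample_key, root="openmic-2018"):
    """Return (audio_path, vggish_path) for one sample key."""
    # Assumption: the subdirectory is the first three characters of the key.
    prefix = sample_key[:3]
    audio = os.path.join(root, "audio", prefix, sample_key + ".ogg")
    vggish = os.path.join(root, "vggish", prefix, sample_key + ".json")
    return audio, vggish


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Given a sample key taken from one of the partition files, sample_paths(key) returns the two paths to open, and md5_of_file(path) returns a digest that can be compared with the published checksum for that file.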