mirdata: Software for Reproducible Usage of Datasets

Rachel Bittner; Magdalena Fuentes; David Rubinstein; Andreas Jansson; Keunwoo Choi; Thor Kell

doi:10.5281/zenodo.3527750

Published November 4, 2019 | Version v1

Conference paper Open

There are a number of efforts in the MIR community towards increased reproducibility, such as creating more open datasets, publishing code, and using common software libraries, e.g. for evaluation. However, when it comes to datasets, there is usually little guarantee that researchers are using the exact same data in the same way, which, among other issues, makes comparisons of different methods on the "same" datasets problematic. In this paper, we first show how (often unknown) differences in datasets can lead to significantly different experimental results. We propose a solution to these problems in the form of an open source library, mirdata, which handles datasets in their current distribution modes but controls for possible variability. In particular, it contains tools which: (1) validate whether the user's data (e.g. audio, annotations) is consistent with a canonical version of the dataset; (2) load annotations in a consistent manner; (3) download or give instructions for obtaining data; and (4) make it easy to perform track metadata-specific analysis.
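To make point (1) concrete, the following is a minimal sketch of checksum-based validation against a canonical index. It is not mirdata's actual API: the function names (`md5_of`, `validate`, `demo`), the index format (relative path mapped to an MD5 digest), and the file names are all assumptions chosen for illustration, using only the Python standard library.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(data_home: Path, index: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare local files against a canonical index (hypothetical format).

    `index` maps a relative file path to its expected MD5 checksum.
    Returns two lists of relative paths: (missing, mismatched).
    """
    missing, mismatched = [], []
    for rel_path, expected in sorted(index.items()):
        local = data_home / rel_path
        if not local.is_file():
            missing.append(rel_path)
        elif md5_of(local) != expected:
            mismatched.append(rel_path)
    return missing, mismatched


def demo() -> tuple[list[str], list[str]]:
    """Build a tiny fake dataset copy and validate it against an index."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        (home / "audio").mkdir()
        # track1 matches the canonical bytes; track2 simulates a re-encoded copy.
        (home / "audio" / "track1.wav").write_bytes(b"canonical audio 1")
        (home / "audio" / "track2.wav").write_bytes(b"re-encoded audio 2")
        index = {
            "audio/track1.wav": hashlib.md5(b"canonical audio 1").hexdigest(),
            "audio/track2.wav": hashlib.md5(b"canonical audio 2").hexdigest(),
            # This annotation file is listed in the index but absent locally.
            "annotations/track1.csv": hashlib.md5(b"0.0,440.0\n").hexdigest(),
        }
        return validate(home, index)


if __name__ == "__main__":
    missing, mismatched = demo()
    print("missing:", missing)
    print("mismatched:", mismatched)
```

Reporting missing and mismatched files separately, rather than returning a single pass/fail flag, lets a user see whether their copy is incomplete or differs in content (for example, after re-encoding audio), which is the kind of silent variability the abstract describes.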

Files

ismir2019_paper_000009.pdf (482.1 kB), md5:99c06ce90d26c93eb7c0096a262d9cdc

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 20th International Society for Music Information Retrieval Conference, 99-106. Delft, The Netherlands.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2019), Delft, The Netherlands, November 4-8, 2019

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 4, 2019
Modified: July 22, 2024
