Beyond Benchmarks: A Toolkit for Music Audio Representation Evaluation
Description
Many state-of-the-art approaches to Music Information Retrieval (MIR) tasks now rely on representation learning. In this paradigm, a model learns meaningful representations of the data through a source task, and these representations then serve as compact, efficient inputs to separate downstream tasks. With growing interest in general audio representations that are useful across multiple tasks, thorough, consistent, and fair evaluation is more important than ever.
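The paragraph above describes embeddings used as inputs to a downstream task. The following is a minimal Python sketch of that downstream step. It assumes the embeddings have already been extracted by a frozen pretrained model. The toy vectors, genre labels, and the helper names `train_centroid_probe`, `predict`, and `accuracy` are all hypothetical. A nearest-centroid classifier stands in here for the linear or MLP probes more commonly used, and the sketch is not presented as the toolkit's method.

```python
from statistics import mean

# Hypothetical type alias: one fixed-size embedding vector per audio clip.
Embedding = list[float]


def train_centroid_probe(embeddings, labels):
    """Fit a nearest-centroid probe: store the mean embedding of each class."""
    by_class: dict[str, list[Embedding]] = {}
    for emb, label in zip(embeddings, labels):
        by_class.setdefault(label, []).append(emb)
    # Average each dimension across all embeddings of the same class.
    return {label: [mean(dim) for dim in zip(*embs)] for label, embs in by_class.items()}


def predict(centroids, emb):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], emb))


def accuracy(centroids, embeddings, labels):
    """Fraction of clips whose predicted label matches the reference label."""
    correct = sum(predict(centroids, e) == y for e, y in zip(embeddings, labels))
    return correct / len(labels)


# Toy usage with made-up 2-D "embeddings" (real ones would be much larger).
train_X = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
train_y = ["rock", "rock", "jazz", "jazz"]
test_X = [[0.8, 0.2], [0.2, 0.7]]
test_y = ["rock", "jazz"]

probe = train_centroid_probe(train_X, train_y)
print(accuracy(probe, test_X, test_y))  # 1.0
```

Because the embedding model stays frozen, only the lightweight probe is trained. Differences in downstream scores can therefore be attributed largely to the representation itself.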
However, evaluation efforts to date have been fragmented, owing to differences in data availability and computational resources, missing implementation details, and a lack of agreed-upon design choices. Public benchmarks often opt for a fixed evaluation setup, which provides consistency at the cost of a narrower investigation of MIR systems.
In this master’s thesis project, we present a toolkit for reproducible music audio representation evaluation. The toolkit provides an easy, configurable way to run evaluation experiments for MIR systems that use representation learning. It offers a variety of MIR datasets and tasks for evaluating performance across different input representations, embedding extraction frequencies, downstream models, and audio perturbations. It also includes tools for exploring and visualizing evaluation results under different experimental setups. The toolkit is primarily intended to aid the development of music audio representations while ensuring that every evaluation experiment is transparent and can be faithfully reproduced. We use the toolkit to conduct an extensive evaluation of multiple representations from widely used music embedding models across a variety of MIR tasks, datasets, and deformation scenarios.
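A configurable, reproducible experiment setup of the kind described above can be illustrated with a small Python sketch. The assumptions are significant: the axis names, their values, and the functions `expand_grid` and `experiment_id` are hypothetical and are not taken from the toolkit. The sketch expands a grid of evaluation axes into individual experiment configurations. It then derives a stable identifier for each configuration by hashing a canonical JSON serialization, so the same setup always maps to the same ID.

```python
import hashlib
import itertools
import json

# Hypothetical evaluation axes echoing those named in the description;
# the values are placeholders, not the toolkit's actual options.
GRID = {
    "representation": ["model_a_layer_6", "model_b_final"],
    "extraction_hop_s": [0.5, 1.0],  # assumed proxy for extraction frequency
    "downstream_model": ["linear", "mlp"],
    "perturbation": ["none", "gain_-6dB"],
}


def expand_grid(grid):
    """Yield one config dict per combination of axis values (Cartesian product)."""
    keys = sorted(grid)  # fixed key order makes the expansion deterministic
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def experiment_id(config):
    """Return a short, order-independent hash identifying a configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    configs = list(expand_grid(GRID))
    print(len(configs))  # 16
    for cfg in configs[:2]:
        print(experiment_id(cfg), cfg)
```

Keys are sorted both when expanding the grid and when serializing. This makes the identifier independent of dictionary insertion order, which is one simple way to tie stored results back to the exact configuration that produced them.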
Files
Christos-Plachouras-Master-Thesis-2023.pdf (1.1 MB, md5:71e97b6f1f69caefa985e66956ab6d37)