Conference paper · Open Access

Using weakly aligned score–audio pairs to train deep chroma models for cross-modal music retrieval

Frank Zalkow; Meinard Müller

Many music information retrieval tasks involve the comparison of a symbolic score representation with an audio recording. A typical strategy is to compare score–audio pairs based on a common mid-level representation, such as chroma features. Several recent studies demonstrated the effectiveness of deep learning models that learn task-specific mid-level representations from temporally aligned training pairs. However, in practice, there is often a lack of strongly aligned training data, in particular for real-world scenarios. In our study, we use weakly aligned score–audio pairs for training, where only the beginning and end of a score excerpt are annotated in an audio recording, without aligned correspondences in between. To exploit such weakly aligned data, we employ the Connectionist Temporal Classification (CTC) loss to train a deep learning model for computing an enhanced chroma representation. We then apply this model to a cross-modal retrieval task, where we aim at finding relevant audio recordings of Western classical music, given a short monophonic musical theme in symbolic notation as a query.

We present systematic experiments that show the effectiveness of the CTC-based model for this theme-based retrieval task.

Conference: International Society for Music Information Retrieval Conference (ISMIR 2020), Montreal, Canada (https://www.ismir2020.net/)
Published: 2020-10-11
DOI: https://doi.org/10.5281/zenodo.4245400
Record: https://zenodo.org/record/4245400
License: Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/legalcode)
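The abstract's central idea is that CTC training needs only the ordered sequence of labels for an excerpt, not their timing. A weakly aligned score–audio pair provides exactly that. The block below is a minimal sketch of the CTC forward (alpha) recursion in a simplified single-label setting. It assumes that a network emits, for each audio frame, a distribution over a blank symbol and 12 pitch classes, and that the target is the pitch-class sequence of a score excerpt. The label layout and the names `ctc_loss` and `BLANK` are illustrative assumptions. This page does not describe how the authors encode polyphonic score excerpts as CTC targets, so this is not presented as their implementation.

```python
import math
from typing import List, Sequence

# Hypothetical label layout (an assumption, not taken from the paper):
# index 0 is the CTC blank, indices 1..12 are the pitch classes C, C#, ..., B.
BLANK = 0


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs: List[List[float]], target: List[int]) -> float:
    """CTC negative log-likelihood of `target` given frame-wise log-probabilities.

    log_probs: T x K matrix, log_probs[t][k] = log P(class k at frame t); K includes the blank.
    target:    label sequence without blanks (e.g. pitch classes of a score excerpt);
               only the order of the labels is needed, not their frame-wise timing.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., lN, blank]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    num_states = len(ext)

    # alpha[s]: log-probability of all partial alignments in state s at the current frame
    alpha = [-math.inf] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new_alpha = [-math.inf] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]                 # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])     # advance by one state
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + frame[ext[s]]
        alpha = new_alpha

    # A valid alignment ends in the last label or in the trailing blank
    final = [alpha[-1]] + ([alpha[-2]] if num_states > 1 else [])
    return -logsumexp(final)


# Toy check: blank + one label, two frames, uniform probabilities.
# Paths collapsing to [1] are (1,1), (blank,1), (1,blank): P = 3 * 0.25 = 0.75.
uniform = [[math.log(0.5)] * 2 for _ in range(2)]
print(round(ctc_loss(uniform, [1]), 4))  # 0.2877, i.e. -log(0.75)
```

The recursion runs in log space to avoid numerical underflow on long recordings. A repeated label such as [1, 1] needs a blank between its two occurrences, so it cannot be aligned to fewer than three frames, and the loss is then infinite. To actually train a chroma network, one would normally rely on a framework implementation such as PyTorch's `torch.nn.CTCLoss`, which also provides gradients.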