Conference paper Open Access

Using weakly aligned score–audio pairs to train deep chroma models for cross-modal music retrieval

Frank Zalkow; Meinard Müller


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Frank Zalkow</dc:creator>
  <dc:creator>Meinard Müller</dc:creator>
  <dc:date>2020-10-11</dc:date>
  <dc:description>Many music information retrieval tasks involve the comparison of a symbolic score representation with an audio recording. A typical strategy is to compare score–audio pairs based on a common mid-level representation, such as chroma features. Several recent studies demonstrated the effectiveness of deep learning models that learn task-specific mid-level representations from temporally aligned training pairs. However, in practice, there is often a lack of strongly aligned training data, in particular for real-world scenarios. In our study, we use weakly aligned score–audio pairs for training, where only the beginning and end of a score excerpt is annotated in an audio recording, without aligned correspondences in between. To exploit such weakly aligned data, we employ the Connectionist Temporal Classification (CTC) loss to train a deep learning model for computing an enhanced chroma representation. We then apply this model to a cross-modal retrieval task, where we aim at finding relevant audio recordings of Western classical music, given a short monophonic musical theme in symbolic notation as a query. We present systematic experiments that show the effectiveness of the CTC-based model for this theme-based retrieval task.</dc:description>
  <dc:identifier>https://zenodo.org/record/4245400</dc:identifier>
  <dc:identifier>10.5281/zenodo.4245400</dc:identifier>
  <dc:identifier>oai:zenodo.org:4245400</dc:identifier>
  <dc:publisher>ISMIR</dc:publisher>
  <dc:relation>doi:10.5281/zenodo.4245399</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/ismir</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:title>Using weakly aligned score–audio pairs to train deep chroma models for cross-modal music retrieval</dc:title>
  <dc:type>info:eu-repo/semantics/conferencePaper</dc:type>
  <dc:type>publication-conferencepaper</dc:type>
</oai_dc:dc>
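The Dublin Core record above can be read with Python's standard library. The following is a minimal sketch, using a trimmed copy of the metadata and the `dc`/`oai_dc` namespace URIs declared in the record; element names and values come directly from the export.

```python
import xml.etree.ElementTree as ET

# Namespaces declared in the OAI Dublin Core record above
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

# Trimmed copy of the record for illustration
record = """<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <dc:creator>Frank Zalkow</dc:creator>
  <dc:creator>Meinard M\u00fcller</dc:creator>
  <dc:date>2020-10-11</dc:date>
  <dc:identifier>10.5281/zenodo.4245400</dc:identifier>
</oai_dc:dc>"""

root = ET.fromstring(record)
creators = [el.text for el in root.findall("dc:creator", NS)]
date = root.findtext("dc:date", namespaces=NS)
print(creators)  # ['Frank Zalkow', 'Meinard Müller']
print(date)      # 2020-10-11
```

Repeated elements such as `dc:creator` and `dc:identifier` are retrieved with `findall`; single-valued fields can use `findtext`.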
                    All versions    This version
Views               128             128
Downloads           51              51
Data volume         42.4 MB         42.4 MB
Unique views        115             115
Unique downloads    45              45
