Thesis · Open Access

SATB Voice Segregation For Monoaural Recordings

Pétermann, Darius A.


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.4091247</identifier>
  <creators>
    <creator>
      <creatorName>Pétermann, Darius A.</creatorName>
      <affiliation>Universitat Pompeu Fabra</affiliation>
    </creator>
  </creators>
  <titles>
    <title>SATB Voice Segregation For Monoaural Recordings</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2020</publicationYear>
  <subjects>
    <subject>source separation, singing voice, SATB recording, convolutional neural networks, choir music.</subject>
  </subjects>
  <contributors>
    <contributor contributorType="Supervisor">
      <contributorName>Chandna, Pritish</contributorName>
      <givenName>Pritish</givenName>
      <familyName>Chandna</familyName>
      <affiliation>Universitat Pompeu Fabra</affiliation>
    </contributor>
    <contributor contributorType="Supervisor">
      <contributorName>Bonada, Jordi</contributorName>
      <givenName>Jordi</givenName>
      <familyName>Bonada</familyName>
      <affiliation>Universitat Pompeu Fabra</affiliation>
    </contributor>
  </contributors>
  <dates>
    <date dateType="Issued">2020-09-15</date>
  </dates>
  <resourceType resourceTypeGeneral="Text">Thesis</resourceType>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/4091247</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.4091246</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/mtgupf</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/smc-master</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/3.0/legalcode">Creative Commons Attribution 3.0 Unported</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">Choral singing is a widely practiced form of ensemble singing in which a group of people sing simultaneously in polyphonic harmony. The most common setting for choir ensembles consists of four parts: Soprano, Alto, Tenor, and Bass (SATB), each with its own range of fundamental frequencies (F0s). The task of source separation for this choral setting entails separating the SATB mixture into its constituent parts. Source separation for musical mixtures is well studied, and many Deep Learning-based methodologies have been proposed for this task. However, most research has focused on the typical case of separating vocal, percussion, and bass sources from a mixture, each of which has a distinct spectral structure. In contrast, the simultaneous and harmonic nature of ensemble singing leads to high structural similarity and overlap between the spectral components of the sources in a choral mixture, making source separation for choirs a harder task than the typical case. This, along with the lack of an appropriate consolidated dataset, has led to a dearth of research in the field so far. In this work we first assess how well some recently developed methodologies for musical source separation perform for the case of SATB choirs. We then propose a novel domain-specific adaptation that conditions the recently proposed U-Net architecture for musical source separation on the fundamental frequency contour of each of the singing groups, and demonstrate that our approach surpasses results from domain-agnostic architectures. Lastly, we assess our approach using different evaluation methodologies, ranging from objective to subjective, and provide a comparative analysis of the results.</description>
  </descriptions>
</resource>
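
As a quick illustration of the record's structure, the snippet below pulls a few fields out of the DataCite export with Python's standard library. It is a minimal sketch, assuming the XML above has been saved to a local file named record.xml (a hypothetical path); the namespace URI is the kernel-4 one declared in the <resource> element.

import xml.etree.ElementTree as ET

# DataCite kernel-4 is the default namespace of the export above.
NS = {"dc": "http://datacite.org/schema/kernel-4"}

root = ET.parse("record.xml").getroot()  # assumed local copy of the XML export
doi = root.find("dc:identifier", NS).text
title = root.find("dc:titles/dc:title", NS).text
creators = [c.text for c in root.findall("dc:creators/dc:creator/dc:creatorName", NS)]
year = root.find("dc:publicationYear", NS).text

print(f"{', '.join(creators)} ({year}). {title}. https://doi.org/{doi}")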
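
The core idea described in the abstract, a spectrogram U-Net for source separation conditioned on the F0 contour of the target voice part, can also be sketched compactly. The toy model below is an assumption-laden illustration in PyTorch, not the thesis' actual network: the layer sizes are invented, the conditioning is a FiLM-style scale-and-shift at the bottleneck, and the F0 contour is reduced to a single summary value per example rather than a frame-wise curve.

import torch
import torch.nn as nn

class ConditionedUNet(nn.Module):
    def __init__(self, cond_dim=1, base=16):
        super().__init__()
        # Encoder: two strided conv blocks over the mixture magnitude spectrogram.
        self.enc1 = nn.Sequential(nn.Conv2d(1, base, 4, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.ReLU())
        # FiLM-style conditioning: map the F0 input to per-channel scale and shift.
        self.film = nn.Linear(cond_dim, base * 2 * 2)
        # Decoder with a skip connection from the first encoder block.
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU())
        self.dec2 = nn.ConvTranspose2d(base * 2, 1, 4, stride=2, padding=1)

    def forward(self, mix_mag, f0):
        # mix_mag: (batch, 1, freq, time) magnitude spectrogram of the SATB mixture.
        # f0: (batch, cond_dim) summary of the target part's F0 contour (simplified here).
        h1 = self.enc1(mix_mag)
        h2 = self.enc2(h1)
        gamma, beta = self.film(f0).chunk(2, dim=1)
        h2 = gamma[:, :, None, None] * h2 + beta[:, :, None, None]
        d1 = self.dec1(h2)
        mask = torch.sigmoid(self.dec2(torch.cat([d1, h1], dim=1)))
        return mask * mix_mag  # estimated magnitude of the conditioned voice part

model = ConditionedUNet()
est = model(torch.rand(2, 1, 512, 128), torch.rand(2, 1))
print(est.shape)  # torch.Size([2, 1, 512, 128])

In a full system, one would either predict a mask per SATB part or run the network once per part with that part's F0 contour, then combine each masked magnitude with the mixture phase to resynthesize the individual voices.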

Cite as

Pétermann, Darius A. (2020). SATB Voice Segregation For Monoaural Recordings (Master's thesis, Universitat Pompeu Fabra). Zenodo. https://doi.org/10.5281/zenodo.4091247