Thesis Open Access

SATB Voice Segregation For Monoaural Recordings

Pétermann, Darius A.


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">source separation, singing voice, SATB recording, convolutional neural networks, choir music.</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
    <subfield code="c">Universitat Pompeu Fabra</subfield>
  </datafield>
  <controlfield tag="005">20201016002656.0</controlfield>
  <controlfield tag="001">4091247</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Universitat Pompeu Fabra</subfield>
    <subfield code="4">ths</subfield>
    <subfield code="a">Chandna, Pritish</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Universitat Pompeu Fabra</subfield>
    <subfield code="4">ths</subfield>
    <subfield code="a">Bonada, Jordi</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">12076163</subfield>
    <subfield code="z">md5:0d052cb5f9231ccff0caf7a2ffc406c4</subfield>
    <subfield code="u">https://zenodo.org/record/4091247/files/2020-Darius-Petermann.pdf</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-09-15</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="p">user-mtgupf</subfield>
    <subfield code="p">user-smc-master</subfield>
    <subfield code="o">oai:zenodo.org:4091247</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Universitat Pompeu Fabra</subfield>
    <subfield code="a">Pétermann, Darius A.</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">SATB Voice Segregation For Monoaural Recordings</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-mtgupf</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-smc-master</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/3.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 3.0 Unported</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Choral singing is a widely practiced form of ensemble singing wherein a group of people sing simultaneously in polyphonic harmony. The most commonly practiced setting for choir ensembles consists of four parts: Soprano, Alto, Tenor and Bass (SATB), each with its own range of fundamental frequencies (F0s). The task of source separation for this choral setting entails separating the SATB mixture into its constituent parts. Source separation for musical mixtures is well studied, and many Deep Learning-based methodologies have been proposed for it. However, most of the research has focused on a typical case, which consists of separating vocal, percussion and bass sources from a mixture, each of which has a distinct spectral structure. In contrast, the simultaneous and harmonic nature of ensemble singing leads to high structural similarity and overlap between the spectral components of the sources in a choral mixture, making source separation for choirs a harder task than the typical case. This, along with the lack of an appropriate consolidated dataset, has led to a dearth of research in the field so far. In this work, we first assess how well some recently developed methodologies for musical source separation perform for the case of SATB choirs. We then propose a novel domain-specific adaptation for conditioning the recently proposed U-Net architecture for musical source separation using the fundamental frequency contour of each of the singing groups, and demonstrate that our proposed approach surpasses results from domain-agnostic architectures. Lastly, we assess our approach using different evaluation methodologies, ranging from objective to subjective ones, and provide a comparative analysis of the various results.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4091246</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4091247</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">thesis</subfield>
  </datafield>
</record>