Dataset Open Access

Synth-Salience Choral Set

Helena Cuesta; Emilia Gómez

JSON Export

  "files": [
      "links": {
        "self": ""
      "checksum": "md5:6b83cef701b3c4703af741b55618569a", 
      "bucket": "3635b711-5cdb-4382-af03-0e97c59b96ad", 
      "key": "", 
      "type": "zip", 
      "size": 2278328229
  "owners": [
  "doi": "10.5281/zenodo.6534429", 
  "stats": {
    "version_unique_downloads": 1.0, 
    "unique_views": 62.0, 
    "views": 75.0, 
    "version_views": 75.0, 
    "unique_downloads": 1.0, 
    "version_unique_views": 62.0, 
    "volume": 2278328229.0, 
    "version_downloads": 1.0, 
    "downloads": 1.0, 
    "version_volume": 2278328229.0
  "links": {
    "doi": "", 
    "conceptdoi": "", 
    "bucket": "", 
    "conceptbadge": "", 
    "html": "", 
    "latest_html": "", 
    "badge": "", 
    "latest": ""
  "conceptdoi": "10.5281/zenodo.6534428", 
  "created": "2022-05-13T12:05:11.980080+00:00", 
  "updated": "2022-05-14T01:50:13.888634+00:00", 
  "conceptrecid": "6534428", 
  "revision": 2, 
  "id": 6534429, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.6534429", 
    "description": "<p>The <strong>Synth-Salience Choral Set</strong> (SSCS) is a publicly available dataset for voice assignment based on pitch salience.&nbsp;</p>\n\n<p>The dataset was created to support research on voice assignment based on pitch salience.&nbsp;By definition, an &ldquo;ideal&rdquo; pitch salience representation of a music recording is zero wherever there is no perceptible pitch, and has a positive value that reflects the pitches&rsquo; perceived energy at the frequency bins of the corresponding F0 values. In practice, for a normalized synthetic pitch salience function we assume a value equal to the maximum energy (salience), i.e., 1, in the time-frequency bins that correspond to the notes present in a song, and 0 elsewhere. We obtain such a synthetic pitch salience representation directly by processing the digital (MusicXML, MIDI) score of a music piece, using the desired time and frequency quantization, i.e., a time-frequency grid.&nbsp;</p>\n\n<p>To build the SSCS, we collect scores of four-part (SATB) a cappella choral music from the <a href=\"\">Choral Public Domain Library (CPDL)</a>&nbsp;using their API. We assemble a collection of <strong>5381 scores</strong> in MusicXML format, which we subsequently convert into MIDI files for easier parsing.</p>\n\n<p><br>\nEach song in the dataset comprises five CSV files: one with the polyphonic pitch salience representation of the four voices (*_mix.csv) and four additional files with the monophonic pitch salience representation of each voice separately (*_S/A/T/B.csv). In both cases, the asterisk refers to the name of the song, which is shared between all representations from the same song.<br>\nBesides the pitch salience files, we provide a metadata CSV file (sscs_metadata.csv) which indicates the associated CPDL URL for each song in the dataset.&nbsp;Note that this dataset contains the input/output features used in the cited&nbsp;study, i.e., salience functions, and not audio files or scores. However, the accompanying&nbsp;metadata file allows researchers to access the associated open access scores for each example in the dataset.</p>\n\n<p>When using this dataset for your research, please cite:</p>\n\n<p>Helena Cuesta and Emilia G&oacute;mez (2022).&nbsp;<strong>Voice Assignment in Vocal Quartets using Deep Learning Models based on Pitch Salience</strong>. Transactions of the International Society for Music Information Retrieval (TISMIR).&nbsp;<em>To appear.</em></p>\n\n<p>Helena Cuesta (2022). <strong>Data-driven Pitch Content Description of Choral Singing Recordings</strong>. PhD thesis. Universitat Pompeu Fabra, Barcelona.</p>\n\n<p>&nbsp;</p>", 
    "license": {
      "id": "CC-BY-4.0"
    "title": "Synth-Salience Choral Set", 
    "journal": {
      "title": "Transactions of the International Society for Music Information Retrieval (TISMIR)"
    "relations": {
      "version": [
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "6534428"
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "6534429"
    "version": "1.0.0", 
    "publication_date": "2022-05-10", 
    "creators": [
        "orcid": "0000-0001-8531-4487", 
        "affiliation": "Universitat Pompeu Fabra", 
        "name": "Helena Cuesta"
        "affiliation": "Joint Research Centre", 
        "name": "Emilia G\u00f3mez"
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    "related_identifiers": [
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.6534428", 
        "relation": "isVersionOf"
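
The description in the record above defines a normalized synthetic pitch salience function as 1 in the time-frequency bins that correspond to notes in the score and 0 elsewhere. The following is a minimal sketch of that idea, not the authors' code. The note format (onset, offset, F0 in Hz), the log-spaced frequency grid, and all parameter names and values are assumptions made for illustration; the record does not state the grid used in the dataset.

```python
import math


def synth_salience(notes, n_frames, hop_s, f_min, bins_per_octave, n_bins):
    """Return an n_bins x n_frames grid holding 1.0 where a note sounds, else 0.0.

    notes: iterable of (onset_s, offset_s, f0_hz) tuples. This input format
    is hypothetical; in practice it would be parsed from a MIDI or MusicXML score.
    """
    grid = [[0.0] * n_frames for _ in range(n_bins)]
    for onset_s, offset_s, f0_hz in notes:
        # Nearest bin on a log-spaced frequency axis starting at f_min (assumed grid).
        freq_bin = round(bins_per_octave * math.log2(f0_hz / f_min))
        if not 0 <= freq_bin < n_bins:
            continue  # the note lies outside the frequency grid
        # Frames whose start time t * hop_s falls in [onset_s, offset_s).
        first = max(0, math.ceil(onset_s / hop_s))
        last = min(n_frames, math.ceil(offset_s / hop_s))
        for t in range(first, last):
            grid[freq_bin][t] = 1.0
    return grid


# Toy example: two consecutive notes, one octave apart, on a 4-frame grid.
notes = [(0.0, 1.0, 110.0), (1.0, 2.0, 220.0)]
grid = synth_salience(notes, n_frames=4, hop_s=0.5, f_min=110.0,
                      bins_per_octave=12, n_bins=13)
```

With these toy values the first note fills frames 0 and 1 of the lowest bin, and the second fills frames 2 and 3 of the bin one octave higher. A polyphonic (mix) grid would be obtained by passing the notes of all four voices together, and a monophonic grid by passing one voice at a time.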
                  All versions   This version
Views             75             75
Downloads         1              1
Data volume       2.3 GB         2.3 GB
Unique views      62             62
Unique downloads  1              1
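
The dataset description in the JSON export states that each song comprises five CSV files: a polyphonic file (*_mix.csv) and one monophonic file per voice (*_S/A/T/B.csv), where the asterisk is the song name. As a rough sketch of how those names could be assembled, the helper below builds the five expected paths for one song. It assumes that "*_S/A/T/B.csv" expands to four files ending in _S.csv, _A.csv, _T.csv and _B.csv, and that all files sit in a single folder; the function name, the folder name and the song name are hypothetical.

```python
from pathlib import Path

# Voice suffixes, assuming "*_S/A/T/B.csv" means one file per SATB voice.
VOICES = ("S", "A", "T", "B")


def song_files(root, song):
    """Map "mix" and each voice letter to the expected CSV path for one song.

    root: folder holding the extracted CSV files (flat layout is an assumption).
    song: song name shared by all five files, i.e. the "*" in the description.
    """
    root = Path(root)
    files = {"mix": root / f"{song}_mix.csv"}
    for voice in VOICES:
        files[voice] = root / f"{song}_{voice}.csv"
    return files


# Hypothetical folder and song name, for illustration only.
paths = song_files("sscs", "example_song")
```

The CPDL URL for each song would then be looked up in sscs_metadata.csv; its column names are not given in this record, so that step is left out here.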


Cite as