<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Fabian-Robert Stöter</dc:creator>
  <dc:creator>Soumitro Chakrabarty</dc:creator>
  <dc:creator>Emanuël Habets</dc:creator>
  <dc:creator>Bernd Edler</dc:creator>
  <dc:date>2018-04-16</dc:date>
  <dc:description>&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;LibriCount10 0dB Dataset&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;This record describes LibriCount10, a synthetic dataset for speaker count estimation.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;For each recording, the ground-truth number of speakers is encoded in the file name: the `k` in `k_uniquefile.wav` is the maximum number of concurrent speakers within the 5 seconds of the recording.&amp;lt;/p&amp;gt;
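
&amp;lt;p&amp;gt;As a minimal sketch of how the count could be read back from a file name, the Python snippet below splits off the leading `k`. The function name `speaker_count_from_name` is hypothetical and not part of the dataset; it assumes only the `k_uniquefile.wav` naming scheme described above.&amp;lt;/p&amp;gt;

```python
from pathlib import Path


def speaker_count_from_name(path):
    """Return the integer prefix `k` of a file named `k_uniquefile.wav`."""
    stem = Path(path).stem  # drops the directory and the ".wav" suffix
    prefix, sep, _rest = stem.partition("_")
    # Hypothetical validation: reject names without a numeric prefix.
    if not sep or not prefix.isdecimal():
        raise ValueError(f"no leading speaker count in {path!r}")
    return int(prefix)
```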

&amp;lt;p&amp;gt;The dataset simulates a cocktail party environment of [0..10] speakers, mixed at 0dB SNR from random utterances of different speakers from the &amp;lt;a href="http://www.openslr.org/12/"&amp;gt;LibriSpeech&amp;lt;/a&amp;gt; `CleanTest` dataset.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;All recordings are 5s long, and all speakers are active for most of the recording.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;For each unique recording, we provide the audio wave file (16-bit, 16kHz, mono) and an annotation `json` file with the same name as the recording.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Metadata&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;In the annotation file, we provide each speaker's sex, their unique speaker_id, and their vocal activity within the mixture recording, given in samples. Note that the activity annotations were automatically generated using a &amp;lt;a href="https://github.com/wiseman/py-webrtcvad"&amp;gt;voice activity detection&amp;lt;/a&amp;gt; system.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;In the following example, the annotation shows a speaker count of 3, which can be read from the number of elements in the list:&amp;lt;/p&amp;gt;

&amp;lt;pre&amp;gt;&amp;lt;code class="language-json"&amp;gt;[
    {
        "sex": "F",
        "activity": [[0, 51076], [51396, 55400], [56681, 80000]],
        "speaker_id": 1221
    },
    {
        "sex": "F",
        "activity": [[0, 51877], [56201, 80000]],
        "speaker_id": 3570
    },
    {
        "sex": "M",
        "activity": [[0, 15681], [16161, 68213], [73498, 80000]],
        "speaker_id": 5105
    }
]&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
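
&amp;lt;p&amp;gt;An annotation of this shape can be processed with the Python standard library. The sketch below is illustrative and not an official loader: the helper names `load_annotation`, `speaker_count`, and `active_samples` are hypothetical, and it assumes each `activity` pair is a half-open `[start, end)` range in samples, which the description does not state explicitly.&amp;lt;/p&amp;gt;

```python
import json


def load_annotation(path):
    """Read the annotation `json` file that accompanies a recording."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def speaker_count(annotation):
    """One list element per speaker, so the count is the list length."""
    return len(annotation)


def active_samples(speaker):
    """Total active samples, assuming half-open [start, end) ranges."""
    return sum(end - start for start, end in speaker["activity"])
```

&amp;lt;p&amp;gt;Under the same assumption, dividing the result of `active_samples` by the 16kHz sample rate gives a speaker's active time in seconds.&amp;lt;/p&amp;gt;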

</dc:description>
  <dc:identifier>https://doi.org/10.5281/zenodo.1216072</dc:identifier>
  <dc:identifier>oai:zenodo.org:1216072</dc:identifier>
  <dc:publisher>Zenodo</dc:publisher>
  <dc:relation>https://doi.org/10.5281/zenodo.1216071</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>Creative Commons Attribution 4.0 International</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:source>ICASSP 2018, Calgary, Canada</dc:source>
  <dc:subject>audio</dc:subject>
  <dc:subject>dataset</dc:subject>
  <dc:subject>speaker count estimation</dc:subject>
  <dc:title>LibriCount, a dataset for speaker count estimation</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
</oai_dc:dc>