The IFA Spoken Language Corpus
Description
The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software, using speech from 8 of the 10 selected speakers in a variety of speaking styles. The segmented part totals about 50,000 words (41 minutes per speaker). Speech acquisition and preparation took around 3 person-weeks per speaker, and hand segmentation took 1,000 hours of labeling altogether. The asymptotic segmentation speed was about one word, or four boundaries, per minute. An evaluation showed that the median absolute difference of the segment boundaries was 6 ms between labelers and 4 ms within labelers. Label differences (substitutions, insertions, and deletions) were found in 8% of the segments between labelers and in 5% within labelers. The compiled data are available in relational database format for querying with SQL.
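The two agreement measures above can be computed on a relabeling of your own with a sketch like the following. It assumes the boundaries of the two labelings have already been paired one-to-one, and that label differences are counted with a plain Levenshtein alignment; the original evaluation may have matched segments differently, and the function names are hypothetical.

```python
from statistics import median


def median_absolute_difference(times_a, times_b):
    """Median of |a - b| over paired boundary times (seconds).

    Assumes both lists hold the same boundaries in the same order,
    i.e. boundaries were matched one-to-one beforehand.
    """
    if len(times_a) != len(times_b) or not times_a:
        raise ValueError("need two non-empty boundary lists of equal length")
    return median(abs(a - b) for a, b in zip(times_a, times_b))


def label_differences(ref, hyp):
    """Return (substitutions, insertions, deletions) of hyp against ref,
    using a plain Levenshtein alignment of the two label sequences."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = ref[i - 1] != hyp[j - 1]
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # match or substitution
                             cost[i - 1][j] + 1,             # deletion
                             cost[i][j - 1] + 1)             # insertion
    # Walk back from the end to attribute each edit to S, I or D.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels
```

A difference rate could then be taken as `sum(label_differences(ref, hyp)) / len(ref)`; normalizing by the reference segment count is an assumption of this sketch.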
The IFA Spoken Language corpus is currently at version 1.0. This is the "reference" version and the first I consider consistent enough to be useful. However, the annotations (labeling) still contain errors: a few percent of the labels are inconsistent (e.g., syllables or phonemes assigned to the wrong word, stress errors).
Summary information:
Net time in seconds (excluding all pauses)
| Gender | Age | ID | Recorded sentences (sec) | Segmented sentences (sec) |
|---|---|---|---|---|
| F | 20 | N | 3736 | 2760 |
| F | 28 | G | 4180 | 3978 |
| F | 40 | L | 3112 | 2485 |
| F | 60 | E | 4181 | 3245 |
| M | 15 | R | 2125 | 1439 |
| M | 40 | K | 2720 | 1891 |
| M | 56 | H | 2894 | 2368 |
| M | 66 | O | 3781 | 1696 |
| Total | | | 26733 (7:26 hours) | 19867 (5:31 hours) |
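The speaker codes used in the archive names in the table of contents (for example F20N or M66O) combine the Gender, Age and ID columns of the table above, and the totals convert to hours and minutes by rounding to the nearest minute. A small sketch of both, with hypothetical helper names:

```python
import re

# Gender letter, two-digit age, ID letter, e.g. "F20N" in SLspeech-chunks-F20N.zip
SPEAKER_CODE = re.compile(r"([FM])(\d{2})([A-Z])")


def parse_speaker(name):
    """Return (gender, age, id) for the first speaker code found in name."""
    match = SPEAKER_CODE.search(name)
    if match is None:
        raise ValueError(f"no speaker code in {name!r}")
    gender, age, letter = match.groups()
    return gender, int(age), letter


def to_hours_minutes(seconds):
    """Format a duration in seconds as H:MM, rounded to the nearest minute."""
    hours, minutes = divmod(round(seconds / 60), 60)
    return f"{hours}:{minutes:02d}"
```

For example, `to_hours_minutes(26733)` gives the 7:26 listed for the recorded total.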
Speech in tokens (total)
| Speakers | Recorded sentences | Recorded words | Segmented sentences | Segmented words | Segmented syllables | Segmented phonemes |
|---|---|---|---|---|---|---|
| 4F / 4M | 6128 | 73067 | 4492 | 51782 | 74702 | 187544 |
The IFA Spoken Language corpus was constructed using the Praat speech editing and analysis program, and all speech material is accessible with Praat.
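Since the annotations are distributed as Praat TextGrid files, they can also be read outside Praat. The following is a minimal regex-based sketch that only handles interval tiers in Praat's long text format, given as an already decoded string. Short-format or binary TextGrids are not covered, and the text encoding of the IFA files (TextGrids are sometimes UTF-16) should be checked before decoding. The function names are hypothetical.

```python
import re

# Praat escapes a double quote inside a label by doubling it ("").
QUOTED = r'"((?:[^"]|"")*)"'
NUMBER = r"([-+0-9.eE]+)"
TIER_HEADER = re.compile(r'class\s*=\s*"IntervalTier"\s*name\s*=\s*' + QUOTED)
INTERVAL = re.compile(
    r"xmin\s*=\s*" + NUMBER + r"\s*xmax\s*=\s*" + NUMBER + r"\s*text\s*=\s*" + QUOTED
)


def unescape(label):
    return label.replace('""', '"')


def read_interval_tiers(textgrid_text):
    """Return {tier name: [(start, end, label), ...]} for every IntervalTier."""
    headers = list(TIER_HEADER.finditer(textgrid_text))
    tiers = {}
    for k, header in enumerate(headers):
        # A tier's intervals run until the next IntervalTier header (or the end).
        end = headers[k + 1].start() if k + 1 < len(headers) else len(textgrid_text)
        body = textgrid_text[header.end():end]
        tiers[unescape(header.group(1))] = [
            (float(start), float(stop), unescape(label))
            for start, stop, label in INTERVAL.findall(body)
        ]
    return tiers
```

A tier's own `xmin`/`xmax` lines are skipped because they are not followed by a `text =` line.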
The Dutch Language Organization (Nederlandse Taalunie) holds all copyrights (unless explicitly stated otherwise) and makes the complete corpus available under the GNU General Public License.
Methods
Audio files
Audio files are stored in AIFC format (16-bit, 44,100 Hz). Recording microphones are coded as hm (head-mounted) and fm (fixed microphone). The two-channel recordings were split into chunks ("paragraphs") for storage and processing, and the chunks were in turn split into single-channel sentences (fm and hm) for word and phoneme segmentation.
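Splitting a two-channel chunk into single-channel signals amounts to de-interleaving the sample frames. The sketch below assumes uncompressed big-endian 16-bit frames, as in plain AIFF and in AIFC with compression type NONE, and that the raw frame bytes have already been read by some audio library (Python's standard-library `aifc` module was removed in Python 3.13). Which channel carries hm and which carries fm is not stated here, so the left/right naming is purely positional.

```python
import sys
from array import array


def split_stereo_pcm16_be(frames):
    """Split interleaved 16-bit big-endian stereo PCM into two channels.

    Returns (left, right) as lists of ints. The hm/fm channel assignment
    is not assumed here.
    """
    if len(frames) % 4:
        raise ValueError("stereo 16-bit frames must be a multiple of 4 bytes")
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "little":
        samples.byteswap()  # file order is big-endian; convert to native order
    return samples[0::2].tolist(), samples[1::2].tolist()
```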
Recording equipment
Speech was recorded in a quiet, sound-treated room. The recording equipment and a cueing computer were in a separate control room. Two-channel recordings were made with a head-mounted dynamic microphone (hm, Shure SM10A) on one channel and a fixed HF condenser microphone (fm, Sennheiser MKH 105) on the other. Recording was done directly to a Philips Audio CD recorder, i.e., 16-bit linear coding at 44.1 kHz stereo. To mark the recording level, a standard sound source (white noise and a pure 400 Hz tone) of 78 dB was recorded from a fixed position relative to the fixed microphone. These reference recordings are stored with the speech as G[12]N and G[12]T. The head-mounted microphone could not be repositioned precisely between sessions and was even known to move during a session (such movements were noted).
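Because the reference source had a fixed level and a fixed position relative to the fixed microphone, comparing its recorded level across two reference recordings gives an estimate of the gain difference on the fm channel. The sketch below computes an RMS level in dB relative to 16-bit full scale; it assumes the reference segment has already been decoded into integer samples, and it is not meaningful for the hm channel, whose position varied. The helper names are hypothetical.

```python
import math


def rms_dbfs(samples, full_scale=32768):
    """RMS level of 16-bit samples in dB relative to digital full scale."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / full_scale)


def level_offset_db(reference_a, reference_b):
    """Gain difference (dB) between two recordings of the same reference source.

    Positive means recording b was made at a higher level than recording a.
    """
    return rms_dbfs(reference_b) - rms_dbfs(reference_a)
```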
Speakers
Speakers were selected at the Institute of Phonetic Sciences in Amsterdam (IFA) and were mostly staff and students. Non-staff speakers were paid. In total, 18 speakers (9F/9M) completed both recording sessions. All speakers were native speakers of Dutch, and none reported speech or hearing problems. Based on the distribution of sex and age and on the quality of the recordings, the recordings of 10 speakers (5F/5M) were selected and split into chunks (paragraphs). Of these, the recordings of 4 women and 4 men were selected for phonemic segmentation. The ages of the selected speakers range from 15 to 66 years.
Speaking styles
Eight speaking "styles" were recorded from each speaker.
From informal to formal, these were:
- Informal story telling face-to-face to an "interviewer" (I)
- Retelling a previously read narrative story without sight contact (R)
And reading aloud:
- A narrative story (T)
- A random list of all sentences of the narrative stories (S)
- "Pseudo-sentences" constructed by replacing all words in a sentence with randomly selected words from the text with the same POS tag (PS)
- Lists of selected words from the texts (W)
- Lists of all distinct syllables from the word lists (Sy)
- A collection of idiomatic (the Alphabet, the numbers 0-12) and "diagnostic" sequences (isolated vowels, /hVd/ and /VCV/ lists) (Pr)
The last style (Pr) was presented in a fixed order; all other lists (S, PS, W, Sy) were (pseudo-)randomized for each speaker before presentation.
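The pseudo-sentence (PS) style can be illustrated with a short sketch: each word of a POS-tagged sentence is replaced by a randomly drawn word from the same text that carries the same tag. The tag names and helper functions are hypothetical, the sketch samples over tokens rather than distinct words (the source does not say which was used), and the correction for syntactic number and gender applied to the corpus sentences is not modeled.

```python
import random
from collections import defaultdict


def build_tag_index(tagged_text):
    """Map each POS tag to the word tokens in the text that carry it."""
    by_tag = defaultdict(list)
    for word, tag in tagged_text:
        by_tag[tag].append(word)
    return by_tag


def make_pseudo_sentence(tagged_sentence, by_tag, rng):
    """Replace every word by a random text word with the same POS tag.

    Every tag in the sentence must occur in the text (otherwise
    rng.choice raises IndexError). Frequent words are proportionally
    more likely to be drawn. Agreement is NOT repaired here.
    """
    return [(rng.choice(by_tag[tag]), tag) for _, tag in tagged_sentence]
```

Passing a per-speaker `random.Random(seed)` makes the draws reproducible; the same generator's `shuffle` method could randomize a presentation list, although the constraints behind "pseudo-randomized" are not described here.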
Each speaker read aloud from two separate text collections based on narrative texts. During the first recording session, each speaker read from the same two texts (Fixed text type). These texts were based on the Dutch version of "The North Wind and the Sun" and on a translation of the fairy tale "Jorinde und Joringel". During the second session, each speaker read from texts based on the informal story they had told during the first session (Variable text type). A non-overlapping selection of words was made from each text type (W). Words were selected to maximize the coverage of phonemes and diphones, and the selection also included the 50 most frequent words from the texts. The word lists were automatically transcribed into phonemes using a simple CELEX* word-list lookup and split into syllables. The syllables were transcribed back into a pseudo-orthography that Dutch subjects could read (Sy). The 70 "pseudo-sentences" (PS) were based on the Fixed texts and corrected for syntactic number and gender. They were "semantically unpredictable" and only marginally grammatical.
* Burnage, G. "CELEX - A Guide for Users." Nijmegen: Centre for Lexical Information, University of Nijmegen. 1990.
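Selecting words to maximize the coverage of phonemes and diphones is a set-cover problem, and a greedy pass is one common approximation. The source does not say which procedure was used, so the sketch below is only an illustration with a toy phoneme lexicon standing in for the CELEX lookup: the frequent words are taken first, then each step adds the word contributing the most not-yet-covered phonemes and diphones (word edges are marked with `#`). All names are hypothetical.

```python
def diphones(phonemes):
    """Adjacent phoneme pairs, with '#' marking the word edges."""
    padded = ["#", *phonemes, "#"]
    return set(zip(padded, padded[1:]))


def greedy_word_selection(lexicon, must_include=(), max_words=None):
    """Greedily pick words adding the most uncovered phonemes and diphones.

    lexicon maps word -> list of phonemes; must_include (e.g. the most
    frequent words) is selected first, in the given order.
    """
    def units(word):
        phones = lexicon[word]
        return set(phones) | diphones(phones)

    selected = list(must_include)
    covered = set()
    for word in selected:
        covered |= units(word)
    candidates = [w for w in lexicon if w not in selected]
    while candidates and (max_words is None or len(selected) < max_words):
        best = max(candidates, key=lambda w: len(units(w) - covered))
        if not units(best) - covered:
            break  # nothing new left to cover
        selected.append(best)
        covered |= units(best)
        candidates.remove(best)
    return selected
```

Running it once per text type, after removing words already chosen for the other type, would give non-overlapping lists.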
Table of contents
Name MD5 Size
# All documentation, forms, etc.
Additional Documents.zip md5:abadd44992ff5ec406ea4060020c56a9 519.1 kB
# Articles describing the IFA corpus
Articles.zip md5:7ded58a30fb87e0c181266c8f23e9acb 647.3 kB
# Audio data for "Can standard analysis tools be used on decompressed speech?"
COCOSDA 2002 compressed audio.zip md5:87951aa19784a624c65993722c032f5e 700.2 MB
# All data in the form of .tsv database files (tab separated values)
DatabaseFiles.zip md5:ee42e7317d8cc7c504536b01d9a7aecd 402.8 MB
# The protocol files for the labeling
LabelProtocol.zip md5:38dba02e2551ddaf8f4451c930423e35 325.7 kB
# All the annotation files as Praat TextGrid files: ASPEX, CELEX, Phonemes, POS, SPEX, Transliterations etc.
Labels-chunks.zip md5:80200e4952786e24c97c540123fb1bf4 626.0 kB
Labels-sentences.zip md5:c1e052b8b9ccf87a348008acfdd65d96 34.2 MB
Labels-validation.zip md5:b902ce5c9f76dd0607fa9087c1cc0170 286.5 kB
# All transcriptions, scripts, and other auxiliary files
SLcorpus.zip md5:8c730878d4ef986bd092d1eea8234d32 4.1 MB
# Audio files: Chunks
SLspeech-chunks-F20N.zip md5:a1a9c02278bfcf163f4ba27220a8a761 644.8 MB
SLspeech-chunks-F24I.zip md5:325c6f84610a706be8b3072d91152979 624.2 MB
SLspeech-chunks-F28G.zip md5:3cea531286f508267136eb2c87a8b0f8 682.0 MB
SLspeech-chunks-F40L.zip md5:ea04e9a331dcdb678ee91eca24185247 551.9 MB
SLspeech-chunks-F60E.zip md5:0274ed0d369ce0ba272f69c469465800 731.1 MB
SLspeech-chunks-M15R.zip md5:8526497910a6722bce58eed0e520eefb 380.5 MB
SLspeech-chunks-M40K.zip md5:118da90e364e683ed80cbb3935fa2d06 513.1 MB
SLspeech-chunks-M56H.zip md5:239c1a50a55cd98c93e1dd5d9d7664f0 531.2 MB
SLspeech-chunks-M58D.zip md5:69d9825b19769a5bac49131fc8d2f37d 664.6 MB
SLspeech-chunks-M66O.zip md5:b81312dcc55642adcc566001fe216d84 702.8 MB
# Audio files: Sentences
SLspeech-sentences-fm-F20N.zip md5:ece88127c0d4e39732a2282f05205a26 358.2 MB
SLspeech-sentences-fm-F28G.zip md5:0a2b3cf1acf129335e990cf6d114fcdc 331.9 MB
SLspeech-sentences-fm-F40L.zip md5:c71d3714b4bd4d01dc1bb625e867cce0 258.4 MB
SLspeech-sentences-fm-F60E.zip md5:de2dafa6d57faf1949c8c4174600f74d 359.4 MB
SLspeech-sentences-fm-M15R.zip md5:57b0bd8e36eba55c8809122eeb20ef27 178.1 MB
SLspeech-sentences-fm-M40K.zip md5:0b52a72e7aac1a229c9e14a722f400b7 230.1 MB
SLspeech-sentences-fm-M56H.zip md5:56a2db59db3132f9f3094ab09bba036f 248.4 MB
SLspeech-sentences-fm-M66O.zip md5:6248b1c12791fbbb693e567638fe0518 308.7 MB
SLspeech-sentences-hm-F20N.zip md5:f631c7e6e6dcf81864a05c7f24b51f15 321.1 MB
SLspeech-sentences-hm-F28G.zip md5:ed29bcd23afe9d6b1bf3687adb5fd743 283.6 MB
SLspeech-sentences-hm-F40L.zip md5:dea73e749829d52429f82f2e14b1b706 245.6 MB
SLspeech-sentences-hm-F60E.zip md5:b4a3168e323dc840e13dcd164708d42e 319.8 MB
SLspeech-sentences-hm-M15R.zip md5:66ca4c3607d1dc460ed67ece3b1e8145 159.2 MB
SLspeech-sentences-hm-M40K.zip md5:b7fd9bec968f4240a3d412c8d093a24f 215.1 MB
SLspeech-sentences-hm-M56H.zip md5:7daa99cb65af3978b73f4c0ef1c394c9 212.3 MB
SLspeech-sentences-hm-M66O.zip md5:2644c73095f5b695b6e5379e7df2cf77 277.7 MB
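DatabaseFiles.zip holds the data as tab-separated tables, so one way to query them with SQL is to load each file into SQLite. The sketch assumes every .tsv file starts with a header row and stores all columns as text. The helper name is hypothetical, as are any table, file or column names you pass to it, and the encoding of the actual files should be checked.

```python
import csv
import sqlite3


def load_tsv(conn, table, lines):
    """Create `table` from tab-separated lines whose first row is a header.

    Every column is stored as TEXT; cast in SQL (e.g. CAST(x AS REAL)) as
    needed. Quote characters are kept literally (csv.QUOTE_NONE).
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
```

For a file on disk, pass an open handle, e.g. `with open("words.tsv", newline="", encoding="utf-8") as f: load_tsv(conn, "words", f)` (hypothetical file name, assumed UTF-8), and then run ordinary `SELECT` queries on `conn`.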
Files
Total size: 11.5 GB. The repository file listing gives the same MD5 checksums and sizes as the table of contents above, except that it lists Articles.zip as md5:9dcd8e32ecdcc93e3fa3fa49f6b0220b (925.2 kB).
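The MD5 checksums listed for each archive allow a download to be checked before unpacking. A sketch using Python's `hashlib`, reading files in chunks so that multi-gigabyte archives need not fit in memory (the helper names are hypothetical). MD5 detects transfer corruption, not deliberate tampering.

```python
import hashlib


def md5_of_chunks(chunks):
    """Hex MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5, 1 MiB at a time by default."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def listed_digest(listing):
    """Turn a listing entry such as 'md5:abc...' into the bare hex digest."""
    return listing.strip().lower().removeprefix("md5:")


def matches_listing(path, listing):
    return md5_of_file(path) == listed_digest(listing)
```

For example, `matches_listing("DatabaseFiles.zip", "md5:ee42e7317d8cc7c504536b01d9a7aecd")` should return `True` for an intact download.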
Additional details
Funding
- Dutch Research Council
- Hoe efficiënt is spraak ("How efficient is speech"), project 355-75-001
References
- Van Son, R. J. J. H., Binnenpoorte, D., van den Heuvel, H., & Pols, L. C. (2001). The IFA corpus: A phonemically segmented Dutch "open source" speech database. In Proceedings of EUROSPEECH 2001, Aalborg, Denmark (Vol. 3, pp. 2051-2054).
- Van Son, R. J. J. H., & Pols, L. C. (2001). Structure and access of the open source IFA corpus. In Proceedings of the IRCS Workshop on Linguistic Databases, Philadelphia (pp. 245-253).
- Pols, L. C., & van Son, R. J. J. H. (2002). Accessing the IFA-corpus. In Book in honor of the 70th anniversary of Prof. L. V. Bondarko (pp. 316-320).
- Van Son, R. J. J. H. (2002). Can standard analysis tools be used on decompressed speech? In COCOSDA 2002 Workshop of the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques.
- Van Son, R. J. J. H., & Pols, L. C. (2002). Evidence for efficiency in vowel production. In INTERSPEECH (pp. 37-40).
- Van Son, R. J. J. H. (2005). A study of pitch, formant, and spectral estimation errors introduced by three lossy speech compression algorithms. Acta Acustica united with Acustica, 91(4), 771-778.