The IFA Spoken Language Corpus

van Son, Rob J. J. H.

doi:10.5281/zenodo.14904090

Published September 3, 2001 | Version 1.0

Video/Audio Open

van Son, Rob J. J. H. (Project leader)¹

1. The Netherlands Cancer Institute

Contributors

Annotator:

Binnenpoorte, Diana²

Other (3):

Project member (2):

1. University of Amsterdam
2. Radboud University Nijmegen
3. Dutch Research Council
4. Dutch Language Union

The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers (out of 10) in a variety of speaking styles. For a total of 50,000 words (41 minutes/speaker), speech acquisition and preparation took around 3 person-weeks per speaker. Hand segmentation took 1,000 hours of labeling altogether. The asymptotic segmentation speed was about one word, or four boundaries, per minute. An evaluation showed that the Median Absolute Difference of the segment boundaries was 6 ms between labelers, and 4 ms within labelers. Label differences (substitutions, insertions, and deletions) were found in 8% of the segments between labelers and 5% within labelers. Compiled data are available in relational database format for querying with SQL.
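The boundary agreement figures above are medians of absolute differences between paired boundary positions. A minimal sketch of that statistic follows. The boundary positions are invented for illustration, and the one-to-one pairing of boundaries between labelers is assumed to be given; how the corpus evaluation paired boundaries is not described here.

```python
from statistics import median


def median_absolute_difference(bounds_a, bounds_b):
    """Median of |a - b| over paired boundary positions (same unit as the input)."""
    if len(bounds_a) != len(bounds_b):
        raise ValueError("boundary lists must be paired one-to-one")
    return median(abs(a - b) for a, b in zip(bounds_a, bounds_b))


# Hypothetical boundary positions in milliseconds from two labelers.
labeler_1 = [100, 250, 400, 620, 800]
labeler_2 = [104, 243, 400, 630, 806]

mad_ms = median_absolute_difference(labeler_1, labeler_2)
print(f"Median absolute difference: {mad_ms} ms")
```

The median is less sensitive to a few grossly misplaced boundaries than a mean would be, which is presumably why it suits hand-segmentation data.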

The IFA Spoken Language corpus is currently in version 1.0. This is the "reference" version and the first I consider consistent enough to be useful. However, the annotations (labeling) still contain errors: a few percent of the labels are inconsistent (e.g., syllables or phonemes assigned to the wrong word, stress errors).

Summary information:

Net time in seconds (excluding all pauses)

Gender	Age	ID	Recorded sentences (sec)	Segmented sentences (sec)
F	20	N	3736	2760
F	28	G	4180	3978
F	40	L	3112	2485
F	60	E	4181	3245
M	15	R	2125	1439
M	40	K	2720	1891
M	56	H	2894	2368
M	66	O	3781	1696
Total			26733	19867
			7:26 hours	5:31 hours
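The hour figures in the last row can be reproduced from the totals in seconds. The sketch below copies the values from the table; rounding to the nearest minute is an assumption that happens to match the printed figures. It also derives the mean segmented time per speaker and the fraction of each speaker's recorded sentences that was segmented.

```python
# (gender, age, id, recorded_s, segmented_s), copied from the table above.
SPEAKERS = [
    ("F", 20, "N", 3736, 2760),
    ("F", 28, "G", 4180, 3978),
    ("F", 40, "L", 3112, 2485),
    ("F", 60, "E", 4181, 3245),
    ("M", 15, "R", 2125, 1439),
    ("M", 40, "K", 2720, 1891),
    ("M", 56, "H", 2894, 2368),
    ("M", 66, "O", 3781, 1696),
]
TOTAL_RECORDED_S = 26733   # "Total" row as printed
TOTAL_SEGMENTED_S = 19867  # "Total" row as printed


def hours_minutes(seconds):
    """Format a duration as h:mm, rounded to the nearest minute (assumed rounding)."""
    hours, minutes = divmod(round(seconds / 60), 60)
    return f"{hours}:{minutes:02d}"


mean_segmented_min = TOTAL_SEGMENTED_S / len(SPEAKERS) / 60
segmented_fraction = {
    f"{gender}{age}{sid}": seg / rec for gender, age, sid, rec, seg in SPEAKERS
}

print(hours_minutes(TOTAL_RECORDED_S), hours_minutes(TOTAL_SEGMENTED_S))
print(f"{mean_segmented_min:.1f} minutes of segmented speech per speaker")
```

The per-speaker mean comes out at roughly 41 minutes, in line with the figure quoted in the description.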

Speech in tokens (total)

Gender	Recorded		Segmented
	Sentences	Words	Sentences	Words	Syllables	Phonemes
4F / 4M	6128	73067	4492	51782	74702	187544
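A few ratios follow directly from the token counts. The sketch below uses only the numbers in the table plus the 1,000 labeling hours mentioned in the description; the derived averages are simple arithmetic, not figures reported by the corpus documentation.

```python
# Token counts copied from the table above.
RECORDED = {"sentences": 6128, "words": 73067}
SEGMENTED = {
    "sentences": 4492,
    "words": 51782,
    "syllables": 74702,
    "phonemes": 187544,
}
LABELING_HOURS = 1000  # total hand-segmentation time stated in the description

words_per_sentence = SEGMENTED["words"] / SEGMENTED["sentences"]
syllables_per_word = SEGMENTED["syllables"] / SEGMENTED["words"]
phonemes_per_word = SEGMENTED["phonemes"] / SEGMENTED["words"]
segmented_word_share = SEGMENTED["words"] / RECORDED["words"]
labeling_min_per_word = LABELING_HOURS * 60 / SEGMENTED["words"]

print(f"{words_per_sentence:.1f} words per segmented sentence")
print(f"{syllables_per_word:.2f} syllables and {phonemes_per_word:.2f} phonemes per word")
print(f"{segmented_word_share:.0%} of recorded words were segmented")
print(f"{labeling_min_per_word:.2f} labeling minutes per word on average")
```

The average of a little over one minute per word is an overall mean, so it is expected to be somewhat slower than the asymptotic speed of about one word per minute.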

The IFA Spoken Language corpus was constructed using the Praat speech editing and analysis program. All speech material is accessible with Praat.

The Dutch Language Organization (Nederlandse Taalunie) holds all copyrights (unless explicitly stated otherwise) and makes the complete corpus available under the GNU General Public License (see below).

Methods

Audio files

Audio files are stored in AIFC format (16 bit, 44100 Hz). Recording microphones were coded as hm for head-mounted and fm for fixed microphone. Two-channel recordings were split into chunks ("paragraphs") for storage and processing. Chunks were split into single-channel sentences (fm and hm) for word and phoneme segmentation.
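Splitting a two-channel recording into single-channel files amounts to de-interleaving the sample frames. The sketch below works on raw interleaved 16-bit PCM bytes rather than on actual AIFC files. The function name is illustrative, and which channel carries hm or fm is not stated here, so the outputs are only called channel 0 and channel 1. AIFF-family files normally store samples big-endian, so a real reader would also have to handle byte order.

```python
from array import array

SAMPLE_RATE_HZ = 44100
BYTES_PER_SAMPLE = 2  # 16 bit

# Uncompressed data rate of one channel.
MONO_BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


def split_stereo(interleaved: bytes):
    """De-interleave 16-bit stereo PCM (native byte order) into two channels."""
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(interleaved)
    if len(samples) % 2:
        raise ValueError("stereo data must contain an even number of samples")
    return samples[0::2], samples[1::2]


# Tiny synthetic example: three stereo frames.
frames = array("h", [1, -1, 2, -2, 3, -3]).tobytes()
channel_0, channel_1 = split_stereo(frames)
print(list(channel_0), list(channel_1), MONO_BYTES_PER_SECOND)
```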

Recording equipment

Speech was recorded in a quiet, sound-treated room. Recording equipment and a cueing computer were in a separate control room. Two-channel recordings were made with a head-mounted dynamic microphone (hm, Shure SM10A) on one channel and a fixed HF condenser microphone (fm, Sennheiser MKH 105) on the other. Recording was done directly to a Philips Audio CD-recorder, i.e., 16 bit linear coding at 44.1 kHz stereo. A standard sound source (white noise and a pure 400 Hz tone) of 78 dB was recorded from a fixed position relative to the fixed microphone so that the recording level could be marked. These reference source recordings are stored with the speech as G[12]N and G[12]T. The head-mounted microphone did not allow precise repositioning between sessions, and was even known to move during the sessions (which was noted).
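One common way to use such a reference recording is to express the level of a speech segment relative to the reference tone. The sketch below applies the standard RMS level formula to synthetic signals. It is an illustration under assumptions, not the procedure used for the corpus: the function names are made up, and the approach would only make sense for the fixed microphone, since the head-mounted microphone moved.

```python
import math

REFERENCE_LEVEL_DB = 78.0  # level of the standard sound source stated above


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(signal, reference, reference_db=REFERENCE_LEVEL_DB):
    """Level of `signal` in dB, calibrated against a reference recording."""
    return reference_db + 20 * math.log10(rms(signal) / rms(reference))


def tone(freq_hz, amplitude, duration_s=0.1, rate_hz=44100):
    """Synthetic sine tone used here in place of real recordings."""
    n = int(duration_s * rate_hz)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


reference = tone(400, 1.0)  # stands in for the recorded 400 Hz reference
quieter = tone(400, 0.5)    # a signal at half the reference amplitude
estimated_db = level_db(quieter, reference)
print(f"{estimated_db:.2f} dB")
```

Halving the amplitude lowers the level by about 6 dB, so the synthetic signal comes out near 72 dB.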

Speakers

Speakers were selected at the Institute of Phonetic Sciences in Amsterdam (IFA) and consisted mostly of staff and students. Non-staff speakers were paid. In total, 18 speakers (9F/9M) completed both recording sessions. All speakers were mother-tongue speakers of Dutch and none reported speech or hearing problems. Recordings of 10 speakers (5F/5M) were selected and split into chunks (paragraphs), based on distribution of sex and age, and the quality of the recordings. Recordings of 4 women and 4 men were selected for phonemic segmentation. The ages of the selected speakers range from 15 to 66 years.

Speaking styles

Eight speaking "styles" were recorded from each speaker.

From informal to formal these were:

Informal story telling face-to-face to an "interviewer" (I)
Retelling a previously read narrative story without sight contact (R)

And reading aloud:
A narrative story (T)
A random list of all sentences of the narrative stories (S)
"Pseudo-sentences" constructed by replacing all words in a sentence with randomly selected words from the text with the same POS tag (PS)
Lists of selected words from the texts (W)
Lists of all distinct syllables from the word lists (Sy)
A collection of idiomatic (the Alphabet, the numbers 0-12) and "diagnostic" sequences (isolated vowels, /hVd/ and /VCV/ lists) (Pr)

The last style (Pr) was presented in a fixed order; all other lists (S, PS, W, Sy) were (pseudo-)randomized for each speaker before presentation.
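The pseudo-sentence (PS) style in the list above replaces every word with a randomly selected word carrying the same POS tag. A minimal sketch of that idea follows. The tag names, the toy text, and the function names are hypothetical, and the sketch does not apply the corrections for syntactic number and gender that were made for the corpus.

```python
import random
from collections import defaultdict


def build_pos_pool(tagged_text):
    """Group all words of the text by their POS tag."""
    pool = defaultdict(list)
    for word, tag in tagged_text:
        pool[tag].append(word)
    return pool


def pseudo_sentence(tagged_sentence, pool, rng):
    """Replace each word by a random word from the text with the same POS tag."""
    return [(rng.choice(pool[tag]), tag) for _, tag in tagged_sentence]


# Hypothetical tagged toy text (tags DET, N, V are illustrative).
TAGGED_TEXT = [
    ("de", "DET"), ("wind", "N"), ("blies", "V"),
    ("de", "DET"), ("zon", "N"), ("scheen", "V"),
    ("een", "DET"), ("reiziger", "N"), ("kwam", "V"),
]

pool = build_pos_pool(TAGGED_TEXT)
rng = random.Random(0)  # fixed seed so the pseudo-randomization is repeatable
original = TAGGED_TEXT[:3]
result = pseudo_sentence(original, pool, rng)
print(" ".join(word for word, _ in result))
```

A seeded generator per speaker is one way to obtain a different but reproducible order or selection for each speaker.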

Each speaker read aloud from two separate text collections based on narrative texts. During the first recording session, each speaker read from the same two texts (Fixed text type). These texts were based on the Dutch version of "The north wind and the sun", and on a translation of the fairy tale "Jorinde und Joringel". During the second session, each speaker read from texts based on the informal story told during the first recording session (Variable text type). A non-overlapping selection of words was made from each text type (W). Words were selected to maximize coverage of phonemes and diphones and also included the 50 most frequent words from the texts. The word lists were automatically transcribed into phonemes using a simple CELEX* word list lookup and were split into syllables. The syllables were transcribed back into a pseudo-orthography which was readable for Dutch subjects (Sy). The 70 "pseudo-sentences" (PS) were based on the Fixed texts and corrected for syntactic number and gender. They were "semantically unpredictable" and only marginally grammatical.

* Burnage, G. "CELEX - A Guide for Users." Nijmegen: Centre for Lexical Information, University of Nijmegen. 1990.
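The text above says that words were selected to maximize coverage of phonemes and diphones, but not how. One plausible approach is a greedy selection that repeatedly picks the word adding the most uncovered units. The sketch below illustrates that technique on a toy lexicon; the transcriptions, symbols, and function names are hypothetical, and this is not claimed to be the selection procedure used for the corpus.

```python
def units(phonemes):
    """Phonemes plus diphones (adjacent phoneme pairs) of one transcription."""
    diphones = set(zip(phonemes, phonemes[1:]))
    return set(phonemes) | diphones


def greedy_select(lexicon, max_words):
    """Greedily pick words that each add the most not-yet-covered units."""
    covered, chosen = set(), []
    remaining = dict(lexicon)
    while remaining and len(chosen) < max_words:
        # Sort for a deterministic choice when several words tie.
        best = max(sorted(remaining), key=lambda w: len(units(remaining[w]) - covered))
        gain = units(remaining[best]) - covered
        if not gain:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Hypothetical toy lexicon: word -> phoneme list.
LEXICON = {
    "wind": ["w", "I", "n", "t"],
    "zon": ["z", "O", "n"],
    "win": ["w", "I", "n"],
    "in": ["I", "n"],
}

chosen, covered = greedy_select(LEXICON, max_words=10)
print(chosen, len(covered))
```

Here "wind" and "zon" together already cover every phoneme and diphone in the toy lexicon, so the remaining words are skipped.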

Table of contents

# All the annotation files as Praat TextGrid files: ASPEX, CELEX, Phonemes, POS, SPEX, Transliterations etc.
Labels-chunks.zip md5:80200e4952786e24c97c540123fb1bf4 626.0 kB
Labels-sentences.zip md5:c1e052b8b9ccf87a348008acfdd65d96 34.2 MB
Labels-validation.zip md5:b902ce5c9f76dd0607fa9087c1cc0170 286.5 kB

# All Transcriptions, scripts, and other auxiliary files
SLcorpus.zip md5:8c730878d4ef986bd092d1eea8234d32 4.1 MB

# Audio files: Chunks
SLspeech-chunks-F20N.zip md5:a1a9c02278bfcf163f4ba27220a8a761 644.8 MB
SLspeech-chunks-F24I.zip md5:325c6f84610a706be8b3072d91152979 624.2 MB
SLspeech-chunks-F28G.zip md5:3cea531286f508267136eb2c87a8b0f8 682.0 MB
SLspeech-chunks-F40L.zip md5:ea04e9a331dcdb678ee91eca24185247 551.9 MB
SLspeech-chunks-F60E.zip md5:0274ed0d369ce0ba272f69c469465800 731.1 MB
SLspeech-chunks-M15R.zip md5:8526497910a6722bce58eed0e520eefb 380.5 MB
SLspeech-chunks-M40K.zip md5:118da90e364e683ed80cbb3935fa2d06 513.1 MB
SLspeech-chunks-M56H.zip md5:239c1a50a55cd98c93e1dd5d9d7664f0 531.2 MB
SLspeech-chunks-M58D.zip md5:69d9825b19769a5bac49131fc8d2f37d 664.6 MB
SLspeech-chunks-M66O.zip md5:b81312dcc55642adcc566001fe216d84 702.8 MB

# Audio files: Sentences
SLspeech-sentences-fm-F20N.zip md5:ece88127c0d4e39732a2282f05205a26 358.2 MB
SLspeech-sentences-fm-F28G.zip md5:0a2b3cf1acf129335e990cf6d114fcdc 331.9 MB
SLspeech-sentences-fm-F40L.zip md5:c71d3714b4bd4d01dc1bb625e867cce0 258.4 MB
SLspeech-sentences-fm-F60E.zip md5:de2dafa6d57faf1949c8c4174600f74d 359.4 MB
SLspeech-sentences-fm-M15R.zip md5:57b0bd8e36eba55c8809122eeb20ef27 178.1 MB
SLspeech-sentences-fm-M40K.zip md5:0b52a72e7aac1a229c9e14a722f400b7 230.1 MB
SLspeech-sentences-fm-M56H.zip md5:56a2db59db3132f9f3094ab09bba036f 248.4 MB
SLspeech-sentences-fm-M66O.zip md5:6248b1c12791fbbb693e567638fe0518 308.7 MB
SLspeech-sentences-hm-F20N.zip md5:f631c7e6e6dcf81864a05c7f24b51f15 321.1 MB
SLspeech-sentences-hm-F28G.zip md5:ed29bcd23afe9d6b1bf3687adb5fd743 283.6 MB
SLspeech-sentences-hm-F40L.zip md5:dea73e749829d52429f82f2e14b1b706 245.6 MB
SLspeech-sentences-hm-F60E.zip md5:b4a3168e323dc840e13dcd164708d42e 319.8 MB
SLspeech-sentences-hm-M15R.zip md5:66ca4c3607d1dc460ed67ece3b1e8145 159.2 MB
SLspeech-sentences-hm-M40K.zip md5:b7fd9bec968f4240a3d412c8d093a24f 215.1 MB
SLspeech-sentences-hm-M56H.zip md5:7daa99cb65af3978b73f4c0ef1c394c9 212.3 MB
SLspeech-sentences-hm-M66O.zip md5:2644c73095f5b695b6e5379e7df2cf77 277.7 MB
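The audio archive names above appear to encode the unit (chunks or sentences), the microphone (fm or hm, sentences only), and a speaker code made of gender, age, and ID letter, matching the summary table (e.g. F20N). The sketch below parses names under that inferred pattern; the pattern and field names are assumptions drawn from the listing, not a documented naming scheme.

```python
import re

# Pattern inferred from the file listing above (an assumption).
NAME_PATTERN = re.compile(
    r"^SLspeech-(?P<unit>chunks|sentences)"
    r"(?:-(?P<microphone>fm|hm))?"
    r"-(?P<gender>[FM])(?P<age>\d+)(?P<speaker_id>[A-Z])\.zip$"
)


def parse_archive_name(name):
    """Return the fields encoded in an audio archive name, or None if no match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    info = match.groupdict()
    info["age"] = int(info["age"])
    return info


print(parse_archive_name("SLspeech-sentences-fm-F20N.zip"))
print(parse_archive_name("SLspeech-chunks-M58D.zip"))
```

Speakers F24I and M58D occur only among the chunk archives, consistent with 10 speakers being split into chunks and 8 being segmented.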

Files

Files (11.5 GB)

Name	Size
Additional Documents.zip md5:abadd44992ff5ec406ea4060020c56a9	519.1 kB
Articles.zip md5:9dcd8e32ecdcc93e3fa3fa49f6b0220b	925.2 kB
COCOSDA 2002 compressed audio.zip md5:87951aa19784a624c65993722c032f5e	700.2 MB
DatabaseFiles.zip md5:ee42e7317d8cc7c504536b01d9a7aecd	402.8 MB
LabelProtocol.zip md5:38dba02e2551ddaf8f4451c930423e35	325.7 kB
Labels-chunks.zip md5:80200e4952786e24c97c540123fb1bf4	626.0 kB
Labels-sentences.zip md5:c1e052b8b9ccf87a348008acfdd65d96	34.2 MB
Labels-validation.zip md5:b902ce5c9f76dd0607fa9087c1cc0170	286.5 kB
SLcorpus.zip md5:8c730878d4ef986bd092d1eea8234d32	4.1 MB
SLspeech-chunks-F20N.zip md5:a1a9c02278bfcf163f4ba27220a8a761	644.8 MB
SLspeech-chunks-F24I.zip md5:325c6f84610a706be8b3072d91152979	624.2 MB
SLspeech-chunks-F28G.zip md5:3cea531286f508267136eb2c87a8b0f8	682.0 MB
SLspeech-chunks-F40L.zip md5:ea04e9a331dcdb678ee91eca24185247	551.9 MB
SLspeech-chunks-F60E.zip md5:0274ed0d369ce0ba272f69c469465800	731.1 MB
SLspeech-chunks-M15R.zip md5:8526497910a6722bce58eed0e520eefb	380.5 MB
SLspeech-chunks-M40K.zip md5:118da90e364e683ed80cbb3935fa2d06	513.1 MB
SLspeech-chunks-M56H.zip md5:239c1a50a55cd98c93e1dd5d9d7664f0	531.2 MB
SLspeech-chunks-M58D.zip md5:69d9825b19769a5bac49131fc8d2f37d	664.6 MB
SLspeech-chunks-M66O.zip md5:b81312dcc55642adcc566001fe216d84	702.8 MB
SLspeech-sentences-fm-F20N.zip md5:ece88127c0d4e39732a2282f05205a26	358.2 MB
SLspeech-sentences-fm-F28G.zip md5:0a2b3cf1acf129335e990cf6d114fcdc	331.9 MB
SLspeech-sentences-fm-F40L.zip md5:c71d3714b4bd4d01dc1bb625e867cce0	258.4 MB
SLspeech-sentences-fm-F60E.zip md5:de2dafa6d57faf1949c8c4174600f74d	359.4 MB
SLspeech-sentences-fm-M15R.zip md5:57b0bd8e36eba55c8809122eeb20ef27	178.1 MB
SLspeech-sentences-fm-M40K.zip md5:0b52a72e7aac1a229c9e14a722f400b7	230.1 MB
SLspeech-sentences-fm-M56H.zip md5:56a2db59db3132f9f3094ab09bba036f	248.4 MB
SLspeech-sentences-fm-M66O.zip md5:6248b1c12791fbbb693e567638fe0518	308.7 MB
SLspeech-sentences-hm-F20N.zip md5:f631c7e6e6dcf81864a05c7f24b51f15	321.1 MB
SLspeech-sentences-hm-F28G.zip md5:ed29bcd23afe9d6b1bf3687adb5fd743	283.6 MB
SLspeech-sentences-hm-F40L.zip md5:dea73e749829d52429f82f2e14b1b706	245.6 MB
SLspeech-sentences-hm-F60E.zip md5:b4a3168e323dc840e13dcd164708d42e	319.8 MB
SLspeech-sentences-hm-M15R.zip md5:66ca4c3607d1dc460ed67ece3b1e8145	159.2 MB
SLspeech-sentences-hm-M40K.zip md5:b7fd9bec968f4240a3d412c8d093a24f	215.1 MB
SLspeech-sentences-hm-M56H.zip md5:7daa99cb65af3978b73f4c0ef1c394c9	212.3 MB
SLspeech-sentences-hm-M66O.zip md5:2644c73095f5b695b6e5379e7df2cf77	277.7 MB
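Each file is listed with an MD5 checksum, which can be used to check a download for corruption. A minimal sketch using Python's standard hashlib module follows; the function names are illustrative, and the file is read in blocks because several archives are hundreds of megabytes in size.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, after downloading Labels-chunks.zip to the working directory, matches_listing("Labels-chunks.zip", "md5:80200e4952786e24c97c540123fb1bf4") should return True for an intact file.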

Additional details

Funding

Dutch Research Council
Hoe efficiënt is spraak (How efficient is speech) 355-75-001

References

van Son, R. J. J. H., Binnenpoorte, D., van den Heuvel, H., & Pols, L. C. (2001). The IFA Corpus: a Phonemically Segmented Dutch "Open Source" Speech Database. Proc. EUROSPEECH 2001, Aalborg, Denmark, Vol. 3, 2051-2054.
van Son, R. J. J. H., & Pols, L. C. (2001). Structure and access of the open source IFA Corpus. In Proceedings of the IRCS workshop on Linguistic Databases, Philadelphia (pp. 245-253).
Pols, L. C., & van Son, R. J. J. H. (2002). Accessing the IFA-corpus. Book in honor of the 70th anniversary of Prof. L. V. Bondarko, 316-320.
van Son, R. J. J. H. (2002). Can standard analysis tools be used on decompressed speech? In COCOSDA 2002 Workshop of the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques.
van Son, R. J. J. H., & Pols, L. C. (2002). Evidence for efficiency in vowel production. In INTERSPEECH (pp. 37-40).
van Son, R. J. (2005). A study of pitch, formant, and spectral estimation errors introduced by three lossy speech compression algorithms. Acta Acustica united with Acustica, 91(4), 771-778.

