Video/Audio Open Access

The Fharvard corpus

Aubanel, Vincent; Bayard, Clémence; Strauss, Antje; Schwartz, Jean-Luc

The Fharvard corpus is a collection of 700 sentences in French, phonetically balanced into 70 lists of 10 sentences each. Each sentence contains 5 keywords for scoring.
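As a rough illustration of keyword scoring, the sketch below counts how many of a sentence's five keywords appear in a listener's typed response. The function name `score_keywords`, the case-insensitive whole-word matching rule, and the example sentence are all assumptions made for illustration; the sentence is invented and not taken from the corpus, and an actual scoring protocol may treat misspellings or inflections differently.

```python
import re

# Runs of letters, optionally joined by hyphens (keeps accented characters).
_WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def score_keywords(keywords, response):
    """Return how many keywords occur as whole words in the response.

    Matching is case-insensitive and exact; this is an assumed rule,
    not one specified by the corpus description.
    """
    heard = {w.casefold() for w in _WORD.findall(response)}
    return sum(1 for k in keywords if k.casefold() in heard)

# Invented example, not a corpus sentence.
example_keywords = ["CHAT", "NOIR", "DORT", "GRAND", "FEU"]
example_response = "le chat noir dort près du feu"
example_score = score_keywords(example_keywords, example_response)
```

With this rule, the example response misses only GRAND, so 4 of the 5 keywords are counted.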

The list of sentences is contained in the file The Fharvard corpus.pdf with keywords in bold.

The phonetic transcription is provided in The Fharvard corpus - phonetic.txt. The ortho column contains the orthographic representation of the sentence with keywords in capital letters. The phono column contains the phonetic representation in SAMPA coding, with words separated by two successive space characters. Note that the phonetic representation is provided on an individual word basis, that is, discarding word-to-word liaisons. This is to provide an unambiguous basis for phonetic balancing at the keyword level, as the realisation of some liaisons can vary from talker to talker.

Audio recordings of the Fharvard sentences, spoken by one female and one male talker, are provided in the .zip archive files at sampling rates of 44.1 kHz and 16 kHz.
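To confirm which sampling rate a given recording uses after unpacking an archive, the header of the file can be inspected with Python's standard `wave` module. This sketch assumes the recordings are uncompressed PCM WAV files, which `wave` requires; the helper name `wav_info` is hypothetical, and the file names inside the archives are not documented here.

```python
import wave

def wav_info(path):
    """Return (sampling_rate_hz, n_channels, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration
```

For example, calling `wav_info("Sample sentence Female.wav")` on the downloaded sample returns its sampling rate as the first element of the tuple.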

A sample sentence for the female and the male talker is also attached.

Files (403.4 MB)

Name                                  Size       MD5
Audio 16kHz Female.zip                53.8 MB    d2cdabe535076515953cc96b2b80994e
Audio 16kHz Male.zip                  56.6 MB    0a3cbe424f4c9bdbc0306712e5d2c939
Audio 44kHz Female.zip                141.0 MB   a0f0838538b9a3527f8d275492b14e19
Audio 44kHz Male.zip                  151.7 MB   c534dc13acd06ec2d976e1b66cca7432
Sample sentence Female.wav            108.5 kB   4d8da4bba06063c6827400f144223e44
Sample sentence Male.wav              120.2 kB   7ae7b5d0f9e6890b9505ce79ed8ee96d
The Fharvard corpus - phonetic.txt    88.2 kB    7e164b4cb4b8391befab08781a039ad9
The Fharvard corpus.pdf               90.1 kB    190df22cd5a9962dc7379d10d6496c4d
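
The MD5 checksums listed for the files can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify` are hypothetical, and the local path passed in is wherever the file was saved.

```python
import hashlib

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "Audio 16kHz Female.zip": "d2cdabe535076515953cc96b2b80994e",
    "Audio 16kHz Male.zip": "0a3cbe424f4c9bdbc0306712e5d2c939",
    "Audio 44kHz Female.zip": "a0f0838538b9a3527f8d275492b14e19",
    "Audio 44kHz Male.zip": "c534dc13acd06ec2d976e1b66cca7432",
    "Sample sentence Female.wav": "4d8da4bba06063c6827400f144223e44",
    "Sample sentence Male.wav": "7ae7b5d0f9e6890b9505ce79ed8ee96d",
    "The Fharvard corpus - phonetic.txt": "7e164b4cb4b8391befab08781a039ad9",
    "The Fharvard corpus.pdf": "190df22cd5a9962dc7379d10d6496c4d",
}

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of(path) == EXPECTED_MD5[name]
```

Reading in chunks keeps memory use low for the larger archives. A call such as `verify("Audio 16kHz Female.zip", "Audio 16kHz Female.zip")` returns `False` if the download is incomplete or altered.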