Video/Audio Open Access

The Fharvard corpus

Aubanel, Vincent; Bayard, Clémence; Strauss, Antje; Schwartz, Jean-Luc

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Aubanel, Vincent</dc:creator>
  <dc:creator>Bayard, Clémence</dc:creator>
  <dc:creator>Strauss, Antje</dc:creator>
  <dc:creator>Schwartz, Jean-Luc</dc:creator>
  <dc:description>The Fharvard corpus is a collection of 700 sentences in French, phonetically balanced into 70 lists of 10 sentences each. Each sentence contains 5 keywords for scoring.

The list of sentences is contained in the file The Fharvard corpus.pdf with keywords in bold.

The phonetic transcription is provided in The Fharvard corpus - phonetic.txt. The ortho column contains the orthographic representation of the sentence with keywords in capital letters. The phono column contains the phonetic representation in SAMPA coding, with words separated by two successive space characters. Note that the phonetic representation is provided on an individual word basis, that is, discarding word-to-word liaisons. This is to provide an unambiguous basis for phonetic balancing at the keyword level, as the realisation of some liaisons can vary from talker to talker.

Audio recordings of the Fharvard sentences, spoken by one female and one male talker, are provided in the .zip archive files at sampling rates of 44.1 kHz and 16 kHz.

A sample sentence from each talker is also attached.</dc:description>


  <dc:subject>Speech in Noise</dc:subject>
  <dc:subject>Speech Intelligibility</dc:subject>
  <dc:title>The Fharvard corpus</dc:title>
</oai_dc:dc>
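
The description above specifies the layout of the phonetic transcription file: an ortho column with keywords in capital letters, and a phono column in SAMPA with words separated by two spaces. The following is a minimal Python sketch of how one line might be parsed and how a listener response could be scored against the 5 keywords. It rests on several assumptions that the record does not confirm: that the two columns are tab-separated, that orthographic words are separated by single spaces, and that a keyword is any all-capitals token longer than one letter. The sample line is invented for illustration and is not taken from the corpus; the function names `parse_line` and `score_keywords` are hypothetical.

```python
import re

PUNCTUATION = ".,;:!?"


def parse_line(line, sep="\t"):
    """Split one transcription line into (word, SAMPA) pairs and keywords.

    Assumes (not confirmed by the record) that the ortho and phono
    columns are separated by `sep`.
    """
    ortho, phono = line.rstrip("\n").split(sep, 1)
    words = [w.strip(PUNCTUATION) for w in ortho.split()]
    # Per the description, phonetic words are separated by two spaces.
    phones = phono.strip().split("  ")
    if len(words) != len(phones):
        raise ValueError("ortho and phono columns have different word counts")
    pairs = list(zip(words, phones))
    # Assumption: keywords are all-capitals tokens; the length guard skips
    # single-letter words that are capitalised only because they start a sentence.
    keywords = [(w, p) for w, p in pairs if len(w) > 1 and w.isupper()]
    return pairs, keywords


def score_keywords(keywords, response):
    """Return the proportion of keywords found in a listener response."""
    reported = {w.lower() for w in re.findall(r"\w+", response)}
    hits = sum(1 for word, _ in keywords if word.lower() in reported)
    return hits / len(keywords)


# Invented example line (NOT from the corpus), tab-separated by assumption.
sample = "Le CHAT NOIR DORT sur le GRAND LIT.\tl@  Sa  nwaR  dOR  syR  l@  gRa~  li"
pairs, keywords = parse_line(sample)
print(keywords)
print(score_keywords(keywords, "le chat noir dort sur le lit"))
```

Exact word matching is the simplest scoring rule; a real experiment may apply a more lenient criterion (for example, accepting homophones), which this sketch does not attempt.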