Published January 16, 2013 | Version v1
Dataset Restricted

MediaParl

Description

MediaParl is a Swiss-accented bilingual database containing recordings in both French and German as they are spoken in Switzerland. The data were recorded at the Valais Parliament. Valais is a bilingual Swiss canton with many local accents and dialects, so the database contains highly variable speech and is suitable for studying multilingual, accented and non-native speech recognition, as well as language identification and language-switch detection.

The corpus is partitioned into training, development and test sets. Since we focus on bilingual (accented, non-native) speech, the test set (MediaParl-TST) contains all the speakers who speak in both languages. The remaining speakers, who do not speak both languages in the recordings, were randomly assigned to the training (MediaParl-TRN) and development (MediaParl-DEV) sets in a 9:1 proportion.
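The partitioning rule can be sketched as a speaker-level split. This is a minimal illustration under stated assumptions, not the procedure actually used to build MediaParl: the helper `partition_speakers`, the speaker-to-languages mapping, the language codes `"fr"`/`"de"` and the `seed` parameter are all hypothetical. The page also does not say whether the 9:1 proportion was applied to speaker counts or to amounts of data (180 vs. 17 speakers is not exactly 9:1), so this sketch simply applies it to speakers.

```python
import random
from typing import Dict, List, Set, Tuple


def partition_speakers(
    speaker_langs: Dict[str, Set[str]],
    dev_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, dev, test) speaker lists.

    Hypothetical helper: speakers recorded in both French ("fr") and
    German ("de") go to the test set; all other speakers are shuffled
    and split roughly 9:1 between training and development.
    """
    bilingual = {"fr", "de"}
    # Speakers heard in both languages form the test set.
    test = sorted(s for s, langs in speaker_langs.items() if bilingual <= langs)
    rest = sorted(s for s, langs in speaker_langs.items() if not bilingual <= langs)
    # Seeded shuffle so the random assignment is reproducible.
    rng = random.Random(seed)
    rng.shuffle(rest)
    n_dev = round(len(rest) * dev_fraction)
    dev = sorted(rest[:n_dev])
    train = sorted(rest[n_dev:])
    return train, dev, test


if __name__ == "__main__":
    # Toy inventory: ten single-language speakers plus one bilingual speaker.
    toy = {f"spk{i:02d}": {"fr" if i % 2 else "de"} for i in range(10)}
    toy["spk10"] = {"fr", "de"}
    train, dev, test = partition_speakers(toy)
    print(len(train), len(dev), test)  # 9 1 ['spk10']
```

Splitting by speaker rather than by sentence keeps every speaker's recordings within a single set, so no test or development voice is ever seen during training.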

MediaParl-TRN contains 11,425 sentences (5,471 in French and 5,955 in German) spoken by 180 different speakers. MediaParl-DEV contains 1,525 sentences (646 in French and 879 in German) from 17 different speakers. MediaParl-TST contains 2,617 sentences (925 in French and 1,692 in German) from 7 different speakers. Each test speaker uses both languages, but we assume that each speaker naturally speaks more often in their mother tongue. Four of the test speakers are native German speakers and three are native French speakers.


Reference paper

MediaParl: Bilingual mixed language accented speech database. David Imseng, Hervé Bourlard, Holger Caesar, Philip N. Garner, Gwénolé Lecorvé and Alexandre Nanchen. In: Proceedings of the 2012 IEEE Workshop on Spoken Language Technology, 2012.

Files

Restricted

The record is publicly accessible, but the files are restricted to users who have been granted access.

Request access

If you would like to request access to these files, please fill out the access request form.

Your request will be accepted only if you satisfy the following conditions:

Access to the dataset is based on an End-User License Agreement. Use of the dataset is strictly limited to non-commercial research.

Please provide the following information about the authorized signatory, who MUST hold a permanent position:

  • Full name
  • Name of organization
  • Position / job title
  • Academic / professional email address
  • URL where we can verify the information details

Only academic or professional email addresses from the same organization as the signatory are accepted for the online request. All online requests from generic email providers such as Gmail will be rejected.
