Published March 16, 2025 | Version v2
Dataset Restricted

Dataset of Audiovisual Speech for AR Telepresence Studies (Speech Recordings)

  • 1. Aalto University
  • 2. Friedrich-Alexander-Universität Erlangen-Nürnberg

Description

Dataset of speech recordings made in the anechoic chamber "Lampio" at Aalto University.

21 participants ("P1" - "P21")

Four parts are included:

1) Conversations: Ten different scripted three-part conversations ("C1" - "C10").
Each participant takes part in two of them. Each of the three parts is played by each of the three participants ("S1" - "S3"). See assignment_conversations.xlsx for the assignment.

2) Harvard_Sets: Sets 25 and 36 of the Harvard sentence lists

3) Sentence 1 from List 25 in five different voice levels (from "barely not whispering" to "screaming as loud as you can")

4) Native_Language: List 25 translated into the native languages of 12 of the participants

(French, Finnish, Hebrew, Hindi, Spanish (Mexico), Spanish (Chile), Catalan, Latvian, Italian, Polish, Romanian, German)


Each file contains data from three receivers:

Ch 1: GRAS 40 HF 1" low-noise measurement microphone, 1.5 m away from the subject
Ch 2: RØDE NT1 large-diaphragm condenser microphone, 2 m away from the subject
Ch 3: DPA 4060, attached to the subject's clothes
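
The channel layout above can be handled with a small helper once the samples of a file have been read. The following is a minimal sketch, assuming the samples arrive as one flat interleaved sequence (frame by frame: Ch 1, Ch 2, Ch 3), which is how multichannel WAV data is stored. The function name and the dictionary keys are hypothetical labels chosen for illustration; they are not part of the dataset.

```python
# Hypothetical labels for the three receivers, in channel order (Ch 1, Ch 2, Ch 3).
CHANNEL_LABELS = ("ch1_gras_1p5m", "ch2_rode_nt1_2m", "ch3_dpa_4060_clothes")


def split_channels(interleaved):
    """Split a flat, interleaved 3-channel sample sequence into one list per receiver.

    Assumes frames are ordered [ch1, ch2, ch3, ch1, ch2, ch3, ...].
    """
    n_channels = len(CHANNEL_LABELS)
    if len(interleaved) % n_channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Every n-th sample, starting at offset i, belongs to channel i.
    return {
        label: list(interleaved[i::n_channels])
        for i, label in enumerate(CHANNEL_LABELS)
    }
```

Keeping the receivers under descriptive keys rather than bare indices makes it harder to confuse the measurement microphone (Ch 1), which is the only channel with calibration data, with the other two.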


Calibration.wav: Calibration data for Ch 1 (recorded using a B&K 4231 calibrator, 1 kHz, 94 dB).
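
One way to use the calibration recording is to derive a scale factor that maps the digital sample values of Ch 1 to sound pressure. The sketch below rests on assumptions that are not stated in the description: that 94 dB refers to sound pressure level re 20 µPa (about 1 Pa RMS), that Calibration.wav was recorded with the same gain settings as the Ch 1 speech recordings, and that the samples have already been loaded as floating-point values. All function names are hypothetical.

```python
import math

P_REF = 20e-6        # reference pressure in Pa (assumed: dB re 20 µPa)
CAL_LEVEL_DB = 94.0  # calibrator level stated in the description


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    if len(samples) == 0:
        raise ValueError("empty sample sequence")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def calibration_factor(cal_samples, level_db=CAL_LEVEL_DB):
    """Factor (Pa per digital unit) derived from the calibration tone."""
    cal_pressure_rms = P_REF * 10 ** (level_db / 20)  # roughly 1 Pa RMS at 94 dB
    return cal_pressure_rms / rms(cal_samples)


def spl_db(samples, factor):
    """Unweighted level in dB re 20 µPa of a calibrated Ch 1 signal."""
    return 20 * math.log10(rms(samples) * factor / P_REF)
```

The factor applies to Ch 1 only; Ch 2 and Ch 3 have no calibration data in this record, so levels computed from them would be relative rather than absolute.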

Accompanying video data can be obtained on request from nils.meyer-kahlen@aalto.fi

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.