AnglistikVoices: L2 English speech dataset

Kakolu Ramarao, Akhilesh; Stein, Anna Sophia; Tahiri, Alba; Carvalho, Dalia Rodrigues; Antonia Weismann, Charlotte; Schäfer, Olivia Sophie; Kaczor, Julia; Tran, Nhut Ha; Elena Telaar, Christina; Bauer, Leonie; Jütten, Merle; Mafuta, Chimène; Agelopoulou, Vassiliki Vicky; Grabowski, Quinn Arin Gromek

doi:10.5281/zenodo.12525952

Published June 25, 2024 | Version v1.0.0

Dataset | Open access


AnglistikVoices: an L2 English speech dataset

This repository contains an L2 (second language) English speech corpus consisting of 74 minutes of recorded audio from 15 non-native English-speaking participants. The dataset was created as part of a university course; all participants are students who are also authors of this dataset.

Dataset Specifications

Total participants: 15 non-native English speakers
Total audio duration: 74 minutes
Recordings per participant: 60 audio samples
Sentence alignment: Available for 8 of the 15 participants
Recording equipment: Audio-Technica ATM75 microphone
Stimuli: All sentences are from the Artie Bias Corpus (https://github.com/artie-inc/artie-bias-corpus)
Recording environment: Recording booth
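The total duration listed above can be checked against a local copy of the recordings. The sketch below assumes the extracted raw.zip contains uncompressed PCM WAV files (the record does not state the audio format) and uses only Python's standard `wave` module; `total_duration_seconds` is a hypothetical helper name, not part of the dataset.

```python
import wave
from pathlib import Path


def total_duration_seconds(root: str) -> float:
    """Sum the duration of every .wav file (any letter case) under `root`.

    Assumes uncompressed PCM WAV files, which the standard-library
    `wave` module can read; other audio formats are not handled.
    """
    total = 0.0
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".wav":
            with wave.open(str(path), "rb") as wav:
                # Duration in seconds = number of frames / frames per second.
                total += wav.getnframes() / wav.getframerate()
    return total
```

For example, `total_duration_seconds("raw") / 60` should come out close to 74 if the archive was extracted to a folder named `raw` (an assumed path). As a rough cross-check, 15 participants × 60 recordings = 900 files, so 74 minutes (4,440 s) corresponds to an average of roughly 4.9 seconds per recording.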

The dataset contains individual recordings of non-native English speakers, organized by participant ID. For 8 participants, sentence-level alignments are provided. All recordings were captured in a controlled acoustic environment (a recording booth) using an Audio-Technica ATM75 microphone to ensure high audio quality.

The recordings consist of spoken English utterances from each participant. Detailed linguistic profiles for each participant are available in the metadata.xlsx file, which is indexed by participant ID and contains information on native language, proficiency level, language learning history, and other relevant linguistic background data.

The audio files are organized by participant ID, matching the identifiers used in the metadata file for easy cross-referencing between the audio recordings and participant linguistic profiles.
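A minimal sketch of that cross-referencing step follows. It assumes each participant's recordings sit in a sub-folder named after the participant ID (the exact archive layout is not documented here) and that the metadata rows have already been loaded into a dictionary keyed by participant ID. The helper names `group_by_participant` and `unmatched_ids` are hypothetical.

```python
from pathlib import Path


def group_by_participant(root: str) -> dict[str, list[Path]]:
    """Map each top-level folder name (assumed to be a participant ID)
    to the sorted list of files beneath it. Loose top-level files are skipped."""
    groups: dict[str, list[Path]] = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            groups[folder.name] = sorted(p for p in folder.rglob("*") if p.is_file())
    return groups


def unmatched_ids(
    recordings: dict[str, list[Path]], metadata: dict[str, dict]
) -> tuple[set[str], set[str]]:
    """Return (IDs with audio but no metadata row, IDs with metadata but no audio)."""
    audio_ids = set(recordings)
    meta_ids = set(metadata)
    return audio_ids - meta_ids, meta_ids - audio_ids
```

One way to build the metadata dictionary is `pandas.read_excel("metadata.xlsx", index_col=0).to_dict(orient="index")`, assuming the participant ID is the first column of the sheet; reading .xlsx files with pandas requires the `openpyxl` package, and the IDs may need converting to strings so they compare equal to folder names.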

Authors and Contributors

This dataset was created by the student participants themselves as part of their coursework.

Course Instructor: Akhilesh Kakolu Ramarao

Teaching Assistant: Anna Sophia Stein

For questions, contact kakolura@hhu.de.

If you use this dataset in your research, please cite:

@dataset{kakolu_ramarao_2024_anglistikvoices,
  author       = {Kakolu Ramarao, A. and 
                  Stein, A. S. and 
                  Tahiri, A. and 
                  Rodrigues, D. C. and 
                  Antonia Weismann, C. and 
                  Schäfer, O. S. and 
                  Kaczor, J. and 
                  Tran, N. H. and 
                  Elena Telaar, C. and 
                  Bauer, L. and 
                  Jütten, M. and 
                  Mafuta, C. and 
                  Agelopoulou, V. V. and 
                  Grabowski, Q. A. G.},
  title        = {AnglistikVoices: L2 English speech dataset},
  publisher    = {Zenodo},
  version      = {v1.0.0},
  year         = {2024},
  month        = jun,
  doi          = {10.5281/zenodo.12525952},
  url          = {https://doi.org/10.5281/zenodo.12525952},
  note         = {LabPhon 19, Hanyang Institute for Phonetics and Cognitive Sciences of Language (HIPCS), Hanyang University in Seoul, Korea}
}

Files


Files (420.6 MB)

Name	MD5 checksum	Size
metadata.xlsx	bb9e7167bf7482c8ee1130770947cac9	7.7 kB
raw.zip	7eb350ccdfd5c12d94783d8bf5e042c7	291.4 MB
sentence-aligned-audios.zip	a7195f8561555ef923e020b1de9b41c8	129.2 MB
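The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib` module: the expected values are copied from the file listing, while the function names `md5sum` and `verify_downloads` are illustrative.

```python
import hashlib
from pathlib import Path

# Checksums as published in the file listing for v1.0.0.
EXPECTED_MD5 = {
    "metadata.xlsx": "bb9e7167bf7482c8ee1130770947cac9",
    "raw.zip": "7eb350ccdfd5c12d94783d8bf5e042c7",
    "sentence-aligned-audios.zip": "a7195f8561555ef923e020b1de9b41c8",
}


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks so
    large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(folder: str) -> dict[str, bool]:
    """For each expected file present in `folder`, report whether its checksum matches.
    Files that are absent are simply left out of the result."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(folder) / name
        if path.exists():
            results[name] = md5sum(str(path)) == expected
    return results
```

Calling `verify_downloads(".")` in the download folder returns one entry per file found; a `False` value means that file should be downloaded again.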

Additional details

Available: 2024-06-26

