Published June 25, 2024 | Version v1.0.0
Dataset Open

AnglistikVoices: L2 English speech dataset

Description

This repository contains an L2 (second language) English speech corpus consisting of 74 minutes of recorded audio from 15 non-native English-speaking participants. The dataset was created as part of a university course. All participants were students, and they are also the authors of this dataset.

Dataset Specifications

  • Total participants: 15 non-native English speakers
  • Total audio duration: 74 minutes
  • Recordings per participant: 60 audio samples each
  • Sentence alignment: Available for 8 out of 15 participants
  • Recording equipment: Audio-Technica ATM75 microphone
  • Stimuli: All sentences are from the Artie Bias Corpus (https://github.com/artie-inc/artie-bias-corpus)
  • Recording environment: Recording booth
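
The figures above imply a rough corpus size. The following is a minimal sketch of that arithmetic, assuming every one of the 15 participants contributed exactly 60 samples and that the 74 minutes cover all of them (neither is checked against the files here):

```python
# Back-of-the-envelope corpus size from the listed specifications.
# Assumption: exactly 60 samples for each of the 15 participants.
participants = 15
samples_per_participant = 60
total_minutes = 74

total_recordings = participants * samples_per_participant
mean_seconds = total_minutes * 60 / total_recordings

print(total_recordings)        # 900
print(round(mean_seconds, 2))  # 4.93
```

Under these assumptions, a typical recording is a single sentence of roughly five seconds.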

The dataset contains individual recordings of non-native English speakers organized by participant ID. For 8 participants, sentence-level alignments are provided. All recordings were captured in a controlled acoustic environment using an Audio-Technica ATM75 microphone to ensure high audio quality.

The recordings consist of spoken English utterances from each participant. Detailed linguistic profiles for each participant are available in the metadata.xlsx file, which is indexed by participant ID and contains information on native language, proficiency level, language learning history, and other relevant linguistic background data.

The audio files are organized by participant ID, using the same identifiers as metadata.xlsx, so each recording can be matched to the corresponding participant's linguistic profile.
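
The cross-referencing described above could be sketched as follows. This is an illustration only: the record does not document the archive layout, so the one-folder-per-participant structure, the `.wav` extension, and the `participant_id` column name are all hypothetical and should be adjusted to what the extracted files actually look like.

```python
from collections import defaultdict
from pathlib import Path


def index_recordings(root, extensions=(".wav",)):
    """Map participant ID -> sorted list of audio file paths.

    Assumption (hypothetical layout): each participant has a
    subdirectory under `root` named after their participant ID.
    """
    root = Path(root)
    index = defaultdict(list)
    for path in sorted(root.rglob("*")):
        parts = path.relative_to(root).parts
        # Skip files sitting directly in root: they have no ID folder.
        if path.is_file() and len(parts) > 1 and path.suffix.lower() in extensions:
            index[parts[0]].append(path)
    return dict(index)


def attach_profiles(index, profiles, id_key="participant_id"):
    """Pair each participant's metadata row with their recordings.

    `profiles` is a list of dicts, one per metadata row; `id_key`
    is a hypothetical column name for the participant ID.
    """
    by_id = {str(row[id_key]): row for row in profiles}
    return {
        pid: {"profile": by_id.get(pid), "recordings": files}
        for pid, files in index.items()
    }
```

The metadata rows could be obtained, for example, with `pandas.read_excel("metadata.xlsx").to_dict("records")`, which needs pandas and an Excel engine such as openpyxl installed. Participants without a matching row get `None` as their profile rather than raising an error, which makes ID mismatches easy to spot.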

Authors and Contributors

This dataset was created by the student participants themselves as part of their coursework.

Course Instructor: Akhilesh Kakolu Ramarao

Teaching Assistant: Anna Sophia Stein

If you have any questions, you can contact: kakolura@hhu.de

If you use this dataset in your research, please cite:

@dataset{kakolu_ramarao_2024_anglistikvoices,
  author       = {Kakolu Ramarao, A. and 
                  Stein, A. S. and 
                  Tahiri, A. and 
                  Rodrigues, D. C. and 
                  Antonia Weismann, C. and 
                  Schäfer, O. S. and 
                  Kaczor, J. and 
                  Tran, N. H. and 
                  Elena Telaar, C. and 
                  Bauer, L. and 
                  Jütten, M. and 
                  Mafuta, C. and 
                  Agelopoulou, V. V. and 
                  Grabowski, Q. A. G.},
  title        = {AnglistikVoices: L2 English speech dataset},
  publisher    = {Zenodo},
  version      = {v1.0.0},
  year         = {2024},
  month        = jun,
  doi          = {10.5281/zenodo.12525952},
  url          = {https://doi.org/10.5281/zenodo.12525952},
  note         = {LabPhon 19, Hanyang Institute for Phonetics and Cognitive Sciences of Language (HIPCS), Hanyang University in Seoul, Korea}
}

Files

raw.zip

Files (420.6 MB)

  • 7.7 kB (md5:bb9e7167bf7482c8ee1130770947cac9)
  • 291.4 MB (md5:7eb350ccdfd5c12d94783d8bf5e042c7)
  • 129.2 MB (md5:a7195f8561555ef923e020b1de9b41c8)

Additional details

Dates

Available
2024-06-26