AnglistikVoices: L2 English speech dataset
Description
This repository contains an L2 (second language) English speech corpus consisting of 74 minutes of recorded audio from 15 non-native English-speaking participants. The dataset was created as part of a university course; all participants are students who are also the authors of this dataset.
Dataset Specifications
- Total participants: 15 non-native English speakers
- Total audio duration: 74 minutes
- Recordings per participant: 60 audio samples each
- Sentence alignment: Available for 8 out of 15 participants
- Recording equipment: Audio-Technica ATM75 microphone
- Stimuli: All sentences are from the Artie Bias Corpus (https://github.com/artie-inc/artie-bias-corpus)
- Recording environment: Recording booth
The dataset contains individual recordings of non-native English speakers, organized by participant ID. For 8 participants, sentence-level alignments are provided. All recordings were captured in a controlled acoustic environment using an Audio-Technica ATM75 microphone to ensure high audio quality.
The recordings consist of spoken English utterances from each participant. Detailed linguistic profiles for each participant are available in the metadata.xlsx file, which is indexed by participant ID and contains information on native language, proficiency level, language learning history, and other relevant linguistic background data.
The audio files are organized by participant ID, matching the identifiers used in the metadata file for easy cross-referencing between the audio recordings and participant linguistic profiles.
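The cross-referencing described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' tooling: it assumes that raw.zip extracts to one sub-directory per participant named after the participant ID, and that the audio files end in .wav. Neither detail is stated in this description, so adjust both to the actual layout. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Stated in the dataset specifications: 60 audio samples per participant.
EXPECTED_PER_PARTICIPANT = 60


def group_recordings(root, suffix=".wav"):
    """Map participant ID -> sorted list of audio file paths.

    Assumption: each participant has a sub-directory under `root`
    whose name is the participant ID. `suffix` is also an assumption.
    """
    root = Path(root)
    groups = defaultdict(list)
    for path in root.rglob(f"*{suffix}"):
        parts = path.relative_to(root).parts
        if len(parts) < 2:
            # File sits directly in root, so no participant folder; skip it.
            continue
        groups[parts[0]].append(path)
    return {pid: sorted(paths) for pid, paths in sorted(groups.items())}


def incomplete_participants(groups, expected=EXPECTED_PER_PARTICIPANT):
    """Return {participant ID: file count} where the count differs from `expected`."""
    return {pid: len(paths) for pid, paths in groups.items() if len(paths) != expected}


def ids_without_metadata(groups, metadata_ids):
    """Return participant IDs that have audio but no entry in the metadata IDs."""
    known = {str(i) for i in metadata_ids}
    return [pid for pid in groups if pid not in known]
```

The participant IDs passed as `metadata_ids` would come from the ID column of metadata.xlsx, for example after loading the spreadsheet with a library such as pandas. The name of that column is not given here, so it has to be checked in the file itself.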
Authors and Contributors
This dataset was created by the student participants themselves as part of their coursework.
Course Instructor: Akhilesh Kakolu Ramarao
Teaching Assistant: Anna Sophia Stein
If you have any questions, you can contact: kakolura@hhu.de
If you use this dataset in your research, please cite:
@dataset{kakolu_ramarao_2024_anglistikvoices,
author = {Kakolu Ramarao, A. and
Stein, A. S. and
Tahiri, A. and
Rodrigues, D. C. and
Antonia Weismann, C. and
Schäfer, O. S. and
Kaczor, J. and
Tran, N. H. and
Elena Telaar, C. and
Bauer, L. and
Jütten, M. and
Mafuta, C. and
Agelopoulou, V. V. and
Grabowski, Q. A. G.},
title = {AnglistikVoices: L2 English speech dataset},
publisher = {Zenodo},
version = {v1.0.0},
year = {2024},
month = jun,
doi = {10.5281/zenodo.12525952},
url = {https://doi.org/10.5281/zenodo.12525952},
note = {LabPhon 19, Hanyang Institute for Phonetics and Cognitive Sciences of Language (HIPCS), Hanyang University in Seoul, Korea}
}
Files
raw.zip
Additional details
Dates
- Available: 2024-06-26