Published October 31, 2024 | Version v1
Dataset Restricted

Teflon Pronunciation Assessment Challenge

  • 1. Norwegian University of Science and Technology
  • 2. Aalto University

Description

Speech data for the Teflon Pronunciation Assessment Challenge.

This dataset contains recordings of isolated Norwegian words spoken by children aged 4 to 12. Some of the children are native speakers of Norwegian; others are immigrants with different linguistic backgrounds. The file names were randomized to hide the identity of the speakers. The pronunciation of each utterance has been assessed by at least one assessor on a scale of 1 to 5, with the following meanings:

  1. Not at all identifiable as the target word
  2. Difficult to identify as the target word
  3. Slight phonemic error(s)
  4. Subphonemic error(s) or "unexpected variants"
  5. Prototypical, adult-like

The data is divided into a training set and a test set. The corresponding CSV files list, for each utterance, the spoken word and the score. Scores are provided only for the training data.
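
As an illustration of how the CSV metadata might be used, the sketch below reads utterance rows and counts how often each score occurs. The column names (`file`, `word`, `score`), the sample rows, and the assumption that scores are stored as integers are all hypothetical; the record does not document the actual CSV layout, so adjust them to match the real files.

```python
import csv
import io
from collections import Counter

# Score labels as listed in the dataset description.
SCORE_LABELS = {
    1: "Not at all identifiable as the target word",
    2: "Difficult to identify as the target word",
    3: "Slight phonemic error(s)",
    4: 'Subphonemic error(s) or "unexpected variants"',
    5: "Prototypical, adult-like",
}


def load_utterances(handle, score_column="score"):
    """Read rows from an open CSV file object.

    The score is converted to int when present and set to None when the
    column is missing or empty (as expected for the unscored test data).
    The column name "score" is an assumption, not taken from the dataset.
    """
    rows = []
    for row in csv.DictReader(handle):
        raw = (row.get(score_column) or "").strip()
        score = int(raw) if raw else None
        if score is not None and score not in SCORE_LABELS:
            raise ValueError(f"score outside the 1-5 scale: {score}")
        row[score_column] = score
        rows.append(row)
    return rows


def score_histogram(rows, score_column="score"):
    """Count how many scored utterances received each score."""
    return Counter(r[score_column] for r in rows if r[score_column] is not None)


# Made-up sample standing in for the training CSV (hypothetical layout).
SAMPLE = "file,word,score\nutt_a,hus,5\nutt_b,bil,3\nutt_c,hus,5\n"

rows = load_utterances(io.StringIO(SAMPLE))
hist = score_histogram(rows)
for score in sorted(hist):
    print(score, hist[score], SCORE_LABELS[score])
```

With the real data, `load_utterances` would be called on a file opened with `open(path, newline="", encoding="utf-8")`. Rows from the test CSV, which carry no scores, come back with `None` in the score field and are skipped by `score_histogram`.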

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

References

  • A.-M. Haug Olstad et al., "Collecting Linguistic Resources for Assessing Children's Pronunciation of Nordic Languages", LREC 2024, Turin, Italy.