Published June 6, 2026 | Version v1

Listen Like Humans: A Temporal Benchmark Dataset for Phonological Competition in End-to-End ASR (audio)

  • 1. University of Connecticut

Description

This record contains the dataset and lexical-semantic targets for the benchmark introduced in "Do Machines Listen Like Humans? A Temporal Benchmark for Phonological Competition in End-to-End ASR" (Interspeech 2026).

Contents:
• Audio — 10,731 single-word utterances (16 kHz, mono WAV) of a controlled lexicon of 1,533 uninflected English words (1–16 phonemes), each produced by seven talkers: six synthetic Apple "Say" voices and one human speaker. 

Directory layout: en/<speaker>/<word>.wav and en/<speaker>/<word>.TextGrid. The train/test split manifests and the vocabulary are in the accompanying repository.
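The layout above can be walked with a few lines of Python. This is a minimal sketch, not code from the accompanying repository: the function names index_dataset and summarize are hypothetical, and it assumes the archive has been extracted so that the en/ directory sits directly under the root path you pass in.

```python
from collections import defaultdict
from pathlib import Path


def index_dataset(root):
    """Map each speaker to {word: (wav_path, textgrid_path or None)}.

    Assumes the layout en/<speaker>/<word>.wav with a sibling
    en/<speaker>/<word>.TextGrid, as described in the record.
    """
    index = defaultdict(dict)
    for wav in sorted(Path(root).glob("en/*/*.wav")):
        speaker = wav.parent.name          # directory name is the talker
        word = wav.stem                    # file stem is the word
        grid = wav.with_suffix(".TextGrid")
        index[speaker][word] = (wav, grid if grid.exists() else None)
    return dict(index)


def summarize(index):
    """Return (speakers, distinct words, utterances) for an index."""
    words = set()
    total = 0
    for entries in index.values():
        words.update(entries)
        total += len(entries)
    return len(index), len(words), total
```

If the extraction is complete, the counts given in the description imply that summarize(index_dataset(root)) should report 7 speakers, 1,533 distinct words, and 10,731 utterances. Any entry whose second element is None is a recording without an alignment file.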

The benchmark compares the time course of lexical activation in ASR models against human eye-tracking data from the Visual World Paradigm (Allopenna et al., 1998), quantifying cohort and rhyme competition dynamics.

Code: https://github.com/comp-cogneuro-lang/listen-like-humans

Files (198.6 MB)

md5:1f398c6688acbf46e6424b324da8c776
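The MD5 checksum listed for the archive can be used to confirm that a download is intact. The sketch below uses only Python's standard hashlib module. The helper names md5_of_file and verify are hypothetical, and because the record does not state the archive's file name, the path is left for the caller to supply.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "1f398c6688acbf46e6424b324da8c776"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat for an archive of this size. A False result from verify suggests a truncated or corrupted download.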

Additional details