Listen Like Humans: A Temporal Benchmark Dataset for Phonological Competition in End-to-End ASR (audio)
Description
This record contains the dataset and lexical-semantic targets for the benchmark introduced in "Do Machines Listen Like Humans? A Temporal Benchmark for Phonological Competition in End-to-End ASR" (Interspeech 2026).
Contents:
• Audio — 10,731 single-word utterances (16 kHz, mono WAV) of a controlled lexicon of 1,533 uninflected English words (1–16 phonemes), each produced by seven talkers: six synthetic Apple "Say" voices and one human speaker.
Directory layout: en/<speaker>/<word>.wav and en/<speaker>/<word>.TextGrid. The train/test split manifests and the vocabulary are in the accompanying repository.
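The layout described above can be walked with a few lines of Python. This is a minimal sketch, not part of the released code: it assumes the archive has been extracted so that the `en/` directory sits directly under a root folder of your choosing, and the function names `index_dataset` and `is_16k_mono` are illustrative.

```python
import wave
from collections import defaultdict
from pathlib import Path


def index_dataset(root):
    """Return {speaker: {word: (wav_path, textgrid_path_or_None)}}."""
    index = defaultdict(dict)
    # Layout stated in the record: en/<speaker>/<word>.wav with a sibling .TextGrid
    for wav_path in sorted(Path(root).glob("en/*/*.wav")):
        grid_path = wav_path.with_suffix(".TextGrid")
        index[wav_path.parent.name][wav_path.stem] = (
            wav_path,
            grid_path if grid_path.exists() else None,
        )
    return dict(index)


def is_16k_mono(wav_path):
    """Check the stated audio format: 16 kHz sample rate, one channel."""
    with wave.open(str(wav_path), "rb") as handle:
        return handle.getframerate() == 16000 and handle.getnchannels() == 1
```

If the extracted data matches the description, `index_dataset(root)` should yield seven speakers with 1,533 words each (7 × 1,533 = 10,731 utterances), which makes a quick sanity check after download.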
The benchmark compares the time course of lexical activation in ASR models against human eyetracking data from the Visual World Paradigm (Allopenna et al., 1998), quantifying cohort and rhyme competition dynamics.
Code: https://github.com/comp-cogneuro-lang/listen-like-humans
Files
Total size: 198.6 MB
Checksum: md5:1f398c6688acbf46e6424b324da8c776
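The MD5 checksum listed above can be used to confirm that the download is intact. The following is a sketch using only the Python standard library; the helper names are illustrative, and because the record does not give the archive's file name, the path you pass in is whatever you saved the download as.

```python
import hashlib

# Checksum copied from the record's file listing.
EXPECTED_MD5 = "1f398c6688acbf46e6424b324da8c776"


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so the ~200 MB archive is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` on the saved archive should return `True`; a `False` result suggests a truncated or corrupted download.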
Additional details
Software
- Repository URL: https://github.com/comp-cogneuro-lang/listen-like-humans