Listen Like Humans: A Temporal Benchmark Dataset for Phonological Competition in End-to-End ASR (audio)

Linkai, Peng

doi:10.5281/zenodo.20564345

Published June 6, 2026 | Version v1

Dataset Open

Linkai, Peng (Contact person)¹

1. University of Connecticut

This record contains the dataset and lexical-semantic targets for the benchmark introduced in "Do Machines Listen Like Humans? A Temporal Benchmark for Phonological Competition in End-to-End ASR" (Interspeech 2026).

Contents:
• Audio — 10,731 single-word utterances (16 kHz, mono WAV) of a controlled lexicon of 1,533 uninflected English words (1–16 phonemes), each produced by seven talkers: six synthetic Apple "Say" voices and one human speaker.

Directory layout: en/<speaker>/<word>.wav and en/<speaker>/<word>.TextGrid. The train/test split manifests and the vocabulary are in the accompanying repository.
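As an illustration of the layout above, the following is a minimal Python sketch that pairs each WAV file with its TextGrid. It assumes the archive has been extracted so that the `en/` directory sits directly under a root folder of your choosing; the function name `index_utterances` is hypothetical and not part of the accompanying repository.

```python
from pathlib import Path


def index_utterances(root):
    """Map (speaker, word) to (wav_path, textgrid_path or None).

    Assumes the layout en/<speaker>/<word>.wav described in this record,
    with `root` being the directory that contains `en/`.
    """
    index = {}
    for wav in sorted(Path(root).glob("en/*/*.wav")):
        # The TextGrid is expected next to the WAV, with the same stem.
        textgrid = wav.with_suffix(".TextGrid")
        index[(wav.parent.name, wav.stem)] = (
            wav,
            textgrid if textgrid.exists() else None,
        )
    return index
```

If the extracted archive is complete, `len(index_utterances(root))` should come to 10,731, that is, 1,533 words times 7 talkers.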

The benchmark compares the time course of lexical activation in ASR models against human eyetracking data from the Visual World Paradigm (Allopenna et al., 1998), quantifying cohort and rhyme competition dynamics.

Code: https://github.com/comp-cogneuro-lang/listen-like-humans

Files (198.6 MB)

Name	Size	Checksum
dataset.tar.gz	198.6 MB	md5:1f398c6688acbf46e6424b324da8c776
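
The MD5 checksum listed for dataset.tar.gz can be used to check a download before extracting it. The sketch below uses only Python's standard `hashlib` module; the helper name `md5_of_file` and the local path in the usage note are assumptions for illustration.

```python
import hashlib

# Checksum as listed for dataset.tar.gz in this record.
EXPECTED_MD5 = "1f398c6688acbf46e6424b324da8c776"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("dataset.tar.gz") == EXPECTED_MD5` should evaluate to true for an intact download saved under that name in the working directory.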

Additional details

Repository URL: https://github.com/comp-cogneuro-lang/listen-like-humans

