The ultimate goal of the Zero Resource Speech Challenge is to construct a system that learns an end-to-end Spoken Dialog (SD) system, in an unknown language, from scratch, using only information available to a language-learning infant. "Zero resource" refers to zero linguistic expertise (e.g., orthographic/linguistic transcriptions), not zero information besides audio (visual input, limited human feedback, etc.). The fact that 4-year-olds spontaneously learn language without supervision from language experts shows that this goal is theoretically reachable.

The Zero Resource Speech Challenge addresses a fundamental scientific question (how can a system autonomously acquire language?) that is interesting in its own right, but it also has three main practical benefits:

  • Traditional speech and language technologies are trained with massive amounts of textual information. However, most of the world’s languages do not have textual resources or even a reliable orthography. Systems constructed with zero expert resources could serve millions of users of these so-called ‘low-resource’ languages.
  • Languages are disappearing faster than the language documentation effort can preserve them. Zero resource technologies could give field linguists tools to (semi-)automatically analyze and annotate audio recordings of these endangered languages with automatically discovered linguistic units (phonemes, lexicon, grammar).
  • Zero Resource Speech technologies provide predictive models of language growth for psychologists/clinicians interested in the impact of sociolinguistic variations in input on subsequent normal or abnormal language and cognitive development.

The Zero Resource Challenge series is designed to progress incrementally towards this goal by proposing achievable but progressively harder objectives, and by building and open-sourcing, along the way, the core technological components needed for an autonomous SD system.

Weakly supervised and unsupervised learning is tricky to evaluate. We use two kinds of evaluation principles: (1) Unit testing: each core component is evaluated by a specific set of metrics, largely inspired by psychometrics and linguistics. These tests do not guarantee that an entire system will work well, but they are useful for checking and debugging the systems. (2) Application testing: as the challenge progresses in aggregating more components, it will become possible to construct useful applications (e.g., keyword search, document classification, image retrieval from speech, speech-to-speech translation), making it possible to use more standard evaluation techniques.

The ZeroSpeech challenge targets the unsupervised discovery of linguistic units from raw speech in an unknown language. As in the 2015 edition, it concerns two core components: the discovery of subword units (Track 1) and the discovery of word units (Track 2). For both tracks, the evaluation metrics are the same as in the previous edition.

More information can be found at

Curated by:
Curation policy:

Contributions are accepted from participants of the challenge.

May 30, 2017
Harvesting API:
OAI-PMH Interface
