Published December 7, 2023 | Version v1
Dataset Restricted


  • 1. Idiap Research Institute



This record contains the real-speech test portion of the SLURP-Fr dataset. SLURP-Fr is one part of the data created for the study of interpreter-aided spoken language understanding (SLU) in the paper cited below, which comprises three parts:

  1. SLURP-Fr, an end-to-end SLU dataset based on the French portion of MASSIVE, containing 16,521 synthetic audio samples created using Google TTS, accompanied by 477 real test samples collected from two French speakers at Idiap.
  2. SLURP-Es, a similar dataset based on the parallel Spanish portion of MASSIVE, containing only synthetic samples.
  3. Spoken Gigaword, a speech summarization dataset generated from Gigaword, containing 51,385 synthetic audio samples created using Google TTS.



If you use this dataset, please cite the following publication:

He, Mutian, and Philip N. Garner. "The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation." Findings of EMNLP 2023.



The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

We will provide an End-User License Agreement. The use of the dataset is strictly restricted to non-commercial research.

Please provide us with the following information about the authorized signatory (who MUST hold a permanent position):

  • Full name
  • Name of organization
  • Position / job title
  • Academic / email address
  • URL where we can verify the information details

Only valid academic email addresses from the same organization as the signatory are accepted for the online request. All online requests coming from generic email providers such as Gmail will be rejected.
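
Before submitting, a requester may want to sanity-check their own details against the email rule above. The following is only a minimal sketch of such a check, not the procedure the maintainers actually apply: the function names, the list of generic providers, and the use of an identical email domain as a stand-in for "same organization" are all assumptions made for illustration.

```python
# Hypothetical pre-submission check; NOT part of the official request form.
# Assumption: "same organization" is approximated by an identical email domain.
# Assumption: this provider list is illustrative and far from complete.
GENERIC_PROVIDERS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def email_domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return domain.lower()


def looks_acceptable(requester_email: str, signatory_email: str) -> bool:
    """Rough check: no generic provider, and both addresses share a domain."""
    requester = email_domain(requester_email)
    signatory = email_domain(signatory_email)
    if requester in GENERIC_PROVIDERS or signatory in GENERIC_PROVIDERS:
        return False
    return requester == signatory
```

A passing check here does not guarantee that a request will be accepted; the final decision rests with the dataset maintainers and the End-User License Agreement process.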



