Published May 10, 2023 | Version 1
Dataset Open

OfficeDial Dataset

  • 1. University of Illinois at Chicago
  • 2. Arizona State University

Description

# OfficeDial Dataset

 

## EXPLANATION OF DATA FILES

 

We are releasing this dataset as a JSON file containing dialogues between a user and an intelligent virtual assistant (IVA) under different noise levels and in different scenarios. The format of the dataset is adapted from the [Taskmaster](https://github.com/google-research-datasets/Taskmaster) dataset.

 

The dataset is a dictionary of filenames and an array of conversations.

Each conversation contains the following attributes:

- conversation_id: a unique id

- scenario: the scenario of this conversation; one of S1_A, S1_B, S2_A, S2_B, S3_A, S3_B

- noise: the noise level played during this conversation; one of SILENCE, NON_VERBAL, VERBAL

- utterances: an array of utterances
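
As a rough illustration of how the conversation attributes above might be used, the sketch below counts conversations per (scenario, noise) pair. It assumes the top-level JSON object is a dictionary whose values are arrays of conversations; that layout is an interpretation of the description, not something confirmed here. The helper names (`load_dataset`, `iter_conversations`, `count_by_condition`) and the `sample` data are hypothetical and are not taken from the dataset.

```python
import json
from collections import Counter

# Allowed values as listed in the attribute descriptions.
SCENARIOS = {"S1_A", "S1_B", "S2_A", "S2_B", "S3_A", "S3_B"}
NOISE_LEVELS = {"SILENCE", "NON_VERBAL", "VERBAL"}


def load_dataset(path):
    """Read the released JSON file into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_conversations(dataset):
    """Yield every conversation in the dataset.

    ASSUMPTION: the top level is a dict whose values are lists of
    conversations. Adjust this function if the actual layout differs.
    """
    for conversations in dataset.values():
        yield from conversations


def count_by_condition(dataset):
    """Count conversations per (scenario, noise) pair, validating both fields."""
    counts = Counter()
    for conv in iter_conversations(dataset):
        if conv["scenario"] not in SCENARIOS:
            raise ValueError(f"unexpected scenario: {conv['scenario']!r}")
        if conv["noise"] not in NOISE_LEVELS:
            raise ValueError(f"unexpected noise level: {conv['noise']!r}")
        counts[(conv["scenario"], conv["noise"])] += 1
    return counts


# Made-up stand-in data (NOT from the dataset), used only to show the shape.
sample = {
    "file_a": [
        {"conversation_id": "c1", "scenario": "S1_A", "noise": "SILENCE", "utterances": []},
        {"conversation_id": "c2", "scenario": "S1_A", "noise": "SILENCE", "utterances": []},
    ],
    "file_b": [
        {"conversation_id": "c3", "scenario": "S2_B", "noise": "VERBAL", "utterances": []},
    ],
}

counts = count_by_condition(sample)
print(counts)
```

With the real file, `count_by_condition(load_dataset("officedial_dataset.json"))` would give the same kind of summary, provided the layout assumption holds.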

 

Each utterance contains the following fields:

- index: the position of this utterance within the conversation, starting at 0

- speaker: the speaker of this utterance; either USER or ASSISTANT

- text: the transcription of the spoken words
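
The utterance fields above are enough to rebuild a readable transcript. The following is a minimal sketch, assuming `index` is an integer and each utterance is a dictionary with exactly the three fields listed. The function names and the example conversation are hypothetical and are not drawn from the dataset.

```python
def transcript(conversation):
    """Return the conversation as 'SPEAKER: text' lines, ordered by index."""
    ordered = sorted(conversation["utterances"], key=lambda u: u["index"])
    return [f"{u['speaker']}: {u['text']}" for u in ordered]


def turns_by(conversation, speaker):
    """Return the texts spoken by one speaker (USER or ASSISTANT), in order."""
    ordered = sorted(conversation["utterances"], key=lambda u: u["index"])
    return [u["text"] for u in ordered if u["speaker"] == speaker]


# Made-up example conversation (NOT from the dataset); utterances are
# deliberately out of order to show that sorting by index restores them.
example = {
    "conversation_id": "example-0",
    "scenario": "S1_A",
    "noise": "NON_VERBAL",
    "utterances": [
        {"index": 1, "speaker": "ASSISTANT", "text": "Which room would you like?"},
        {"index": 0, "speaker": "USER", "text": "Book a meeting room."},
        {"index": 2, "speaker": "USER", "text": "The large one."},
    ],
}

lines = transcript(example)
user_texts = turns_by(example, "USER")
print("\n".join(lines))
```

Sorting by `index` rather than relying on array order is a defensive choice; if the file already stores utterances in order, the sort is a no-op.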

 

## License

 

Creative Commons Attribution License (CC BY).

Files

officedial_dataset.json

Files (238.4 kB)

- md5:7836824e640554f5806362e3a17a2e87 (237.4 kB)
- md5:a55e0c2081a27e08ba1d00a3ee2746fd (959 Bytes)
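
A downloaded file can be checked against the MD5 values listed above. This is a minimal sketch using Python's standard `hashlib`. The listing does not say which checksum belongs to which file, so pairing the 237.4 kB checksum with `officedial_dataset.json` is an assumption; the function names are hypothetical.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:abc...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_hex(path) == expected_hex


# Usage (ASSUMPTION: the larger listed file is officedial_dataset.json):
# matches_checksum("officedial_dataset.json",
#                  "md5:7836824e640554f5806362e3a17a2e87")
```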