OfficeDial Dataset

Mohammad Arvan; Mina Valizadeh; Parian Haghighat; Toan Nguyen; Heejin Jeong; Natalie Parde

doi:10.5281/zenodo.7922480

Published May 10, 2023 | Version 1

1. University of Illinois at Chicago
2. Arizona State University

# OfficeDial Dataset

## EXPLANATION OF DATA FILES

We are releasing this dataset as a JSON file containing dialogues between a user and an intelligent virtual assistant (IVA), recorded under different noise levels and in different scenarios. The format of the dataset is adapted from the [Taskmaster](https://github.com/google-research-datasets/Taskmaster) dataset.

The dataset is a dictionary of filenames and an array of conversations.

Each conversation contains the following attributes:

- conversation_id: a unique id

- scenario: the scenario of this conversation; one of S1_A, S1_B, S2_A, S2_B, S3_A, S3_B

- noise: the noise condition played during this conversation; one of SILENCE, NON_VERBAL, VERBAL

- utterances: an array of utterances

Each utterance contains the following fields:

- index: the position of this utterance within the conversation, starting at 0

- speaker: the speaker of this utterance; one of USER, ASSISTANT

- text: the transcription of the spoken words
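
The fields above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the top-level dictionary maps each filename to an array of conversations, which is one reading of the description above. The helper names (`load_dataset`, `iter_conversations`, `ordered_utterances`, `count_by_noise`) and the sample record are hypothetical and are not part of the release.

```python
import json
from collections import Counter


def load_dataset(path):
    """Parse the released JSON file into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_conversations(dataset):
    """Yield (filename, conversation) pairs.

    Assumption: the top-level dict maps filename -> list of conversations.
    """
    for filename, conversations in dataset.items():
        for conversation in conversations:
            yield filename, conversation


def ordered_utterances(conversation):
    """Return the utterances of one conversation sorted by their index field."""
    return sorted(conversation["utterances"], key=lambda u: u["index"])


def count_by_noise(dataset):
    """Count conversations per noise condition (SILENCE, NON_VERBAL, VERBAL)."""
    return Counter(conv["noise"] for _, conv in iter_conversations(dataset))


# Made-up record that only mirrors the documented field names.
sample = {
    "example_file": [
        {
            "conversation_id": "demo-1",
            "scenario": "S1_A",
            "noise": "SILENCE",
            "utterances": [
                {"index": 1, "speaker": "ASSISTANT", "text": "made-up reply"},
                {"index": 0, "speaker": "USER", "text": "made-up request"},
            ],
        }
    ]
}

for filename, conv in iter_conversations(sample):
    for utt in ordered_utterances(conv):
        print(filename, conv["noise"], utt["index"], utt["speaker"], utt["text"])
```

To run the same helpers on the real data, replace `sample` with `load_dataset("officedial_dataset.json")`. If the top-level layout turns out to differ from the assumption, only `iter_conversations` needs to change.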

## License

Creative Commons Attribution License (CC BY).

Files (238.4 kB)

| Name | MD5 | Size |
|---|---|---|
| officedial_dataset.json | 7836824e640554f5806362e3a17a2e87 | 237.4 kB |
| README.md | a55e0c2081a27e08ba1d00a3ee2746fd | 959 Bytes |
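
The MD5 values listed for the files can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper name `md5_of_file` and the local file path are assumptions for illustration.

```python
import hashlib

# Checksum listed for officedial_dataset.json in this record.
EXPECTED_MD5 = "7836824e640554f5806362e3a17a2e87"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

Calling `is_intact("officedial_dataset.json")` on a local copy should return `True` when the file matches the listed checksum.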
