Published April 29, 2022 | Version 1.0.0
Dataset | Open access

EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems

Description

This is the dataset created for the paper, "EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems" (https://arxiv.org/abs/2109.04919).

EmoWOZ is based on MultiWOZ, a multi-domain task-oriented dialogue dataset (https://github.com/budzianowski/multiwoz). It contains more than 11K task-oriented dialogues with more than 83K emotion annotations of user utterances. In addition to the Wizard-of-Oz dialogues from MultiWOZ, we collected human-machine dialogues within the same set of domains, so that the corpus sufficiently covers the range of emotions that can arise during the lifetime of a data-driven dialogue system. The 7 emotion labels are adapted from the OCC (Ortony, Clore and Collins) model of emotions.

For the data format and label definitions, please refer to README.md.

Notes

S. Feng, N. Lubis, M. Heck, and C. van Niekerk are supported by funding provided by the Alexander von Humboldt Foundation in the framework of the Sofja Kovalevskaja Award endowed by the Federal Ministry of Education and Research, while C. Geishauser and H-C. Lin are supported by funds from the European Research Council (ERC) provided under the Horizon 2020 research and innovation programme (Grant agreement No. STG2018 804636). Computing resources were provided by Google Cloud.

Files (178.0 MB in total), including data-split.json

Size        MD5 checksum
328.5 kB    1490939e90c3c7b656e000bc212fd8fe
18.1 MB     9c6b7a91fa93851d4bee6f34947d07f5
159.6 MB    8b06d935ec69dd21ba654848c9000293
3.2 kB      eafca80e565f4815f50570946e19f472
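After downloading, the files can be checked against the MD5 checksums listed for this record. The following is a minimal, unofficial sketch in Python. It assumes that all downloaded files sit in one local directory, and the helper names `md5_of` and `check_downloads` are hypothetical. Because this page does not pair every checksum with a file name, the sketch only reports whether each file matches one of the listed checksums.

```python
import hashlib
import sys
from pathlib import Path

# MD5 checksums as listed for this record. The page does not pair each
# checksum with a file name, so files are matched by checksum only.
EXPECTED_MD5 = {
    "1490939e90c3c7b656e000bc212fd8fe",
    "9c6b7a91fa93851d4bee6f34947d07f5",
    "8b06d935ec69dd21ba654848c9000293",
    "eafca80e565f4815f50570946e19f472",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory):
    """Map each regular file in `directory` (assumed download folder) to
    True if its MD5 digest matches one of the listed checksums."""
    return {
        p.name: md5_of(p) in EXPECTED_MD5
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, ok in check_downloads(target).items():
        print(f"{name}: {'OK' if ok else 'no matching checksum'}")
```

It can be run as, for example, `python check_md5.py path/to/downloads` (the script name is hypothetical). Reading in fixed-size chunks keeps memory use flat even for the 159.6 MB file.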

Additional details

Related works

Is published in: arXiv:2109.04919 (arXiv)