Published March 15, 2024 | Version 1.0
Dataset Open

NoVAGraphS FSA User-Agent Corpus

Description

  • Paper: Di Nuovo E., Sanguinetti M., Balestrucci P.F., Anselma L., Bernareggi C., Mazzei A. (2024), Educational Dialogue Systems for Visually Impaired Students: Introducing a Task-Oriented User-Agent Corpus. Accepted paper at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
  • Contact person: Elisa Di Nuovo, elisa.dinuovo@gmail.com

Dataset Summary

Collection of user-agent interactions revolving around the description of Finite State Automata.

Dataset Description

The corpus consists of a CSV file encoded in UTF-8 comprising the following columns:

  • CODE_ID: the id of the interaction
  • Turn: the turn number within the interaction
  • Participant: the sender of the utterance (U for the user, S for the agent)
  • Text: the utterance content
  • VIP: whether the user is a Visually-Impaired Person
  • Token count: the number of tokens in the utterance (counted using the spaCy tokenizer)
  • DAs_GOLD and Errors_GOLD: the labels assigned for Dialog Acts and Errors, respectively
  • FSA_ID: the id of the Finite State Automaton that is being referred to within the conversation (it corresponds to the PNG and HTML file names containing the relevant information on the FSA)
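
As an illustration of how the columns above fit together, the following is a minimal sketch that groups rows into interactions and orders them by turn. It assumes a comma-delimited file and an integer Turn column, neither of which is stated in this record. The sample rows, the `demo-1` id, the `no` value for VIP, and the `load_interactions` helper are all hypothetical and are not taken from the corpus.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample mirroring the documented column names. The values
# are invented for illustration; the real VIP and label encodings may differ.
SAMPLE = (
    "CODE_ID,Turn,Participant,Text,VIP,Token count,DAs_GOLD,Errors_GOLD,FSA_ID\n"
    "demo-1,2,S,example agent reply,no,3,,,FSA01\n"
    "demo-1,1,U,example user utterance,no,3,,,FSA01\n"
)


def load_interactions(handle):
    """Group rows by CODE_ID and sort each interaction by turn number."""
    interactions = defaultdict(list)
    for row in csv.DictReader(handle):
        # Assumption: Turn holds an integer turn index.
        row["Turn"] = int(row["Turn"])
        interactions[row["CODE_ID"]].append(row)
    for turns in interactions.values():
        turns.sort(key=lambda r: r["Turn"])
    return dict(interactions)


interactions = load_interactions(io.StringIO(SAMPLE))
# Keep only the user side of one interaction (Participant == "U").
user_turns = [t for t in interactions["demo-1"] if t["Participant"] == "U"]
print(len(interactions["demo-1"]), len(user_turns))
```

With the actual corpus, the same function could be given a file opened with `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded CSV.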

Additional Data

  • Two PNG files with the graphical representation of the automata
  • Two HTML files containing the state tables of the automata
  • RASA configuration files used to train the DIET classifier on the DAs

Access Request

To access the data, users need to fill out the following Google form

Files

Files (349.5 kB)

  • md5:4f85f6186f94fe0d2dc58a76f562c1d2 (3.8 kB)
  • md5:e5f58ddf97a292818d870ba180f35e9f (158.4 kB)
  • md5:51c39f48aefffa7b356da5c24b4b3e51 (3.4 kB)
  • md5:f7eae3a4a147f94f31fe4d049811975f (182.7 kB)
  • md5:7a524e53257eba1ceb73b5ca5ada9b81 (1.2 kB)
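
The MD5 values listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the mapping between each checksum and a file name is not given in this listing, so it has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:4f85f6186f94fe0d2dc58a76f562c1d2")` returns `True` only if the local copy matches the published digest; `local_path` here stands for wherever the corresponding file was saved.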

Additional details

Funding

Fondazione CRT
NoVAGraphS (Non-Visual Access to Graphical Structure) Progetto CRT 2021.1930