Published April 25, 2019 | Version v5
Dataset Open

Japanese FAQ dataset for e-learning system

  • 1. Tokyo Metropolitan University

Description

This dataset contains FAQ data and their categories for training a chatbot specialized for the e-learning system used at Tokyo Metropolitan University. We report the accuracy of the chatbot in the following papers.

Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "Supporting Creation of FAQ Dataset for E-learning Chatbot", Intelligent Decision Technologies, Smart Innovation, IDT'19, Springer, 2019, to appear.

Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "An FAQ Dataset for E-learning System Used on a Japanese University", Data in Brief, Elsevier, in press.

This dataset is based on real Q&A data about how to use the e-learning system, asked by students and teachers who use it in actual classes. We collected the Q&A data from April 2015 to July 2018.

We also attach an English version of the dataset, translated from the Japanese dataset, to make its contents easier to understand. Note that we did not perform any evaluation on the English version, so there are no results on how accurately chatbots respond to questions in it.

File contents:

  • FAQ data (*.csv)
    1. Answer2Category.csv: Categories of answers.
    2. Answer2Tag.csv: Titles of answers.
    3. Answers.csv: IDs for answers and texts of answers.
    4. Categories.csv: Names of categories for answers.
    5. Questions.csv: Texts of questions and their corresponding answer IDs.
    6. Answers_english.csv: IDs for answers and texts of answers written in English.
    7. Categories_english.csv: Names of categories for answers and their corresponding English names.
    8. Questions_english.csv: Texts of questions and their corresponding answer IDs written in English.

  • Statistics (*.tsv)

     Results of statistical analyses of the dataset. We used the Calinski-Harabasz index, mutual information, the Jaccard index, TF-IDF+KL divergence, and TF-IDF+JS divergence to measure the quality of the dataset. In these analyses, we regard each answer as a cluster of questions. We also perform the same analyses for categories by regarding them as clusters of answers.
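The idea of regarding each answer as a cluster of questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used for the statistics files: the column names `question` and `answer_id`, the inline sample rows, and the whitespace tokenization are all hypothetical, since the description does not specify the CSV layout or how the Jaccard index was applied.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for Questions_english.csv. The real column names
# and rows are not given in the description and are assumed here.
SAMPLE = """question,answer_id
How do I log in to the system?,1
I cannot log in to the system,1
How do I submit a report?,2
Where do I upload my report?,2
"""


def load_clusters(handle):
    """Group question texts by answer ID, giving one cluster per answer."""
    clusters = defaultdict(list)
    for row in csv.DictReader(handle):
        clusters[row["answer_id"]].append(row["question"])
    return dict(clusters)


def tokens(questions):
    """Set of lower-cased whitespace tokens over all questions in a cluster."""
    return {word.strip("?.,!").lower() for q in questions for word in q.split()}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


clusters = load_clusters(io.StringIO(SAMPLE))
# Vocabulary overlap between the question clusters of answers 1 and 2.
overlap = jaccard(tokens(clusters["1"]), tokens(clusters["2"]))
print(f"Jaccard index between clusters 1 and 2: {overlap:.2f}")
```

A low value suggests that the two answers are reached by questions with little shared vocabulary. Whitespace splitting only makes sense for the English files; the Japanese files would need a morphological tokenizer instead.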

Grants: JSPS KAKENHI Grant Number 18H01057

Files


Files (1.5 MB)

Checksum Size
md5:7b4a6df4d21cf0224afc59036e28489a 957 Bytes
md5:6bbf666f172e70554378dda505c7718a 2.0 kB
md5:619191d7606fd27a4c4b6a14a30421ec 3.7 kB
md5:cec025dc40c4fe318cd316c1d9672e3a 42.9 kB
md5:5dc299c7a8134b272265468af9b53219 37.3 kB
md5:49241c18fa7fffd147fb192569c87def 525 Bytes
md5:4d9931b795191c15e500c885ed7786c8 238 Bytes
md5:547e8367871003ce439dcfa13e2f92bb 403 Bytes
md5:d2039dea3640c31087d18e1a1f0d0d48 825 Bytes
md5:a78b1ac8d33eb04d6fe87be773d178b7 78.6 kB
md5:741bbb73072c9c4229d3856add03fc26 7.2 kB
md5:f45a29171b2ef0ed71e0f42d515dd04d 8.3 kB
md5:0c84d4a8dafd1876b3f2eff3cd0bf329 1.3 MB
md5:15f0eb96a972473f8ff61e5f85b19f85 34.9 kB
md5:a17f464df2afcc1b714b520dfead9b63 30.9 kB
md5:d5eb47d34a830d33109bdce1c8c56bc7 2.5 kB
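
The md5 values in the file list can be used to check that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and because the list does not show which checksum belongs to which file name, the pairing of a local path with a listed value is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum(path, "md5:7b4a6df4d21cf0224afc59036e28489a")` returns `True` only if the file at `path` is byte-for-byte identical to the 957-byte file in the list.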