Japanese FAQ dataset for e-learning system
Affiliation: Tokyo Metropolitan University
Description
This dataset contains FAQ data and their categories for training a chatbot specialized for the e-learning system used at Tokyo Metropolitan University. The accuracy of the chatbot is reported in the following papers.
Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "Supporting Creation of FAQ Dataset for E-learning Chatbot", Intelligent Decision Technologies, Smart Innovation, IDT'19, Springer, 2019, to appear.
Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "An FAQ Dataset for E-learning System Used on a Japanese University", Data in Brief, Elsevier, in press.
This dataset is based on real Q&A data about how to use the e-learning system, asked by students and teachers who use it in actual classes. The Q&A data were collected from April 2015 to July 2018.
We also provide an English version of the dataset, translated from the Japanese dataset, to make its contents easier to understand. Note that we did not evaluate the English version; no results are available on how accurately chatbots respond to questions when trained on it.
File contents:
- FAQ data (*.csv)
- Answer2Category.csv: Categories of answers.
- Answer2Tag.csv: Titles of answers.
- Answers.csv: IDs for answers and texts of answers.
- Categories.csv: Names of categories for answers.
- Questions.csv: Texts of questions and their corresponding answer IDs.
- Answers_english.csv: IDs for answers and texts of answers written in English.
- Categories_english.csv: Names of categories for answers and their corresponding English names.
- Questions_english.csv: Texts of questions and their corresponding answer IDs written in English.
- Statistics (*.tsv)
Results of statistical analyses of the dataset. To measure the quality of the dataset, we used the Calinski-Harabasz index, mutual information, the Jaccard index, TF-IDF with KL divergence, and TF-IDF with JS divergence. In these analyses, we regard each answer as a cluster of questions. We also performed the same analyses for categories, regarding each category as a cluster of answers.
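
To make the cluster view concrete, the following is a minimal sketch of two of the measures named above, computed between two answers treated as clusters of questions. It is not the code used to produce the statistics files: it uses plain term-frequency distributions instead of TF-IDF weights, and the answer IDs, tokens, and function names are hypothetical. Japanese text would first need a tokenizer, which is omitted here.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index between two token collections, compared as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def term_distribution(docs):
    """Normalised term-frequency distribution over all tokens in a cluster."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithms (range 0 to 1)."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(x):
        # KL(x || m); terms with zero probability contribute nothing.
        return sum(x[t] * math.log2(x[t] / m[t]) for t in x if x[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical toy data: answer ID -> tokenised questions mapped to that answer.
clusters = {
    "A1": [["reset", "password"], ["forgot", "password"]],
    "A2": [["upload", "report"], ["submit", "report", "file"]],
}

p = term_distribution(clusters["A1"])
q = term_distribution(clusters["A2"])
jsd = js_divergence(p, q)   # divergence between the two answer clusters
jac = jaccard(p, q)         # vocabulary overlap between the two clusters
```

With base-2 logarithms, two clusters that share no vocabulary reach the maximum divergence of 1 and a Jaccard index of 0, which is the well-separated case one would hope for between different answers.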
Grants: JSPS KAKENHI Grant Number 18H01057
Files (1.5 MB)
MD5 checksum | Size |
---|---|
md5:7b4a6df4d21cf0224afc59036e28489a | 957 Bytes |
md5:6bbf666f172e70554378dda505c7718a | 2.0 kB |
md5:619191d7606fd27a4c4b6a14a30421ec | 3.7 kB |
md5:cec025dc40c4fe318cd316c1d9672e3a | 42.9 kB |
md5:5dc299c7a8134b272265468af9b53219 | 37.3 kB |
md5:49241c18fa7fffd147fb192569c87def | 525 Bytes |
md5:4d9931b795191c15e500c885ed7786c8 | 238 Bytes |
md5:547e8367871003ce439dcfa13e2f92bb | 403 Bytes |
md5:d2039dea3640c31087d18e1a1f0d0d48 | 825 Bytes |
md5:a78b1ac8d33eb04d6fe87be773d178b7 | 78.6 kB |
md5:741bbb73072c9c4229d3856add03fc26 | 7.2 kB |
md5:f45a29171b2ef0ed71e0f42d515dd04d | 8.3 kB |
md5:0c84d4a8dafd1876b3f2eff3cd0bf329 | 1.3 MB |
md5:15f0eb96a972473f8ff61e5f85b19f85 | 34.9 kB |
md5:a17f464df2afcc1b714b520dfead9b63 | 30.9 kB |
md5:d5eb47d34a830d33109bdce1c8c56bc7 | 2.5 kB |
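
The MD5 checksums in the file listing can be used to confirm that a download is intact. The sketch below shows one way to do this with Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, and the `expected` mapping is deliberately left empty because the listing does not pair file names with checksums; fill it in from the repository page.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:7b4a...'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Placeholder mapping (an assumption): local file path -> listed checksum.
expected = {}

for name, checksum in expected.items():
    if Path(name).exists():
        status = "OK" if verify(name, checksum) else "MISMATCH"
        print(f"{name}: {status}")
    else:
        print(f"{name}: not found")
```

Reading in chunks keeps memory use constant, which matters little for files of this size but makes the helper reusable for larger downloads.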