Japanese FAQ dataset for e-learning system

doi:10.5281/zenodo.2783642

Published April 25, 2019 | Version v5

Affiliation: Tokyo Metropolitan University

This dataset contains FAQ data and their categories for training a chatbot specialized for the e-learning system used at Tokyo Metropolitan University. We report the chatbot's accuracy in the following papers.

Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "Supporting Creation of FAQ Dataset for E-learning Chatbot", Intelligent Decision Technologies, Smart Innovation, IDT'19, Springer, 2019, to appear.

Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "An FAQ Dataset for E-learning System Used on a Japanese University", Data in Brief, Elsevier, in press.

This dataset is based on real Q&A data about how to use the e-learning system, asked by students and teachers who use it in their actual classes. The Q&A data were collected from April 2015 to July 2018.

We also provide an English version of the dataset, translated from the Japanese original, to make its contents easier to understand. Note that we did not perform any evaluation on the English version, so no results are reported on how accurately chatbots respond to its questions.

File contents:

FAQ data (*.csv)
1. Answer2Category.csv: Categories of answers.
2. Answer2Tag.csv: Titles of answers.
3. Answers.csv: IDs for answers and texts of answers.
4. Categories.csv: Names of categories for answers.
5. Questions.csv: Texts of questions and their corresponding answer IDs.
6. Answers_english.csv: IDs for answers and texts of answers written in English.
7. Categories_english.csv: Names of categories for answers and their corresponding English names.
8. Questions_english.csv: Texts of questions and their corresponding answer IDs written in English.
9. Answer2Tag_english.csv: Titles of answers written in English.
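Because each question row carries the ID of its answer, the question and answer files can be joined into answer-level groups. The following is a minimal sketch using only the Python standard library. The column names (`question`, `answer_id`, `answer`) are hypothetical placeholders, since the actual CSV headers are not listed here, and the toy rows are placeholder text rather than dataset content.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: the real headers of Questions.csv and
# Answers.csv may differ, so adjust these constants accordingly.
QUESTION_COL = "question"
ANSWER_ID_COL = "answer_id"
ANSWER_COL = "answer"


def group_questions_by_answer(questions_file, answers_file):
    """Return {answer_id: (answer_text, [question_text, ...])}.

    Each answer acts as a group (cluster) of the questions mapped to it.
    Answer IDs with no matching row in the answers file get an empty text."""
    answers = {row[ANSWER_ID_COL]: row[ANSWER_COL] for row in csv.DictReader(answers_file)}
    groups = defaultdict(list)
    for row in csv.DictReader(questions_file):
        groups[row[ANSWER_ID_COL]].append(row[QUESTION_COL])
    return {aid: (answers.get(aid, ""), qs) for aid, qs in groups.items()}


# Toy in-memory stand-ins for the real files (placeholder text only).
questions_csv = io.StringIO("question,answer_id\nquestion A,1\nquestion B,1\nquestion C,2\n")
answers_csv = io.StringIO("answer_id,answer\n1,answer one\n2,answer two\n")

grouped = group_questions_by_answer(questions_csv, answers_csv)
for aid, (answer, qs) in sorted(grouped.items()):
    print(aid, answer, qs)
```

For the real files, pass `open("Questions.csv", encoding="utf-8", newline="")` (and likewise for `Answers.csv`), adjusting the encoding if the files turn out not to be UTF-8.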

Statistics (*.tsv)
Results of statistical analyses of the dataset. To measure the quality of the dataset, we used the Calinski-Harabasz index, mutual information, the Jaccard index, TF-IDF with KL divergence, and TF-IDF with JS (Jensen-Shannon) divergence. In these analyses, we treat each answer as a cluster of questions. We also perform the same analyses on categories, treating each category as a cluster of answers.
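The exact formulations used for these measures are not given in this description. As a rough illustration only, the sketch below shows one plausible way to compare two answer clusters with the Jaccard index and with KL/JS divergences between TF-IDF-weighted term distributions. It assumes the questions are already tokenized (Japanese text would first need a morphological analyzer such as MeCab), treats each cluster as one document when computing IDF, and uses hypothetical toy clusters; none of these choices are taken from the original analysis.

```python
import math
from collections import Counter


def jaccard_index(tokens_a, tokens_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two token collections, as sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)


def tfidf_distributions(clusters):
    """Map each non-empty cluster (a list of tokenized questions) to a
    TF-IDF-weighted probability distribution over a shared vocabulary.

    Assumption: each cluster is treated as one document when computing IDF."""
    docs = [Counter(tok for question in cluster for tok in question) for cluster in clusters]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    # Smoothed IDF, log((1 + n) / (1 + df)) + 1, keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    dists = []
    for d in docs:
        total = sum(d.values())
        weights = [d[t] / total * idf[t] for t in vocab]
        norm = sum(weights)
        dists.append([w / norm for w in weights])
    return vocab, dists


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps replaces q_i = 0 so the log stays finite."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two hypothetical answer clusters of pre-tokenized questions (toy data).
clusters = [
    [["submit", "report"], ["report", "deadline"]],
    [["reset", "password"], ["password", "login"]],
]
vocab, (p, q) = tfidf_distributions(clusters)
cluster_tokens = [[tok for question in c for tok in question] for c in clusters]
print("Jaccard:", jaccard_index(*cluster_tokens))
print("KL(p||q):", kl_divergence(p, q))
print("JS:", js_divergence(p, q))
```

JS divergence is often preferred over KL for this kind of comparison because it is symmetric and stays finite even when one cluster uses terms the other never does; in the toy example the clusters share no terms, so the Jaccard index is 0 and the JS divergence reaches its maximum of ln 2.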

Grants: JSPS KAKENHI Grant Number 18H01057

Files (1.5 MB total)

Name	Size	MD5
Answer2Category.csv	957 Bytes	7b4a6df4d21cf0224afc59036e28489a
Answer2Tag.csv	2.0 kB	6bbf666f172e70554378dda505c7718a
Answer2Tag_english.csv	3.7 kB	619191d7606fd27a4c4b6a14a30421ec
Answers.csv	42.9 kB	cec025dc40c4fe318cd316c1d9672e3a
Answers_english.csv	37.3 kB	5dc299c7a8134b272265468af9b53219
basic_stats_chvals.tsv	525 Bytes	49241c18fa7fffd147fb192569c87def
Categories.csv	238 Bytes	4d9931b795191c15e500c885ed7786c8
Categories_english.csv	403 Bytes	547e8367871003ce439dcfa13e2f92bb
inner_class.tsv	825 Bytes	d2039dea3640c31087d18e1a1f0d0d48
inner_class_inter_tag.tsv	78.6 kB	a78b1ac8d33eb04d6fe87be773d178b7
inner_tag.tsv	7.2 kB	741bbb73072c9c4229d3856add03fc26
inter_class.tsv	8.3 kB	f45a29171b2ef0ed71e0f42d515dd04d
inter_class_inter_tag.tsv	1.3 MB	0c84d4a8dafd1876b3f2eff3cd0bf329
Questions.csv	34.9 kB	15f0eb96a972473f8ff61e5f85b19f85
Questions_english.csv	30.9 kB	a17f464df2afcc1b714b520dfead9b63
README.html	2.5 kB	d5eb47d34a830d33109bdce1c8c56bc7
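The MD5 checksums listed for each file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The `EXPECTED_MD5` dictionary copies only two of the listed checksums as an example, and `md5_of_file` and `verify` are hypothetical helper names, not part of the dataset.

```python
import hashlib
from pathlib import Path

# Two checksums copied from the file list; extend with the others as needed.
EXPECTED_MD5 = {
    "Answers.csv": "cec025dc40c4fe318cd316c1d9672e3a",
    "Questions.csv": "15f0eb96a972473f8ff61e5f85b19f85",
}


def md5_of_file(path, chunk_size=1 << 16):
    """Compute the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Return {filename: bool}; a missing file counts as a mismatch."""
    results = {}
    for name, digest in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == digest
    return results
```

After downloading the files into the current directory, `verify(".")` returns `True` for each file whose checksum matches the listed value.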
