There is a newer version of the record available.

Published June 3, 2019 | Version v2
Dataset Open

Past Causalities and Event Categories for Connecting Similar Past and Present Causalities

  • 1. The University of Tokyo
  • 2. Tokyo Metropolitan University

Description

This dataset contains past causalities and their categories, for connecting similar past and present causalities. The following papers describe how to use the dataset.

Ryohei Ikejiri and Yasunobu Sumikawa: "Developing world history lessons to foster authentic social participation by searching for historical causation in relation to current issues dominating the news". Journal of Educational Research on Social Studies 84, 37–48 (2016) (in Japanese).

Yasunobu Sumikawa and Ryohei Ikejiri: "Mining Historical Social Issues". Intelligent Decision Technologies (IDT'15), Smart Innovation, Systems and Technologies, Vol. 39, Springer, 587–597 (2015).

This dataset is based on several textbooks widely used in Japanese high schools. We first collected past causalities from these textbooks, then selected those that could be useful for considering solutions to present social issues. To make analogies easier to draw, we describe each causality with three kinds of text: its background (including the problem), the solution adopted, and the results. From the selected causalities and an Encyclopedia of Historiography, we defined categories for them. The resulting dataset contains 138 past causalities and 13 categories, and each past causality is assigned more than one category.
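The structure described above, a causality described by three texts and tagged with several categories, can be held in memory as sketched below. This is a minimal illustration, not the authors' tooling: the class PastCausality, its field names, the helper load_categories, and the assumed layout of historical_causalities_categories.tsv (one tab-separated causality ID and category per row, no header row) are all hypothetical and should be checked against the actual files.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PastCausality:
    """A past causality: three kinds of descriptive text plus its categories.

    Hypothetical field names; they follow the description, not the TSV headers.
    """
    causality_id: str
    background: str  # historical background, including the problem
    solution: str    # how the problem was addressed
    result: str      # what the solution led to
    categories: set[str] = field(default_factory=set)


def load_categories(lines) -> dict[str, set[str]]:
    """Group category labels by causality ID.

    Assumes one tab-separated (causality_id, category) pair per row and no
    header row; verify against historical_causalities_categories.tsv.
    """
    by_id: dict[str, set[str]] = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed rows
            by_id[row[0].strip()].add(row[1].strip())
    return dict(by_id)


# Toy input with placeholder IDs and category names.
sample = io.StringIO("c001\tcategory_a\nc001\tcategory_b\nc002\tcategory_c\n")
categories = load_categories(sample)
example = PastCausality(
    causality_id="c001",
    background="(background text)",
    solution="(solution text)",
    result="(result text)",
    categories=categories.get("c001", set()),
)
print(sorted(example.categories))  # ['category_a', 'category_b']
```

To read the real file, one would pass a file object opened with open(path, newline="", encoding="utf-8"); the UTF-8 encoding is likewise an assumption.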

To support training machine learning models, the dataset also provides 900 past events in past_events_wikipedia.tsv. These events were collected from Wikipedia, and each was then assigned one or more of the 13 categories above. We confirmed that an SVM with an RBF kernel (SVM-RBF) trained on all of the categorized data above achieved 73.6% precision, 55.8% recall, and a 63.5% F1 score.
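The reported F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the figures above are consistent with it: 2 × 0.736 × 0.558 / (0.736 + 0.558) ≈ 0.635. Because an event can carry several categories, the task is multi-label. The sketch below computes micro-averaged precision, recall, and F1 over predicted category sets; micro-averaging and the function names f1_score and micro_prf are assumptions for illustration, since the description does not state which averaging scheme produced the reported numbers.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(gold: list[set[str]], predicted: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for multi-label predictions.

    gold[i] and predicted[i] are the category sets for the i-th event.
    Counts are pooled over all events before the ratios are taken.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # categories correctly predicted
        fp += len(p - g)  # predicted but not in the gold set
        fn += len(g - p)  # in the gold set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Check the reported figures: P = 73.6%, R = 55.8% gives an F1 of about 63.5%.
print(round(100 * f1_score(0.736, 0.558), 1))  # 63.5

# Toy multi-label example with placeholder category names.
gold = [{"category_a", "category_b"}, {"category_c"}]
pred = [{"category_a"}, {"category_c", "category_d"}]
print(micro_prf(gold, pred))
```

The classifier itself is not shown. One common setup, though not necessarily the one used here, is a one-vs-rest wrapper around an RBF-kernel SVM, such as scikit-learn's OneVsRestClassifier(SVC(kernel="rbf")), whose per-event outputs can be converted to category sets and passed to micro_prf.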


File contents:

  • Past causality data
    1. historical_causalities_data.tsv: Details of the stored causalities.
    2. historical_causalities_regions.tsv: Regions where the causalities happened.
    3. historical_causalities_categories.tsv: Categories of the causalities.
  • Past event data
    1. past_events_wikipedia.tsv: Descriptions of past events taken from Wikipedia. This file is useful for training machine learning models such as SVMs.
  • Statistics (Statistics.tsv)

     Results of statistical analyses of the dataset. To measure the quality of the dataset, we used the Calinski-Harabasz index, mutual information, the Jaccard index, TF-IDF with Jensen-Shannon (JS) divergence, and Meta-data Similarity, which counts how many categories two causalities share.
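Two of the measures listed above are simple set operations on the category sets of two causalities, and the text-based one compares term-weight distributions. The sketch below shows the Jaccard index, Meta-data Similarity as described (the number of shared categories), and the Jensen-Shannon divergence between two term-weight vectors such as TF-IDF vectors. It is an illustration under assumptions: the function names are hypothetical, the log base 2 and the normalization of weights into probability distributions are choices made here, and the exact TF-IDF weighting behind Statistics.tsv is not specified. The Calinski-Harabasz index and mutual information are not shown.

```python
import math


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def metadata_similarity(a: set[str], b: set[str]) -> int:
    """Number of categories that two causalities have in common."""
    return len(a & b)


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (log base 2, so the result lies in [0, 1]).

    p and q map terms to non-negative weights (e.g. TF-IDF); each must have a
    positive total, and both are normalized to probability distributions.
    """
    keys = set(p) | set(q)
    total_p, total_q = sum(p.values()), sum(q.values())
    if total_p <= 0 or total_q <= 0:
        raise ValueError("weight vectors must have a positive total")
    P = {k: p.get(k, 0.0) / total_p for k in keys}
    Q = {k: q.get(k, 0.0) / total_q for k in keys}
    M = {k: (P[k] + Q[k]) / 2 for k in keys}  # mixture distribution

    def kl(x: dict[str, float]) -> float:
        # KL(x || M); terms with x[k] == 0 contribute nothing.
        return sum(x[k] * math.log2(x[k] / M[k]) for k in keys if x[k] > 0)

    return (kl(P) + kl(Q)) / 2


# Placeholder category sets and toy term weights.
cats_1 = {"category_a", "category_b"}
cats_2 = {"category_b", "category_c"}
print(round(jaccard(cats_1, cats_2), 3))    # 0.333
print(metadata_similarity(cats_1, cats_2))  # 1
print(js_divergence({"war": 1.2, "tax": 0.4}, {"tax": 0.9, "trade": 0.3}))
```

Note that Jaccard normalizes by the size of the union while Meta-data Similarity is a raw count, so the two can rank pairs of causalities differently when the causalities have different numbers of categories.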

Grants: JSPS KAKENHI Grant Numbers 26750076, 17K12792, and 19K20631

Files (692.7 kB)

Size      MD5 checksum
4.7 kB    7a68f67e64ab4e3167f60a9539d615e1
67.3 kB   0d732388c68a61a975e5b25c354b548b
1.4 kB    e4879578c66bcf042d906fd72ee8e50d
607.1 kB  39b0071bc483de0ae5480decf68c8410
12.1 kB   480fbb5fd26f520f29c4c5c8275f0c7d