Dataset Open Access

Past Causalities and Event Categories for Connecting Similar Past and Present Causalities

Ikejiri Ryohei; Sumikawa Yasunobu

This dataset contains past causalities and their categories, which can be used to connect similar past and present causalities. The following papers describe how to use this dataset.

Ryohei Ikejiri, Yasunobu Sumikawa: "Developing world history lessons to foster authentic social participation by searching for historical causation in relation to current issues dominating the news". Journal of Educational Research on Social Studies 84, 37–48 (2016). (in Japanese).

Yasunobu Sumikawa, Ryohei Ikejiri: "Mining Historical Social Issues". Intelligent Decision Technologies (IDT'15), Smart Innovation, Systems and Technologies, Vol. 39, Springer, pp. 587–597 (2015).

This dataset is based on textbooks widely used in Japanese high schools. We first collect past causalities by consulting these textbooks. We then select those causalities that can be useful for considering solutions to present-day social issues. To make analogies easier to draw, we describe each causality with three kinds of text: its background (including the problems faced), the solutions adopted, and their results. From the selected causalities and an Encyclopedia of Historiography, we define categories for them. The resulting dataset contains 138 past causalities and 13 categories, and each past causality is assigned more than one category.
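
The structure described above (three texts per causality, plus its categories) can be held in a small record type. The following is a minimal Python sketch under stated assumptions: the class name, the field names, and especially the TSV column headers (`id`, `background`, `solution`, `result`, `category`) are hypothetical and are not confirmed to match the distributed files, so check the actual headers of historical_causalities_data.tsv and historical_causalities_categories.tsv before adapting it.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class PastCausality:
    """One past causality, described by three kinds of text (hypothetical field names)."""
    causality_id: str
    background: str  # background, including the problems faced
    solution: str    # how the problems were addressed
    result: str      # what the solutions led to
    categories: set[str] = field(default_factory=set)


def load_causalities(data_tsv: str, categories_tsv: str) -> dict[str, PastCausality]:
    """Parse causality texts and attach their categories.

    ASSUMPTION: the data file has header columns id, background, solution,
    result, and the categories file has header columns id, category with one
    row per (causality, category) pair. The real files may differ.
    """
    causalities: dict[str, PastCausality] = {}
    for row in csv.DictReader(io.StringIO(data_tsv), delimiter="\t"):
        causalities[row["id"]] = PastCausality(
            row["id"], row["background"], row["solution"], row["result"]
        )
    for row in csv.DictReader(io.StringIO(categories_tsv), delimiter="\t"):
        # Ignore category rows whose ID has no matching causality.
        if row["id"] in causalities:
            causalities[row["id"]].categories.add(row["category"])
    return causalities
```

Keeping categories as a set makes the category-overlap measures used for the dataset statistics a direct set operation.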

To help train machine learning models, the dataset additionally provides data on 900 past events in past_events_wikipedia.tsv. The event data were collected from Wikipedia, and each event was then assigned one or more of the 13 categories above. We confirmed that an SVM with an RBF kernel (SVM-RBF) trained on all of the above categorized data achieved 73.6% precision, 55.8% recall, and a 63.5% F1 score.
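
As a quick consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below confirms that the reported 73.6% precision and 55.8% recall yield the reported 63.5% F1. It assumes F1 was computed from these same aggregate precision and recall values; how scores were averaged across categories is not stated.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.736, 0.558) * 100, 1))  # 63.5
```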


File contents:

  • Past causality data
    1. historical_causalities_data.tsv: Details of the stored causalities.
    2. historical_causalities_regions.tsv: Regions where the causalities happened.
    3. historical_causalities_categories.tsv: Categories of the causalities.
  • Past event data
    1. past_events_wikipedia.tsv: Descriptions of past events taken from Wikipedia. This file is useful for training machine learning models such as SVMs.
  • Statistics (Statistics.tsv)

     Results of statistical analyses used to measure the quality of the dataset: the Calinski-Harabasz index, mutual information, the Jaccard index, TF-IDF with Jensen-Shannon (JS) divergence, and Meta-data Similarity, which counts how many categories two causalities share.
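
Two of the measures above compare category sets directly, and one compares term distributions. The following minimal Python sketch shows Meta-data Similarity (the number of shared categories), the Jaccard index, and Jensen-Shannon divergence. The function names are hypothetical, and it is an assumption that JS divergence is applied to normalized TF-IDF weights with log base 2; the exact inputs and settings behind Statistics.tsv are not given here.

```python
import math


def metadata_similarity(cats_a: set[str], cats_b: set[str]) -> int:
    """Number of categories two causalities share."""
    return len(cats_a & cats_b)


def jaccard_index(cats_a: set[str], cats_b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = cats_a | cats_b
    return len(cats_a & cats_b) / len(union) if union else 0.0


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two term-weight maps.

    ASSUMPTION: p and q map terms to non-negative weights (e.g. TF-IDF
    scores) with a positive total; each is normalized to sum to 1 first.
    The result lies in [0, 1].
    """
    terms = set(p) | set(q)
    sp, sq = sum(p.values()), sum(q.values())
    P = {t: p.get(t, 0.0) / sp for t in terms}
    Q = {t: q.get(t, 0.0) / sq for t in terms}
    M = {t: (P[t] + Q[t]) / 2 for t in terms}  # mixture distribution

    def kl(a: dict[str, float], b: dict[str, float]) -> float:
        # Zero-probability terms contribute nothing; b[t] > 0 whenever a[t] > 0.
        return sum(a[t] * math.log2(a[t] / b[t]) for t in terms if a[t] > 0)

    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)
```

For example, two causalities tagged {A, B, C} and {B, C, D} share 2 categories and have a Jaccard index of 2/4 = 0.5.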

Grants: JSPS KAKENHI Grant Numbers 26750076, 17K12792, and 19K20631

Files (693.2 kB)


Cite as