Dataset Open Access

Past Causalities and Event Categories for Connecting Similar Past and Present Causalities

Ikejiri Ryohei; Sumikawa Yasunobu

This dataset includes past causalities and their categories to connect similar past and present causalities. We report how to use this dataset in the following papers.

Ryohei Ikejiri, Yasunobu Sumikawa: "Developing world history lessons to foster authentic social participation by searching for historical causation in relation to current issues dominating the news". Journal of Educational Research on Social Studies 84, 37–48 (2016). (in Japanese).

Yasunobu Sumikawa, Ryohei Ikejiri: "Mining Historical Social Issues". Intelligent Decision Technologies (IDT'15), Smart Innovation, Systems and Technologies, Vol. 39, Springer, pp. 587–597 (2015).

This dataset is based on textbooks that are widely used in Japanese high schools. We first collected past causalities by referencing the textbooks. We then selected the causalities that can be useful for considering solutions to present social issues. To enhance the analogy, we describe each causality in three kinds of text: the background, including problems; the solutions; and their results. From the selected causalities and an Encyclopedia of Historiography, we defined categories for them. The resulting dataset contains 138 past causalities and 13 categories. Each past causality has more than one category.

To help train machine learning models, this dataset additionally provides 900 past events in past_events_wikipedia.tsv. The events were collected from Wikipedia, and each was assigned one or more of the 13 categories above. We have confirmed that an SVM with an RBF kernel, trained on all of the categorized data above, achieved 73.6% precision, 55.8% recall, and a 63.5% F1 score.
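As a quick consistency check on the reported scores, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the two reported values; the function name `f1_score` is only an illustrative helper, not part of the dataset or of any particular library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the dataset description, expressed as fractions.
reported_precision = 0.736
reported_recall = 0.558

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {100 * f1:.1f}%")  # prints "F1 = 63.5%"
```

The recomputed value agrees with the reported 63.5% to one decimal place.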

 

File contents:

  • Past causality data
    1. historical_causalities_data.tsv: Details of the stored causalities.
    2. historical_causalities_regions.tsv: Regions where the causalities happened.
    3. historical_causalities_categories.tsv: Categories of the causalities.
  • Past event data
    1. past_events_wikipedia.tsv: Descriptions of past events taken from Wikipedia. This file is useful for training machine learning models such as SVMs.
  • Statistics (Statistics.tsv)

     Results of statistical analyses of the dataset. To measure the quality of the dataset, we used the Calinski-Harabasz index, mutual information, the Jaccard index, TF-IDF with Jensen-Shannon (JS) divergence, and meta-data similarity, which counts how many categories two causalities have in common.
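Two of the measures named above are simple set operations on category labels. The following is a minimal sketch, not the authors' implementation: the function names and the example category labels are hypothetical, and the actual labels and column layout should be taken from historical_causalities_categories.tsv.

```python
def common_category_count(cats_a, cats_b):
    """Meta-data similarity in the sense described above:
    the number of categories that two causalities share."""
    return len(set(cats_a) & set(cats_b))


def jaccard_index(cats_a, cats_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(cats_a), set(cats_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical category labels for two causalities (not taken from the dataset).
causality_x = {"economy", "war"}
causality_y = {"economy", "war", "religion"}

print(common_category_count(causality_x, causality_y))
print(round(jaccard_index(causality_x, causality_y), 3))
```

The count is unnormalized, while the Jaccard index scales the overlap by the total number of distinct categories involved, so the two can rank pairs of causalities differently.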

Grants: JSPS KAKENHI Grant Numbers 26750076, 17K12792, and 19K20631

Files (693.2 kB)

  • causality_regional_distribution.tsv (381 Bytes), md5:75b039ceb7b2fe288868e927190c822f
  • causality_temporal_distribution.tsv (144 Bytes), md5:47e20c1c1d90dccd744d0764e03e29a8
  • historical_causalities_categories.tsv (4.7 kB), md5:7a68f67e64ab4e3167f60a9539d615e1
  • historical_causalities_data.tsv (67.3 kB), md5:0d732388c68a61a975e5b25c354b548b
  • historical_causalities_regions.tsv (1.4 kB), md5:e4879578c66bcf042d906fd72ee8e50d
  • past_events_wikipedia.tsv (607.1 kB), md5:39b0071bc483de0ae5480decf68c8410
  • statistics_all_data.tsv (12.1 kB), md5:a295a1f00ddb86b03e821b441f5f72b8
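
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_hex` and `md5_of_file` are illustrative, and the code assumes the files sit in the current working directory under the names shown in the listing.

```python
import hashlib
import os


def md5_hex(chunks):
    """Return the hexadecimal MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 digest of a file, reading it in fixed-size chunks."""
    with open(path, "rb") as handle:
        return md5_hex(iter(lambda: handle.read(chunk_size), b""))


# Checksums copied from the file listing above (two files shown as examples).
EXPECTED = {
    "historical_causalities_data.tsv": "0d732388c68a61a975e5b25c354b548b",
    "past_events_wikipedia.tsv": "39b0071bc483de0ae5480decf68c8410",
}

if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        if not os.path.exists(name):
            print(f"{name}: not found, skipping")
            continue
        status = "OK" if md5_of_file(name) == expected else "MISMATCH"
        print(f"{name}: {status}")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a safeguard against deliberate tampering.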