Published December 26, 2023 | Version v3
Dataset Open

Representation of crowd accidents in popular media

  • 1. The University of Tokyo
  • 2. Eindhoven University of Technology
  • 3. The University of New South Wales

Description

This repository contains results related to the analysis of a corpus of news reports covering the topic of crowd accidents. To facilitate online visualization and offline analysis, the files are organized by assigning a number to each. The number system and the details of each set of files are described as follows:

  • Class 0 – This contains the same files provided in this repository, but they are organized into folders to make analysis easier. If you intend to analyze the data from our lexical analysis, we suggest using this file since it is better organized and can be directly downloaded.
  • Class 1 – This contains the sources and relevant information for people who are interested in replicating our dataset or accessing the news reports used in our analysis. Please note that due to copyright regulations, the texts cannot be shared. However, you can refer to the links provided in these files to access the news articles and Wikipedia pages. Some links stopped working while we were conducting this study, and others may become unreachable in the future.
  • Class 2 – This contains the results from a lexical analysis of the corpus. The HTML page allows you to visualize each result interactively through the online VOSviewer app (you need to download the file and open it in a browser, since Zenodo does not recognize it as a link). The VOSviewer app service may be discontinued at some point in the future. PNG images of the lexical maps are therefore available for download in the ZIP archive, although they do not allow interactive access. If you plan to read our results using the offline VOSviewer software or to perform a more systematic analysis, JSON files are available for each category (time period, geographical area of the reporting institution, and purpose of gathering). The same files can also be found in the ZIP archive in Class 0.
  • Class 3 – These are the results of the sentiment analysis. For each report, a single result is generated for the title. However, for the body, the text is divided into parts, which are analyzed independently.
  • Class 4 – These two files contain the Wikipedia corpus relating to 68 crowd accidents that occurred between 1990 and 2019. The text for all accidents was scraped on October 15th, 2022 (before the tragedy in Itaewon) and again on May 25th, 2023 (after the tragedy). Sources for the Wikipedia content are listed in the file contained in Class 1 ("1_list_wiki_report.csv"). More generally, accidents listed on dedicated Wikipedia pages on https://en.wikipedia.org/wiki/List_of_fatal_crowd_crushes are reported in the corpus provided here (the period 1900-2019 is considered here).
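As a rough illustration of the numbering scheme described above, the following Python sketch groups file names by their leading class number. It assumes that every file name starts with its class number followed by an underscore, as in "0_data_all.zip" and "1_list_wiki_report.csv"; the function name `group_by_class` and the file "README.txt" are hypothetical and only serve the example.

```python
import re
from collections import defaultdict


def group_by_class(filenames):
    """Group file names by leading class number, e.g. '1_list_wiki_report.csv' -> 1.

    Assumption: names follow the pattern '<class number>_<rest>'.
    Names that do not match are collected under the key None.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = re.match(r"(\d+)_", name)
        key = int(match.group(1)) if match else None
        groups[key].append(name)
    return dict(groups)


# 'README.txt' is a hypothetical name used to show the fallback case.
example = ["0_data_all.zip", "1_list_wiki_report.csv", "README.txt"]
grouped = group_by_class(example)
```

Passing the output of `os.listdir()` for a local download folder to the same function should give one list of files per class.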

The format of CSV and JSON files should be self-explanatory after reading our publication. For specific questions or queries, please contact one of the authors, and we will try to assist you.
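Before consulting the publication, it may help to look at the structure of a file directly. The sketch below lists the header row of a CSV file and the top-level keys of a JSON file without assuming any particular column or key names. UTF-8 encoding is an assumption, and the helper names `csv_columns` and `json_top_level` are hypothetical.

```python
import csv
import json


def csv_columns(path):
    """Return the first row of a CSV file (assumed to be the header)."""
    # Assumption: the file is UTF-8 encoded and comma-separated.
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def json_top_level(path):
    """Return the sorted top-level keys of a JSON object, or the type name otherwise."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return sorted(data.keys())
    return type(data).__name__
```

For instance, `csv_columns("1_list_wiki_report.csv")` would show which fields are available for the Wikipedia source list once that file has been downloaded.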

Files

0_data_all.zip

Files (297.6 MB)

Checksum                               Size
md5:30ea10d2f06e93d51c27195011e8e6ab   2.3 MB
md5:19612684182639c75c52bb7e8b606071   63.4 kB
md5:5341c565fd23b6f4ec354e386d055a1e   7.0 kB
md5:0b5a52d1705ae0e99ad6c3238a8b9f69   975 Bytes
md5:799abf5b04ffd2a2546937c87c64d349   9.0 kB
md5:e1953d39b285190b8ff2343e1d23f051   2.0 MB
md5:f639f8662ce7dc3a923ae00e0451d198   292.3 MB
md5:b566e958369101f15b58bef84bac7212   129.1 kB
md5:910e0950babb0dc93ee74c7670eed93d   10.6 kB
md5:63236e37c7484b7f4d2ff4a7b0be99ad   396.4 kB
md5:1ac05dc96ea6dab0f2328158809540d6   395.2 kB
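The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal Python sketch using the standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and which checksum belongs to which file has to be taken from the record page, since the listing here does not pair them with file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks avoids loading large archives fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum("0_data_all.zip", "md5:...")`, with the checksum copied from the record page for that file, returns `True` when the local copy is intact.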

Additional details

Related works

Is published in
Journal article: 10.1016/j.ssci.2024.106423 (DOI)