Published December 26, 2023
| Version v3
Dataset
Open
Representation of crowd accidents in popular media
- 1. The University of Tokyo
- 2. Eindhoven University of Technology
- 3. The University of New South Wales
Description
This repository contains results from the analysis of a corpus of news reports on crowd accidents. To facilitate online visualization and offline analysis, the files are grouped into numbered classes. The numbering system and the contents of each class are described below:
- Class 0 – This contains the same files provided in this repository, but they are organized into folders to make analysis easier. If you intend to analyze the data from our lexical analysis, we suggest using this file since it is better organized and can be directly downloaded.
- Class 1 – This contains the sources and relevant information for people who are interested in replicating our dataset or accessing the news reports used in our analysis. Please note that, due to copyright regulations, the texts cannot be shared. However, you can use the links provided in these files to access the news articles and Wikipedia pages. Some links stopped working while we were carrying out this study, and others may become unreachable in the future.
- Class 2 – This contains the results from a lexical analysis of the corpus. The HTML page allows you to visualize each result interactively through the online VOSviewer app (you need to download the file and open it in a browser, since Zenodo does not recognize it as a link). The VOSviewer app service may be discontinued at some point in the future. PNG images of the lexical maps are therefore available for download in the ZIP archive, although they do not allow interactive access. If you plan to read our results using the offline VOSviewer software or to perform a more systematic analysis, JSON files are available for each category (time period, geographical area of the reporting institution, and purpose of gathering). The same files can also be found in the ZIP archive in Class 0.
- Class 3 – These are the results of the sentiment analysis. For each report, a single result is generated for the title. However, for the body, the text is divided into parts, which are analyzed independently.
- Class 4 – These two files contain the Wikipedia corpus relating to 68 crowd accidents which occurred between 1990 and 2019. The text for all accidents was scraped on October 15th, 2022 (before the tragedy in Itaewon) and on May 25th, 2023 (after the tragedy). Sources for the Wikipedia content are listed in the file contained in Class 1 ("1_list_wiki_report.csv"). More generally, accidents listed on dedicated Wikipedia pages on https://en.wikipedia.org/wiki/List_of_fatal_crowd_crushes are reported in the corpus provided here (the period 1900-2019 is considered here).
The format of CSV and JSON files should be self-explanatory after reading our publication. For specific questions or queries, please contact one of the authors, and we will try to assist you.
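As a starting point for the offline analysis of the Class 2 JSON files mentioned above, the following minimal sketch reads one lexical map and summarizes its terms. It assumes the files follow the usual VOSviewer JSON layout, with a top-level `network` object holding `items` (terms) and `links` (co-occurrence relations); the function names and the weight name passed to `top_terms` are illustrative assumptions and should be checked against the actual files.

```python
import json
from collections import Counter


def read_map(path):
    """Parse one JSON map file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def extract_network(data):
    """Return (items, links), assuming the VOSviewer 'network' layout."""
    network = data.get("network", {})
    return network.get("items", []), network.get("links", [])


def cluster_sizes(items):
    """Count how many terms fall into each cluster."""
    return Counter(item.get("cluster") for item in items)


def top_terms(items, weight_name, n=10):
    """Rank term labels by one weight attribute.

    weight_name is an assumption: inspect item["weights"] in your file
    to see which weight attributes are actually present.
    """
    weighted = [
        (item.get("label"), item.get("weights", {}).get(weight_name, 0))
        for item in items
    ]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)[:n]
```

A typical call would be `items, links = extract_network(read_map("path/to/map.json"))`, where the path is a placeholder for one of the downloaded JSON files. Missing keys fall back to empty defaults, so a file with a different layout yields empty results rather than an exception.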
Files
0_data_all.zip
Total size: 297.6 MB
MD5 checksum | Size |
---|---|
md5:30ea10d2f06e93d51c27195011e8e6ab | 2.3 MB |
md5:19612684182639c75c52bb7e8b606071 | 63.4 kB |
md5:5341c565fd23b6f4ec354e386d055a1e | 7.0 kB |
md5:0b5a52d1705ae0e99ad6c3238a8b9f69 | 975 Bytes |
md5:799abf5b04ffd2a2546937c87c64d349 | 9.0 kB |
md5:e1953d39b285190b8ff2343e1d23f051 | 2.0 MB |
md5:f639f8662ce7dc3a923ae00e0451d198 | 292.3 MB |
md5:b566e958369101f15b58bef84bac7212 | 129.1 kB |
md5:910e0950babb0dc93ee74c7670eed93d | 10.6 kB |
md5:63236e37c7484b7f4d2ff4a7b0be99ad | 396.4 kB |
md5:1ac05dc96ea6dab0f2328158809540d6 | 395.2 kB |
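The MD5 checksums listed for the files can be used to confirm that a download completed intact. The sketch below is one way to do this with Python's standard library; the function names are illustrative and not part of the dataset.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("path/to/downloaded_file", "md5:<checksum listed for that file>")` returns `True` only when the local copy matches; both arguments here are placeholders. Reading in chunks keeps memory use low even for the largest file in the record.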
Additional details
Related works
- Is published in: Journal article, DOI 10.1016/j.ssci.2024.106423