Published July 1, 2025 | Version v1.0.0
Dataset Open

A Synthetic Dataset for Predictive Risk Analysis and Path Optimization of Arbaeen Pilgrimage Crowds

Description

General Overview:
This dataset was synthetically generated to support research on crowd management, predictive risk analysis, and smart path optimization for the Arbaeen pilgrimage in Iraq. The data simulates environmental, behavioral, and geographical factors to provide a realistic setting for testing and validating crowd analysis models. Its primary goal is to serve as a basis for developing and evaluating intelligent systems that can enhance the safety and efficiency of crowd movement during large-scale pilgrimages. The dataset supports the findings of our research paper, "Predictive Risk Analysis for the Arbaeen Pilgrimage Crowds," which has been submitted for publication.

Methodology:
The dataset was created using a custom Python script that employs a hierarchical generation strategy:

  1. Base Network: Predefined, real-world routes (such as the traditional Baghdad-Karbala path) were added first, so that they are guaranteed to be present in the network.

  2. Local Network: A dense, realistic local network was built by systematically connecting each geographical node (both real cities and synthetic points) to its nearest neighbors.

  3. Highway Network: A higher-level network was constructed by connecting only the major, real-world cities to each other, simulating main travel arteries.

  4. Data Attributes: For each area, attributes such as visitor count, pressure, weather, and the presence of barriers or events were generated based on a set of rules to simulate realistic conditions. The final Risk_Degree for each area is a calculated metric based on these attributes.
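The three network layers described above can be sketched roughly as follows. This is an illustration only, not the original generation script: the coordinates are approximate, Najaf is included purely as an example third city, and the function names, the number of synthetic points, the neighbor count `k`, and the rule for placing synthetic points inside the bounding box of the real cities are all assumptions.

```python
import math
import random
from itertools import combinations

# Approximate coordinates (lat, lon), for illustration only.
REAL_CITIES = {
    "Baghdad": (33.31, 44.36),
    "Karbala": (32.62, 44.02),
    "Najaf": (32.00, 44.34),  # example city, not named in the description
}
BASE_ROUTES = [("Baghdad", "Karbala")]


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def build_network(real_cities, base_routes, n_synthetic=5, k=2, seed=0):
    """Return (nodes, edges); edges maps an unordered node pair to its layer."""
    rng = random.Random(seed)
    nodes = dict(real_cities)

    # Assumption: synthetic points fall inside the bounding box of the cities.
    lats = [lat for lat, _ in real_cities.values()]
    lons = [lon for _, lon in real_cities.values()]
    for i in range(n_synthetic):
        nodes[f"synthetic_{i}"] = (rng.uniform(min(lats), max(lats)),
                                   rng.uniform(min(lons), max(lons)))

    edges = {}
    # 1. Base network: predefined routes are added first, so always present.
    for a, b in base_routes:
        edges.setdefault(frozenset((a, b)), "base")

    # 2. Local network: connect every node to its k nearest neighbors.
    for name, pos in nodes.items():
        others = sorted((n for n in nodes if n != name),
                        key=lambda n: haversine_km(pos, nodes[n]))
        for neighbor in others[:k]:
            edges.setdefault(frozenset((name, neighbor)), "local")

    # 3. Highway network: connect only the real cities to each other.
    for a, b in combinations(real_cities, 2):
        edges.setdefault(frozenset((a, b)), "highway")

    return nodes, edges


nodes, edges = build_network(REAL_CITIES, BASE_ROUTES)
```

Because `setdefault` keeps the first label assigned to a pair, an edge that qualifies for several layers is reported under the earliest one (base, then local, then highway). The attribute generation and the Risk_Degree calculation are not sketched here, since their rules are not specified in this description.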

Dataset Contents:

The dataset is provided as a single, comprehensive CSV file: artificial generated dataset for crowd.csv.

This file is ready for direct use and contains all the necessary data used in our study, fully merged into one table. It consists of 5,000 unique records, where each record represents a connection between two areas ("from" and "to"). Each row includes the following detailed information for both the origin and destination points:

  • Route Information: The specific from_area and to_area for each path segment.

  • Area Attributes: Key metrics such as Visitors, Pressure, Speed, and environmental factors like Weather and Event.

  • Calculated Risk Metrics: The final Risk_Degree and Actual_Behavior classification for each area.

  • Geographical Coordinates: The Latitude and Longitude for each area.
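As a starting point for working with the file, the following sketch reads the route columns and builds an adjacency list using only the Python standard library. The column names `from_area` and `to_area` are taken from the description above, but the exact header spelling in the CSV should be checked before use. The helper name `load_routes` and the two-row sample are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_routes(handle, from_col="from_area", to_col="to_area"):
    """Read route rows from an open CSV handle.

    Returns (row_count, adjacency), where adjacency maps each origin
    area to the list of destination areas it connects to.
    """
    reader = csv.DictReader(handle)
    missing = {from_col, to_col} - set(reader.fieldnames or [])
    if missing:
        # Header names are an assumption; fail clearly if they differ.
        raise KeyError(f"missing columns: {sorted(missing)}")

    adjacency = defaultdict(list)
    row_count = 0
    for row in reader:
        adjacency[row[from_col]].append(row[to_col])
        row_count += 1
    return row_count, dict(adjacency)


# Made-up sample standing in for the real file.
SAMPLE = "from_area,to_area\nBaghdad,Karbala\nBaghdad,Najaf\n"
row_count, adjacency = load_routes(io.StringIO(SAMPLE))
```

For the real data, the same function can be called on `open("artificial generated dataset for crowd.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption. If the headers match, `row_count` should equal the 5,000 records stated above.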

Files

artificial generated dataset for crowd.csv
Size: 721.0 kB
md5:a0bf1f088cca1854ebb6633970f8bf16

Additional details

Software

Programming language
Python