Published February 10, 2026 | Version v1
Dataset Open

crisesStorylinesRAG

  • 1. Joint Research Centre
  • 2. European Commission
  • 3. Directorate-General Joint Research Centre
  • 4. UCLouvain

Description

This Zenodo record provides the data and code supporting the generation and validation of AI-derived disaster storylines and knowledge graphs from news articles.

The dataset and pipeline are designed to augment historical disaster records with fact-based narratives and structured representations extracted at scale from media reports, using large language models (LLMs) with retrieval-augmented generation (RAG).

The workflow takes disaster events from the EM-DAT database (2014–2024) as input and uses them to retrieve relevant news from the Europe Media Monitor (EMM). The retrieved articles are processed by LLMs provided through the GPT@JRC service, which generate a coherent storyline for each disaster and a corresponding knowledge graph capturing hazards, impacts, drivers, and response actions.
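As a rough illustration of what a knowledge graph built from factual triplets can look like in code, the sketch below stores each edge as a (subject, relation, object) triplet and groups edges by subject. The `Triplet` class, the `build_adjacency` helper, and the example values are hypothetical and are not taken from the archive; the actual schema used in DisasterStory.csv may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """One knowledge-graph edge: subject --relation--> object (hypothetical schema)."""
    subject: str
    relation: str
    obj: str


def build_adjacency(triplets):
    """Group triplets by subject so each node lists its outgoing edges."""
    graph = defaultdict(list)
    for t in triplets:
        graph[t.subject].append((t.relation, t.obj))
    return dict(graph)


# Invented example values, not read from DisasterStory.csv.
example = [
    Triplet("heavy rainfall", "causes", "flooding"),
    Triplet("flooding", "causes", "displacement"),
    Triplet("flooding", "prompts", "evacuation"),
]
graph = build_adjacency(example)
```

Keeping triplets as flat records makes it straightforward to sample individual relationships for expert validation, while the adjacency view is convenient for traversing a single event's graph.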

The Zenodo archive includes:

  • input_emdat_1424.xlsx
    Subset of EM-DAT disaster events (2014–2024) used as input to the pipeline, providing event type, location, and temporal information for news retrieval.

  • DisasterStory.csv
    Output of the full pipeline, containing AI-generated disaster storylines and associated knowledge graph representations for the events with news coverage retrieved from EMM.

  • triplet_expert_val.xlsx
    A labeled validation dataset of 1,000 factual triplets randomly sampled from the generated knowledge graphs and annotated by six independent experts, indicating whether each relationship is supported by the corresponding storyline text.

  • survey.xlsx
    Results of the knowledge graph evaluation performed by disaster risk management (DRM) experts, summarizing expert assessments and consensus metrics for the generated graphs.

  • Source code
    The complete pipeline for event selection, news retrieval, RAG-based storyline generation, knowledge graph construction, triplet extraction, and quantitative validation. Available as a Zenodo snapshot for reproducibility. Additional access points include:

This resource enables quantitative evaluation of factual consistency and inter-annotator agreement for AI-generated disaster knowledge representations and provides a reusable framework applicable to other event catalogs beyond EM-DAT.
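One common way to quantify agreement among a fixed number of annotators is Fleiss' kappa. The sketch below is a minimal pure-Python version, assuming each triplet has been rated by the same number of experts and that the ratings have already been tallied into per-category counts (for example, "supported" versus "not supported"). The record does not state which agreement metric the authors used or how triplet_expert_val.xlsx is laid out, so both the metric and the input format here are assumptions.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of raters who assigned item i to category j;
    every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Agreement expected by chance from the overall category proportions.
    n_categories = len(counts[0])
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(n_categories)
    )
    if p_e == 1.0:
        # Every rating falls in one category, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


# Toy tallies for six raters and two categories (invented, not from the dataset).
toy_counts = [[6, 0], [0, 6], [6, 0], [0, 6]]
kappa = fleiss_kappa(toy_counts)
```

With the toy tallies above, all six raters agree on every item, so the statistic reaches its maximum of 1. Values near 0 indicate agreement no better than chance.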

Files

Files (11.4 MB)

Checksum (MD5)                      Size
c4f002d16128b9b03c3ca7c51e275876    3.4 MB
bd47fdf75d7c7264c8aabea097dbc530    5.2 MB
925b7c561ce197baf38dfb6964067e15    2.3 MB
f7a4a5dbeb16c4c05adabeb42728f181    10.2 kB
1383ea7d17ac0dbfce08b80236601432    430.9 kB
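The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names are hypothetical, and because the listing does not show which checksum belongs to which file, the pairing of a local path with a checksum is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file against a listed checksum, with or without an 'md5:' prefix."""
    listed = listed.strip().lower()
    if listed.startswith("md5:"):
        listed = listed[len("md5:"):]
    return md5_of_file(path) == listed
```

For example, `matches_listed_checksum(local_path, "md5:c4f002d16128b9b03c3ca7c51e275876")` returns `True` only if the file at `local_path` (a placeholder for wherever the download was saved) hashes to that value.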