crisesStorylinesRAG
Description
This Zenodo record provides the data and code supporting the generation and validation of AI-derived disaster storylines and knowledge graphs from news articles.
The dataset and pipeline are designed to augment historical disaster records with fact-based narratives and structured representations extracted at scale from media reports, using large language models (LLMs) with retrieval-augmented generation (RAG).
The workflow uses disaster events from the EM-DAT database (2014–2024) as input to retrieve relevant news from the European Media Monitor (EMM). Retrieved articles are processed using LLMs provided by the GPT@JRC service, which generate coherent disaster storylines and corresponding knowledge graphs capturing hazards, impacts, drivers, and response actions.
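The retrieval step can be pictured with a small, self-contained sketch. Everything in it is an assumption for illustration only. The `EmdatEvent` and `Article` types, the 7-day margin around the event dates, and the keyword-count ranking are hypothetical stand-ins. They are not the EMM interface or the pipeline's actual retrieval logic, and the GPT@JRC call that would consume the prompt is omitted.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class EmdatEvent:
    # Hypothetical subset of EM-DAT fields used to drive retrieval
    disaster_type: str
    country: str
    start: date
    end: date


@dataclass
class Article:
    # Hypothetical minimal news record (not the EMM schema)
    published: date
    text: str


def select_articles(event, articles, margin_days=7, top_k=5):
    """Keep articles published within the event window (widened by margin_days) that
    mention both the hazard type and the country, ranked by keyword counts.
    Keyword matching is a stand-in for a real retrieval backend."""
    window_start = event.start - timedelta(days=margin_days)
    window_end = event.end + timedelta(days=margin_days)
    terms = [event.disaster_type.lower(), event.country.lower()]
    scored = []
    for article in articles:
        if not (window_start <= article.published <= window_end):
            continue
        body = article.text.lower()
        if all(term in body for term in terms):
            scored.append((sum(body.count(term) for term in terms), article))
    # Sort on the score only, highest first, so Article objects are never compared
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:top_k]]


def build_prompt(event, context):
    """Assemble a retrieval-augmented prompt: an instruction plus numbered source texts."""
    instruction = (
        f"Write a factual storyline of the {event.disaster_type.lower()} in {event.country} "
        f"from {event.start.isoformat()} to {event.end.isoformat()}, "
        "using only the numbered sources below."
    )
    sources = "\n\n".join(f"[{i}] {a.text}" for i, a in enumerate(context, start=1))
    return f"{instruction}\n\nSources:\n{sources}"
```

For a flood in Italy dated 16–20 May 2023, `select_articles` keeps only in-window articles mentioning both "flood" and "italy". `build_prompt` then numbers them so a generated storyline could cite its sources.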
The Zenodo archive includes:
- input_emdat_1424.xlsx: Subset of EM-DAT disaster events (2014–2024) used as input to the pipeline, providing the event type, location, and timing information used for news retrieval.
- DisasterStory.csv: Output of the full pipeline, containing AI-generated disaster storylines and the associated knowledge graph representations for events retrieved from EMM.
- triplet_expert_val.xlsx: A labeled validation dataset of 1,000 factual triplets randomly sampled from the generated knowledge graphs and annotated by six independent experts, who indicated whether each relationship is supported by the corresponding storyline text.
- survey.xlsx: Results of the knowledge graph evaluation performed by disaster risk management (DRM) experts, summarizing expert assessments and consensus metrics for the generated graphs.
- Source code: The complete pipeline for event selection, news retrieval, RAG-based storyline generation, knowledge graph construction, triplet extraction, and quantitative validation, archived as a Zenodo snapshot for reproducibility. Additional access points include:
  - Hugging Face Space: https://huggingface.co/spaces/roncmic/crisesStorylinesRAG
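The knowledge graphs can be treated as sets of (subject, relation, object) triplets, and the 1,000-triplet validation set was drawn by random sampling. The sketch below shows one way to flatten per-event graphs and draw a reproducible sample. The `Triplet` type, the event-ID keys, the seed, and the choice of uniform sampling over all triplets are hypothetical. The record does not state how the authors implemented the sampling.

```python
import random
from typing import NamedTuple


class Triplet(NamedTuple):
    event_id: str  # hypothetical key linking the triplet back to its storyline
    subject: str
    relation: str
    obj: str


def flatten(graphs):
    """Turn {event_id: [(subject, relation, object), ...]} into a flat list of Triplets."""
    return [Triplet(eid, s, r, o) for eid, edges in graphs.items() for (s, r, o) in edges]


def sample_triplets(graphs, n, seed=42):
    """Draw n distinct triplets uniformly at random, with a fixed seed for reproducibility."""
    pool = flatten(graphs)
    if n > len(pool):
        raise ValueError(f"requested {n} triplets but only {len(pool)} are available")
    return random.Random(seed).sample(pool, n)
```

Applied to the full set of generated graphs, `sample_triplets(graphs, 1000)` would yield a validation set of the size described above. Fixing the seed lets the same sample be regenerated.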
This resource enables quantitative evaluation of factual consistency and inter-annotator agreement for AI-generated disaster knowledge representations and provides a reusable framework applicable to other event catalogs beyond EM-DAT.
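With six annotators labeling each triplet as supported or not, agreement can be summarized with a chance-corrected statistic. The record does not name the agreement metric used. Fleiss' kappa is shown here as one common choice when every item has the same number of raters, alongside a simple majority-vote support rate. Both functions and the toy ratings are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for ratings[i] = list of labels given to item i (same rater count per item).

    kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item agreement and
    P_e is the agreement expected by chance from the pooled label proportions.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    categories = {label for row in ratings for label in row}
    counts = [Counter(row) for row in ratings]
    # Observed agreement per item: share of rater pairs that agree
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(per_item) / n_items
    # Chance agreement from the pooled category proportions
    proportions = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p * p for p in proportions)
    if p_e == 1:
        return float("nan")  # a single category was used everywhere; kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def majority_supported_rate(ratings):
    """Share of items where strictly more than half of the raters marked the triplet as supported."""
    supported = sum(1 for row in ratings if 2 * sum(bool(x) for x in row) > len(row))
    return supported / len(ratings)
```

For four toy items rated by six raters (all supported, none supported, a 3–3 split, and 5–1 in favor), `fleiss_kappa` returns 0.52 (up to floating-point rounding) and `majority_supported_rate` returns 0.5. Under this rule a 3–3 tie counts as not supported. That tie-breaking choice is an assumption the record does not specify.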
Files
Total size: 11.4 MB

| MD5 checksum | Size |
|---|---|
| c4f002d16128b9b03c3ca7c51e275876 | 3.4 MB |
| bd47fdf75d7c7264c8aabea097dbc530 | 5.2 MB |
| 925b7c561ce197baf38dfb6964067e15 | 2.3 MB |
| f7a4a5dbeb16c4c05adabeb42728f181 | 10.2 kB |
| 1383ea7d17ac0dbfce08b80236601432 | 430.9 kB |
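The published MD5 checksums can be used to confirm that a downloaded file is complete and uncorrupted. Below is a minimal sketch using Python's standard-library `hashlib`. The helper names `md5sum` and `matches` are hypothetical. MD5 is adequate for detecting accidental corruption but not for security purposes. Because the file listing does not pair each checksum with a file name, match files by size first.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_md5):
    """True if the file's MD5 equals the published checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

A call such as `matches("DisasterStory.csv", "<checksum from the listing>")` returns `True` only if the local copy is byte-identical to the archived file.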