LLM-GeoDis
Description
This repository contains the code and datasets used to produce the LLM-GeoDis dataset, a global database of subnationally geocoded disaster events from the EM-DAT International Disaster Database. The workflow uses a large language model (GPT-4o) to extract and standardize textual disaster location descriptions and match them to administrative units using GADM, OpenStreetMap, and Wikidata.
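The final linking step can be illustrated, in greatly simplified form, by exact matching of normalized place names against a gazetteer. This is not the authors' method. The actual workflow uses GPT-4o together with GADM, OpenStreetMap, and Wikidata. Here `normalize`, `match_locations`, and the unit IDs are hypothetical, and the sketch only shows why standardizing strings (case, accents, punctuation) matters before matching.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, turn punctuation into spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that "São" and "Sao" compare equal.
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in ascii_only.lower())
    return " ".join(cleaned.split())


def match_locations(locations, admin_units):
    """Map each location string to the IDs of units whose normalized name is equal.

    `admin_units` is a list of (unit_id, name) pairs: a hypothetical stand-in
    for GADM level 1-2 records. Unmatched locations map to an empty list.
    """
    index = {}
    for unit_id, name in admin_units:
        index.setdefault(normalize(name), []).append(unit_id)
    return {loc: index.get(normalize(loc), []) for loc in locations}
```

For example, with `units = [("UNIT_A", "São Paulo"), ("UNIT_B", "Rio de Janeiro")]`, calling `match_locations(["Sao Paulo", "Minas Gerais"], units)` returns `{'Sao Paulo': ['UNIT_A'], 'Minas Gerais': []}`. Exact matching fails on spelling variants and ambiguous names, which is the kind of case an LLM-based standardization step is meant to handle.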
The resulting dataset provides subnational geocoding for global disaster events recorded in EM-DAT (2000–2024). Each record has been automatically processed to extract location entities and link them to administrative units. The dataset contains 14,215 disaster events across 17,948 unique locations, each associated with GADM administrative levels 1–2. It includes point geometries from Wikidata and OSM as well as harmonized GADM geometries to ensure consistent spatial coverage. Due to its size (~30 GB), the full LLM-GeoDis database is distributed across five compressed files.
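The five parts could be unpacked into a single directory with Python's standard library. This sketch assumes that the parts are named `LLMGeoDis_part1.zip` through `LLMGeoDis_part5.zip` and that each is a standalone ZIP archive rather than one multi-volume split archive. The helper name `extract_parts` and the directory layout are illustrative assumptions. Check available disk space before extracting.

```python
import zipfile
from pathlib import Path


def extract_parts(source_dir, target_dir, n_parts=5):
    """Extract LLMGeoDis_part1.zip ... LLMGeoDis_part{n_parts}.zip into target_dir.

    Assumes each part is an independent ZIP archive (hypothetical naming).
    Returns the member names extracted from all parts, in order.
    """
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for i in range(1, n_parts + 1):
        archive = source / f"LLMGeoDis_part{i}.zip"
        if not archive.exists():
            raise FileNotFoundError(f"missing archive: {archive}")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted.extend(zf.namelist())
    return extracted
```

Failing early on a missing part avoids silently working with an incomplete copy of the database.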
This Zenodo record contains the following files:
- LLMGeoDis_part1–5.zip – compressed parts of the main LLM-GeoDis dataset containing the geoparsed and geocoded disaster locations.
- geoemdat_gaul.gpkg – GeoPackage containing EM-DAT events mapped to FAO GAUL administrative boundaries.
- pend-gdis-1960-2018-disasterlocations.csv – intermediate dataset of disaster location strings extracted from EM-DAT during preprocessing.
- reliability_db.csv – annotations used to evaluate geoparsing reliability and agreement between sources.
- input_emdat.csv – EM-DAT input dataset used for the geoparsing workflow.
- 241204_emdat_archive.xlsx – archived version of EM-DAT used for validation and reproducibility.
- gdis_disnos.csv – mapping between EM-DAT event identifiers and events in the GDIS disaster dataset used for benchmarking.
- synthetic_EMDAT_locations.csv – synthetic location examples used during development and testing.
- emdat_geocoding-zenodo.zip – archive of the full code repository used to reproduce the geoparsing, geocoding, and validation workflows.
- Instructions_LLM-GeoDis.pdf – instructions to run the full code and reproduce the analysis in the manuscript.
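The layer names inside `geoemdat_gaul.gpkg` are not documented here. Per the OGC GeoPackage standard, a GeoPackage is an SQLite database with a `gpkg_contents` table that lists its layers. This means the layers can be listed with Python's standard library alone. `list_gpkg_layers` is a hypothetical helper. Reading the geometries themselves would normally go through a GIS library such as GeoPandas (`geopandas.read_file(path, layer=...)`).

```python
import sqlite3
from contextlib import closing


def list_gpkg_layers(path):
    """Return (table_name, data_type, srs_id) rows from a GeoPackage's gpkg_contents table."""
    # closing() ensures the connection is closed; sqlite3's own context
    # manager only commits or rolls back a transaction.
    with closing(sqlite3.connect(path)) as conn:
        return conn.execute(
            "SELECT table_name, data_type, srs_id FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
```

A `data_type` of `features` marks a vector layer, which is where the event geometries would be expected.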
External datasets required to fully reproduce the workflow include GADM 4.1 administrative boundaries and the full GDIS dataset, which must be downloaded separately from their respective sources.
Files
Total size: 14.9 GB

| MD5 checksum | Size |
|---|---|
| e78b986f8234a48de376b8026017d1dc | 6.7 MB |
| 6d177d89cc6ceacf790068213974b5b7 | 29.2 MB |
| 41e6a666c7e26323ae6853ef753f960d | 135.3 kB |
| 79b9e864059c19e6bf76b084a4aba9ee | 6.1 GB |
| 97384f5817103569d2b1b18de5be4ea4 | 2.8 MB |
| 2e8cb9f813877cd10dcbaa326b9ed01f | 275.2 kB |
| e0b90fa7ffff0ca9a4ef102118db8056 | 1.9 GB |
| e8137895a92159c6d9b19f21a4559709 | 1.9 GB |
| 817e75737a4834b782568e6a079e3dc5 | 1.8 GB |
| 667f69347acd41261bf397cadd8f11a3 | 1.5 GB |
| be427a1290f6eea5eb88a7bf87b2141d | 1.6 GB |
| c6a669466c815a0882deb5b4cc648acb | 4.9 MB |
| fbb290425afbb64a3de62d8373fb839d | 3.5 MB |
| 391085a505a50c948b95ffdb257ab058 | 104.3 kB |
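Downloaded files can be checked against their MD5 checksums. This is a minimal sketch. `md5sum` and `verify` are hypothetical helper names. The `{filename: checksum}` mapping has to be filled in by pairing each file with its checksum on the record's download page. Files are hashed in chunks so that the multi-gigabyte parts never need to fit in memory.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected):
    """Check {filename: md5} pairs; return the names that are missing or mismatched."""
    bad = []
    for name, checksum in expected.items():
        path = Path(name)
        if not path.is_file() or md5sum(path) != checksum.lower():
            bad.append(name)
    return bad
```

`verify` returns an empty list when every listed file is present and matches its checksum.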