Cloud Seeding Activities in the United States (2000–2025)
Description
This dataset contains 832 records of cloud seeding activities reported to NOAA in the United States between 2000 and 2025. Each record includes the following fields: original filename, project name, year, season, state, operator, seeding agent, apparatus used for deployment, stated purpose, target area, control area, start date, and end date.
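As a minimal sketch of working with these records, the snippet below reads CSV text with Python's standard `csv` module and tallies rows per state. The column names (`project`, `year`, `state`) and the three sample rows are placeholders assumed for illustration, not values taken from the dataset; check the header of `cloud_seeding_us_2000_2025.csv` before relying on them.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count rows per distinct value of `column` in an open CSV file."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    return Counter(row[column].strip() for row in reader)


# Placeholder rows with ASSUMED column names; the real header may differ.
sample = io.StringIO(
    "project,year,state\n"
    "Example Project A,2001,XX\n"
    "Example Project B,2002,XX\n"
    "Example Project C,2002,YY\n"
)
counts = count_by_column(sample, "state")
print(counts.most_common())
```

To run the same tally on the published file, open it with `open("cloud_seeding_us_2000_2025.csv", newline="", encoding="utf-8")` (the UTF-8 encoding is also an assumption) and pass the file object in place of `sample`.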
The dataset was constructed using a multi-stage PDF-to-text extraction pipeline combining PyMuPDF, pytesseract, and Unstract's LLM Whisperer. Extracted text was parsed and structured using OpenAI’s o3 language model, which inferred the relevant data fields from semi-structured report content.
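The description says the three extraction tools were combined but not how. One plausible arrangement, sketched below, is a fallback chain: try the cheapest extractor first and move to the next stage when too little text comes back. The stage order, the `min_chars` threshold, and every function name here are assumptions for illustration, not the authors' implementation. The stand-in stages return fixed strings; in practice they might wrap PyMuPDF's `page.get_text()`, `pytesseract.image_to_string()`, and an LLM Whisperer client.

```python
from typing import Callable, Iterable, Optional, Tuple

# Each stage takes a PDF path and returns extracted text (possibly empty).
Extractor = Callable[[str], str]


def extract_with_fallback(
    pdf_path: str,
    stages: Iterable[Tuple[str, Extractor]],
    min_chars: int = 50,
) -> Tuple[Optional[str], str]:
    """Try each stage in order; return (stage_name, text) for the first
    stage whose output has at least `min_chars` non-whitespace characters."""
    for name, extractor in stages:
        try:
            text = extractor(pdf_path)
        except Exception:
            continue  # a failing stage hands off to the next one
        if len("".join(text.split())) >= min_chars:
            return name, text
    return None, ""


# Stand-in stages (HYPOTHETICAL); real ones would call the actual libraries.
def embedded_text(path: str) -> str:
    return ""  # e.g. a scanned form with no embedded text layer


def ocr_text(path: str) -> str:
    return "PROJECT DESIGNATION: placeholder text recovered by OCR " * 2


stage, text = extract_with_fallback(
    "form.pdf", [("embedded", embedded_text), ("ocr", ocr_text)]
)
print(stage)
```

Passing the stages as plain callables keeps the chain testable without installing any of the extraction libraries, and lets the order be changed without touching the loop.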
To assess data quality, we manually reviewed a random sample of 200 records. Across the six fields evaluated, the sample averaged 98.38% accuracy.
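A review of this kind can be sketched as follows: draw a reproducible random sample of record indices, then average the per-field accuracies. The description does not say how the average was taken, so the unweighted mean over fields is an assumption, and the field names and tallies below are hypothetical placeholders, not the authors' results.

```python
import random
from statistics import mean


def draw_review_sample(n_records, sample_size, seed=0):
    """Pick `sample_size` distinct record indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_records), sample_size))


def average_field_accuracy(correct_counts, sample_size):
    """Mean of per-field accuracies, each being correct / sample_size."""
    return mean(c / sample_size for c in correct_counts.values())


# 832 records and a sample of 200, as stated in the description.
indices = draw_review_sample(n_records=832, sample_size=200)

# HYPOTHETICAL review tallies (records judged correct per field).
tallies = {"field_a": 200, "field_b": 190}
avg = average_field_accuracy(tallies, 200)
print(f"{avg:.2%}")
```

Fixing the seed makes the sample reproducible, so a second reviewer can check the same 200 records.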
Files
| Name | Size | Checksum |
|---|---|---|
| cloud_seeding_us_2000_2025.csv | 222.7 kB | md5:dbe21a725418c739353e5f74a4b673f0 |
Additional details
Related works
- Is described by: Data paper, arXiv:2505.01555 (arXiv)
Dates
- Updated: 2025-08-06
Software
- Repository URL: https://github.com/jdonohue44/NOAA-Weather-Modification-Forms-LLM-Extractor
- Programming language: Python
- Development status: Active