Published August 6, 2025 | Version v3
Dataset | Open access

Cloud Seeding Activities in the United States (2000–2025)

  • Columbia University

Contributors

Supervisor:

  • Columbia University

Description

This dataset contains 832 records of cloud seeding activities in the United States that were reported to NOAA between 2000 and 2025. Each record has 13 fields: original filename, project name, year, season, state, operator, seeding agent, apparatus used for deployment, stated purpose, target area, control area, start date, and end date.
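The record layout described above can be sketched as a typed Python structure with a small CSV loader. This is a minimal sketch under stated assumptions: the snake_case column names (`filename`, `project`, `agent`, and so on) are hypothetical stand-ins for the 13 fields, not the actual header of `cloud_seeding_us_2000_2025.csv`, and only `year` is converted to a number because the date format is not documented here.

```python
import csv
from dataclasses import dataclass, fields
from typing import Iterable, List, Optional


@dataclass
class SeedingRecord:
    # Hypothetical column names for the 13 documented fields;
    # check the CSV header and rename as needed.
    filename: str
    project: str
    year: Optional[int]
    season: str
    state: str
    operator: str
    agent: str
    apparatus: str
    purpose: str
    target_area: str
    control_area: str
    start_date: str  # kept as text: the date format is not documented here
    end_date: str


FIELD_NAMES = [f.name for f in fields(SeedingRecord)]


def parse_records(lines: Iterable[str]) -> List[SeedingRecord]:
    """Parse CSV text (header row first) into SeedingRecord objects."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [name for name in FIELD_NAMES if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        # Short rows yield None for absent cells; normalize those to "".
        values = {name: (row[name] or "").strip() for name in FIELD_NAMES}
        year = values.pop("year")
        records.append(SeedingRecord(year=int(year) if year else None, **values))
    return records


def load_records(path: str) -> List[SeedingRecord]:
    """Read a CSV file from disk and parse it."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_records(f)
```

Usage, assuming the header matches: `records = load_records("cloud_seeding_us_2000_2025.csv")`. Failing fast on missing columns makes a header mismatch obvious instead of silently producing empty fields.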

The dataset was constructed with a multi-stage PDF-to-text extraction pipeline combining PyMuPDF, pytesseract, and Unstract's LLMWhisperer. OpenAI's o3 language model then parsed the extracted text into structured records, inferring each field from the semi-structured report content.
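The first two extraction stages can be sketched as a per-page fallback: take the PDF's embedded text layer with PyMuPDF, and OCR the page with pytesseract only when that layer looks empty. The ordering, the hypothetical `MIN_NATIVE_CHARS` threshold, and the 300 dpi rendering are assumptions for illustration, not the authors' documented settings. The LLMWhisperer and o3 stages are omitted because their calls and prompts are not described here. Running the PDF helpers requires PyMuPDF, pytesseract, Pillow, and a Tesseract install.

```python
from typing import Callable, List, Optional

# Hypothetical threshold: pages whose text layer has fewer characters than
# this are treated as scans and replaced with OCR output.
MIN_NATIVE_CHARS = 50


def pymupdf_page_texts(path: str) -> List[str]:
    """Return the embedded text layer of each page using PyMuPDF."""
    import fitz  # PyMuPDF (newer releases also accept `import pymupdf`)

    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]


def tesseract_page_texts(path: str, dpi: int = 300) -> List[str]:
    """Render each page to a PNG image and OCR it with pytesseract."""
    import io

    import fitz
    import pytesseract
    from PIL import Image

    texts = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)
            image = Image.open(io.BytesIO(pix.tobytes("png")))
            texts.append(pytesseract.image_to_string(image))
    return texts


def extract_text(
    native_pages: List[str],
    run_ocr: Callable[[], List[str]],
    min_chars: int = MIN_NATIVE_CHARS,
) -> str:
    """Keep native text where usable; substitute OCR text page by page."""
    ocr_pages: Optional[List[str]] = None
    out = []
    for i, text in enumerate(native_pages):
        if len(text.strip()) >= min_chars:
            out.append(text)
        else:
            if ocr_pages is None:
                ocr_pages = run_ocr()  # OCR the document at most once, lazily
            out.append(ocr_pages[i])
    return "\n".join(out)
```

Usage with a hypothetical file: `text = extract_text(pymupdf_page_texts("report.pdf"), lambda: tesseract_page_texts("report.pdf"))`. Passing OCR as a callable means born-digital PDFs never pay the cost of rendering and OCR.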

To assess data quality, we manually reviewed a random sample of 200 records. Average accuracy on this sample was 98.38% across the six evaluated fields.
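The review procedure can be sketched as two steps: draw a simple random sample without replacement, then score each field as the share of sampled records judged correct. Treating the overall figure as the unweighted mean of per-field accuracies is an assumption, as is the seed; the description does not say how the average was computed or which six fields were scored.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def draw_review_sample(records: Sequence[T], n: int = 200, seed: int = 0) -> List[T]:
    """Simple random sample without replacement (seed is a hypothetical choice)."""
    return random.Random(seed).sample(list(records), n)


def field_accuracy(judgments: List[bool]) -> float:
    """Share of reviewed values marked correct for one field."""
    return sum(judgments) / len(judgments)


def average_accuracy(judgments_by_field: Dict[str, List[bool]]) -> float:
    """Unweighted mean of per-field accuracies (assumed definition)."""
    return mean(field_accuracy(j) for j in judgments_by_field.values())
```

A fixed seed makes the sample reproducible, so a second reviewer can re-check exactly the same records.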

Files (222.7 kB)

cloud_seeding_us_2000_2025.csv (222.7 kB)
md5:dbe21a725418c739353e5f74a4b673f0
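To confirm that a download is intact, the published MD5 checksum can be compared against one computed locally. A minimal sketch using Python's standard `hashlib`; the helper names are hypothetical, and the file is read in chunks so memory use stays flat regardless of file size.

```python
import hashlib

# Checksum published with this record.
EXPECTED_MD5 = "dbe21a725418c739353e5f74a4b673f0"


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

Usage: `verify_download("cloud_seeding_us_2000_2025.csv")` should return `True` for an unmodified copy of the file.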

Additional details

Related works

Is described by
Data paper: arXiv:2505.01555

Dates

Updated
2025-08-06

Software