Published November 13, 2025 | Version v1
Software Open

Oregon State University Superfund Research Center: Automated Python Superfund NPL Site Scraper

Description

The Superfund NPL Site Scraper automates the collection and standardization of data from U.S. EPA Superfund resources. Built in Python, the tool retrieves site-level information from EPA online tables, Microsoft Excel files, and individual site profile pages. It uses requests, BeautifulSoup, and pandas to parse structured and semi-structured content, extract cleanup milestones, and normalize outputs into consistent CSV schemas (e.g., site ID, site name, location, operational status, milestone history).
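The parse-and-normalize step described above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: it uses only the Python standard library (html.parser and csv) as a dependency-free stand-in for BeautifulSoup and pandas, and it reads an inline HTML string instead of fetching a page with requests. The schema columns, the header mapping, and the sample row are all hypothetical placeholders.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical output schema; the real tool's column names may differ.
SCHEMA = ["site_id", "site_name", "state", "status"]

# Hypothetical mapping from source table headers to schema columns.
HEADER_MAP = {
    "EPA ID": "site_id",
    "Site Name": "site_name",
    "State": "state",
    "Status": "status",
}


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so cell values are comparable.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def normalize_table(html):
    """Turn the first row into headers and map each later row onto SCHEMA."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    columns = [HEADER_MAP.get(name) for name in header]
    records = []
    for row in body:
        record = dict.fromkeys(SCHEMA, "")
        for column, value in zip(columns, row):
            if column:  # skip source columns that the schema does not use
                record[column] = value
        records.append(record)
    return records


def to_csv(records):
    """Serialize records with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SCHEMA, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample input; not real EPA data.
SAMPLE_HTML = """
<table>
  <tr><th>Site Name</th><th>EPA ID</th><th>State</th><th>Status</th></tr>
  <tr><td> Example  Landfill </td><td>XXD000000001</td><td>OR</td><td>Final</td></tr>
</table>
"""

records = normalize_table(SAMPLE_HTML)
csv_text = to_csv(records)
```

Because the output columns come from a fixed schema rather than from the source table, a reordered or extended source table still yields the same CSV layout.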
The scraper is configurable: users can add or modify target data fields without restructuring the codebase. Designed for repeated use, it supports research tracking, program reporting, and integration with Google Sheets and database systems.
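One way such field-level configuration could work is to describe each target field as data and keep the extraction loop generic. The sketch below is an assumption about the general pattern, not the repository's actual design: the FieldSpec class, the extract_fields function, the "Label: value" text layout, and the sample values are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one target field."""
    name: str          # output column name
    label: str         # label text to look for in the page text
    default: str = ""  # value used when the label is absent


# Hypothetical starting configuration.
FIELD_SPECS = [
    FieldSpec("site_name", "Site Name"),
    FieldSpec("npl_status", "NPL Status"),
]


def extract_fields(text, specs):
    """Return one record with a key for every configured field."""
    record = {}
    for spec in specs:
        # Match a line of the form "<label>: <value>".
        pattern = re.compile(
            rf"^{re.escape(spec.label)}\s*:\s*(.+)$", re.MULTILINE
        )
        match = pattern.search(text)
        record[spec.name] = match.group(1).strip() if match else spec.default
    return record


# Made-up sample text; not real EPA data.
SAMPLE_TEXT = """Site Name: Example Landfill
NPL Status: Final
Construction Complete: 01/01/2000
"""

base = extract_fields(SAMPLE_TEXT, FIELD_SPECS)

# Adding a field is a one-line configuration change; the loop is untouched.
extended = extract_fields(
    SAMPLE_TEXT,
    FIELD_SPECS + [FieldSpec("construction_complete", "Construction Complete")],
)
```

Keeping the field list as data means a new column only needs a new entry, and a missing label falls back to the field's default instead of raising an error.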

Files

README.md

Files (12.6 kB)

Size | Checksum
3.4 kB | md5:309b5029db59fb7c1ff6800fb753d005
126 Bytes | md5:0b660262f2b61c640e4f0ce66e4927b3
9.1 kB | md5:bcabff2cd42582df00cbb449a6e0f427

Additional details

Funding

National Institutes of Health
Identification of Remediation Technologies and Conditions that Minimize Formation of Hazardous PAH Breakdown Products at Superfund Sites
Award number: 5P42ES016465-15

Software

Repository URL
https://github.com/bartonmike/superfund-npl-scraper
Programming language
Python
Development Status
Active