Code for "Machine Learning Predicts which Rivers, Streams, and Wetlands the Clean Water Act Regulates"
Description
This repository contains code used to produce the results in Greenhill, S., Hannah Druckenmiller, Sherrie Wang, David A. Keiser, Manuela Girotto, Jason K. Moore, Nobuhiro Yamaguchi, Alberto Todeschini, and Joseph S. Shapiro, "What Does the Clean Water Act Protect? Machine Learning About Regulation," 2023.
Hardware and software requirements
The code for this paper is written in Python, R, and Stata. We used conda environments for package management. See `1_environment/`.
A server with GPUs and multiple terabytes of storage is required for running the deep learning models. For example, we used various configurations of Google Cloud n1-standard VMs with between 16 and 96 vCPUs, between 60 and 360 GB of RAM, up to 4 NVIDIA T4 GPUs, and a 10 TB boot disk. On an n1-standard-96 machine with 4 T4 GPUs, it takes approximately 48 hours to fully train one of the deep learning models, and about one hour to predict on one grid (~80,000 points). Post-processing and figure creation can be run on a modern laptop.
Repository structure
This repository has five main subdirectories:
- `1_environment/`, containing specifications for conda environments used in this project,
- `2_data/`, containing code for preparing training and prediction data,
- `3_src/`, containing code for creating model architectures, preparing data, and training deep learning models,
- `4_dl_models/`, containing code for training deep learning models and generating predictions from them, and
- `5_analysis/`, containing code for analyzing deep learning model outputs, including producing all displays presented in the paper.
Each subdirectory has a readme file explaining the files and subdirectories within. Where appropriate, scripts and directories are numbered to reflect the order in which they should be run.
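
Because the numeric prefixes encode the run order, listing entries sorted by prefix gives a quick view of the intended sequence. The following is a minimal sketch, assuming names of the form `<number>_<name>` as in the list above; the helpers `numeric_prefix` and `ordered_entries` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path


def numeric_prefix(name):
    """Return the leading integer of a name like '2_data', or None if absent."""
    match = re.match(r"(\d+)_", name)
    return int(match.group(1)) if match else None


def ordered_entries(directory):
    """List entries that carry a numeric prefix, sorted in run order."""
    numbered = [
        (numeric_prefix(p.name), p.name)
        for p in Path(directory).iterdir()
        if numeric_prefix(p.name) is not None
    ]
    # Sort on the integer so that '10_x' follows '9_x' rather than '1_x'.
    return [name for _, name in sorted(numbered)]


if __name__ == "__main__":
    # Run from the repository root (or any numbered subdirectory).
    for name in ordered_entries("."):
        print(name)
```

Entries without a numeric prefix, such as readme files, are skipped rather than placed arbitrarily in the order.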
Data availability
All data we used is publicly available and can be accessed through Google Earth Engine or other channels. In addition, we have made the full set of processed inputs used for model training and a subset of the data used for prediction available on Dryad; the two Dryad datasets are listed by DOI under Related works. Note that you will need to download the data and modify the file paths in the code to point to your local copies before you can run the code in this repository.
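
One way to limit the number of file paths that need editing is to build every path from a single data root. The following is a minimal sketch of that idea, not the repository's own mechanism: the environment variable `WOTUS_DATA_ROOT`, the default folder `data_download`, and the example file names are all hypothetical.

```python
import os
from pathlib import Path


def data_path(*parts, env_var="WOTUS_DATA_ROOT", default="data_download"):
    """Build a path under a data root read from an environment variable.

    The variable name and the default folder are hypothetical placeholders.
    """
    root = Path(os.environ.get(env_var, default)).expanduser()
    return root.joinpath(*parts)


if __name__ == "__main__":
    # 'training' and 'inputs.csv' are placeholder names, not real file names.
    print(data_path("training", "inputs.csv"))
```

With this pattern, moving the downloaded data only requires changing the environment variable rather than each script.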
Reuse
All authors of this paper are inventors on a patent pending, submitted by UC Berkeley, which covers WOTUS-ML. This software has a Creative Commons Attribution Non Commercial No Derivatives 4.0 International license, which freely allows use for research and other non-commercial purposes. Potential commercial users should contact the Office of Technology Licensing at UC Berkeley.
Copyright ©2023. The Regents of the University of California (Regents). All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for educational, research, and not-for-profit purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following two paragraphs appear in all copies, modifications, and distributions. Contact The Office of Technology Licensing, UC Berkeley, 2150 Shattuck Avenue, Suite 510, Berkeley, CA 94720-1620, (510) 643-7201, otl@berkeley.edu, http://ipira.berkeley.edu/industry-info for commercial licensing opportunities.
IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
Files
Name | Size | Checksum |
---|---|---|
WOTUS-ML-Zenodo.zip | 13.2 MB | md5:5557e793461572842e5acb3fdc38fc04 |
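
The MD5 checksum listed for the archive can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path are assumptions, and only the checksum value comes from the listing above.

```python
import hashlib

# Checksum listed for WOTUS-ML-Zenodo.zip.
EXPECTED_MD5 = "5557e793461572842e5acb3fdc38fc04"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("WOTUS-ML-Zenodo.zip")` should return `True` for an uncorrupted copy saved under that name in the working directory.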
Additional details
Related works
- Cites
- Journal article: 10.1126/science.adi3794 (DOI)
- Dataset: 10.5061/dryad.m63xsj47s (DOI)
- Dataset: 10.5061/dryad.z34tmpgm7 (DOI)
Funding
- Google (United States)
- National Institutes of Health