Published September 30, 2022 | Version v1
Software | Open Access

Code to Download and Harmonize Discrete Metals and Ancillary Data in Three Hydrologic Basins (Delaware River, Illinois River and Upper Colorado River)

  • 1. U.S. Geological Survey
  • 2. University of Wisconsin-Madison

Description

This code retrieves discrete surface water data from the Water Quality Portal (www.waterqualitydata.us/) and performs a series of data harmonization and cleaning steps using R version 4.1.0. The R code comprises five steps (each described below) organized into two code repositories, metals-data-download and metals-data-cleanup. To run the code, follow the detailed instructions in the associated README.md files, starting with metals-data-download. Because the two repositories depend on each other (a circular dependency), set up both repositories locally before running anything, and follow the README instructions carefully.
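The circular dependency is easier to see when the scripts are laid out in sequence. The sketch below is Python rather than the R used by the release, takes its script names from the step descriptions, and assumes the scripts run in the order the steps are numbered (an assumption; the README.md files are authoritative). It counts how often the workflow hands off from one repository to the other.

```python
# Scripts in the order the steps are numbered (ASSUMED run order; see README.md files).
PIPELINE = [
    ("metals-data-download", "1a_fetch_metals.R"),          # Step 1
    ("metals-data-cleanup", "2a_clean_harmonize.R"),        # Step 2
    ("metals-data-cleanup", "2b_clean_filter.R"),           # Step 3
    ("metals-data-cleanup", "3_log.R"),                     # Step 4
    ("metals-data-download", "1a_fetch_ancillary.R"),       # Step 5, retrieval
    ("metals-data-cleanup", "2c_clean_match_ancillary.R"),  # Step 5, cleaning
]


def repo_switches(pipeline):
    """Count consecutive steps that move from one repository to the other."""
    return sum(
        1
        for (prev_repo, _), (next_repo, _) in zip(pipeline, pipeline[1:])
        if prev_repo != next_repo
    )


if __name__ == "__main__":
    for repo, script in PIPELINE:
        print(f"{repo}/{script}")
    print("repository hand-offs:", repo_switches(PIPELINE))
```

Under this ordering the workflow crosses between repositories three times, which is why neither repository can be run to completion on its own.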

Detailed step descriptions:

  • Step 1 (metals-data-download > 1a_fetch_metals.R) downloads physical/chemical results and associated metadata for 12 metals (Al, As, Cd, Cr, Cu, Fe, Hg, Mn, Pb, Se, U, Zn) from five hydrologic units associated with three river basins (Delaware R., Illinois R. and Upper Colorado R.). It then retrieves additional site information for every sampling location returned by the metals retrieval and merges the two retrievals into a single data frame.
  • Step 2 (metals-data-cleanup > 2a_clean_harmonize.R) harmonizes multiple columns of the compiled data frame. Each column created during this harmonization step has the word "ADDED" as a prefix to its name.
  • Step 3 (metals-data-cleanup > 2b_clean_filter.R) filters the data, removing some rows and columns based on defined criteria, and writes the output to three separate files, one per river basin.
  • Step 4 (metals-data-cleanup > 3_log.R) creates a log identifying any values in the download that are not in the list of values expected by the current code, and writes these unexpected values to a separate file for potential review.
  • Step 5 (metals-data-download > 1a_fetch_ancillary.R and metals-data-cleanup > 2c_clean_match_ancillary.R) retrieves ancillary discrete surface water data for 18 physical/chemical parameters that were co-collected with the primary metals data. It then cleans the ancillary data by removing duplicate rows, deleting multiple columns, removing certain rows based on defined criteria, creating new harmonized columns, and eliminating any data collected more than 1 hour before or after the metals sample collected on the same date. Like Step 3, it writes the ancillary data to three separate files, one per river basin.
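Step 1 combines two Water Quality Portal queries (results, then site information) and joins them. The following Python sketch illustrates the general pattern and is not the release's R code: it builds WQP web-service URLs for the `Result` and `Station` endpoints (multiple values joined with semicolons) and merges site columns onto result rows using the WQP column `MonitoringLocationIdentifier`. The HUC and site ID values in the usage lines are hypothetical placeholders, since the five hydrologic unit codes are not listed here.

```python
from urllib.parse import urlencode

# WQP web-service endpoint; "Result" returns sample results, "Station" returns site info.
WQP_BASE = "https://www.waterqualitydata.us/data/{profile}/search"

# The 12 metals named in Step 1, expressed as WQP characteristic names.
METALS = [
    "Aluminum", "Arsenic", "Cadmium", "Chromium", "Copper", "Iron",
    "Mercury", "Manganese", "Lead", "Selenium", "Uranium", "Zinc",
]


def wqp_url(profile, **filters):
    """Build a WQP query URL; list-valued filters are joined with semicolons."""
    params = {
        key: ";".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in filters.items()
    }
    params.setdefault("mimeType", "csv")
    # Keep ";" unescaped so multi-value filters stay readable.
    return WQP_BASE.format(profile=profile) + "?" + urlencode(params, safe=";")


def merge_site_info(results, stations):
    """Attach station columns to each result row, keyed on MonitoringLocationIdentifier."""
    by_site = {s["MonitoringLocationIdentifier"]: s for s in stations}
    merged = []
    for row in results:
        site = by_site.get(row["MonitoringLocationIdentifier"], {})
        merged.append({**site, **row})  # result columns win on name clashes
    return merged


if __name__ == "__main__":
    # "01234567" and "USGS-00000000" are HYPOTHETICAL placeholders.
    print(wqp_url("Result", huc=["01234567"], characteristicName=METALS, sampleMedia="Water"))
    print(wqp_url("Station", siteid=["USGS-00000000"]))
```

Querying `Station` by the site IDs found in the results mirrors the description of retrieving site information "for all the sampling locations that were returned"; the release may select sites differently.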
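Step 2 keeps the original columns and adds harmonized copies whose names start with "ADDED". As one hedged example of what such a harmonization might look like, the Python sketch below converts result values to micrograms per liter using the WQP columns `ResultMeasureValue` and `ResultMeasure/MeasureUnitCode`. The conversion table, the `ADDED_` separator, and the choice of µg/L as the target unit are assumptions for illustration, not the release's rules.

```python
# ASSUMED conversion factors to micrograms per liter for a few WQP unit codes.
TO_UG_PER_L = {"ug/l": 1.0, "mg/l": 1000.0, "ng/l": 0.001}


def harmonize_row(row, prefix="ADDED_"):
    """Return a copy of a result row with harmonized value/unit columns under `prefix`."""
    out = dict(row)  # original columns are left untouched
    unit = (row.get("ResultMeasure/MeasureUnitCode") or "").strip().lower()
    factor = TO_UG_PER_L.get(unit)
    try:
        value = float(row.get("ResultMeasureValue"))
    except (TypeError, ValueError):
        value = None  # blanks and non-numeric entries are not converted
    out[prefix + "ResultMeasureValue"] = (
        value * factor if factor is not None and value is not None else None
    )
    out[prefix + "ResultMeasure/MeasureUnitCode"] = "ug/L" if factor is not None else None
    return out


if __name__ == "__main__":
    print(harmonize_row({"ResultMeasureValue": "2.5", "ResultMeasure/MeasureUnitCode": "mg/L"}))
```

Writing harmonized values to new columns, rather than overwriting the downloaded ones, keeps the raw WQP values available for checking each conversion.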
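Step 3 ends by splitting the filtered data into one file per river basin. A minimal Python sketch of that filter-then-split pattern follows, assuming basin membership can be looked up from the WQP site column `HUCEightDigitCode`. The HUC-to-basin mapping, the `has_value` criterion, and the output file names are hypothetical; the release's actual filtering criteria are defined in 2b_clean_filter.R.

```python
import csv
from collections import defaultdict
from pathlib import Path


def has_value(row):
    """HYPOTHETICAL filter criterion: keep rows that have a harmonized numeric value."""
    return row.get("ADDED_ResultMeasureValue") is not None


def split_by_basin(rows, basin_by_huc8, keep):
    """Drop rows failing `keep`, then group the survivors by river basin."""
    groups = defaultdict(list)
    for row in rows:
        basin = basin_by_huc8.get(row.get("HUCEightDigitCode"))
        if basin is not None and keep(row):
            groups[basin].append(row)
    return dict(groups)


def write_basin_files(groups, out_dir):
    """Write one CSV per basin; missing fields are written as empty strings."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for basin, rows in groups.items():
        fieldnames = sorted({key for row in rows for key in row})
        path = out_dir / f"{basin}.csv"  # hypothetical naming scheme
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths
```

Rows whose HUC is not in the mapping are dropped, which matches the idea that only the study basins are written out; a real run would pass the three basins' actual hydrologic unit codes.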
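The Step 4 log amounts to comparing the distinct values found in a column against the list the code expects. A small Python sketch of that comparison is below; the column `ResultSampleFractionText` is a real WQP field, but the expected list in the usage is only an example, not the release's list.

```python
def unexpected_values(rows, column, expected):
    """Return the sorted distinct values of `column` that are not in `expected`."""
    seen = {row.get(column) for row in rows}
    return sorted(v for v in seen - set(expected) if v is not None)


def build_log(rows, expectations):
    """`expectations` maps column name -> expected values; returns (column, value) pairs to review."""
    return [
        (column, value)
        for column, expected in expectations.items()
        for value in unexpected_values(rows, column, expected)
    ]


if __name__ == "__main__":
    rows = [
        {"ResultSampleFractionText": "Dissolved"},
        {"ResultSampleFractionText": "Total"},
        {"ResultSampleFractionText": "Recoverable"},
    ]
    # Example expectation only; the release defines its own expected lists.
    print(build_log(rows, {"ResultSampleFractionText": ["Dissolved", "Total"]}))
```

Each returned pair can be written as one line of the review file, so a new value appearing in a future download surfaces without stopping the run.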
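Two of the Step 5 cleaning operations, removing duplicate rows and keeping only ancillary data within ±1 hour of a same-day metals sample, can be sketched in Python as below. This is an illustration, not the logic of 2c_clean_match_ancillary.R. It assumes matching is done per site (`MonitoringLocationIdentifier`) using the WQP columns `ActivityStartDate` (YYYY-MM-DD) and `ActivityStartTime/Time` (HH:MM:SS), that both datasets share a time zone, and that a difference of exactly 1 hour counts as inside the window.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)


def parse_dt(date_str, time_str):
    """Combine WQP date and time strings (time zones ASSUMED identical)."""
    return datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")


def match_ancillary(metals, ancillary, window=WINDOW):
    """Drop duplicate ancillary rows, then keep rows within +/- window of a
    metals sample taken at the same site on the same date."""
    metal_times = {}
    for m in metals:
        key = (m["MonitoringLocationIdentifier"], m["ActivityStartDate"])
        metal_times.setdefault(key, []).append(
            parse_dt(m["ActivityStartDate"], m["ActivityStartTime/Time"])
        )
    kept, seen = [], set()
    for a in ancillary:
        fingerprint = tuple(sorted(a.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a row already processed
        seen.add(fingerprint)
        key = (a["MonitoringLocationIdentifier"], a["ActivityStartDate"])
        t = parse_dt(a["ActivityStartDate"], a["ActivityStartTime/Time"])
        if any(abs(t - mt) <= window for mt in metal_times.get(key, [])):
            kept.append(a)
    return kept
```

Keying the metals times by (site, date) means each ancillary row is only compared against same-day samples at the same location, so the window check stays cheap even for large downloads.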

This provisional code release was used to create the metals and ancillary datasets published in the following U.S. Geological Survey (USGS) product:  

Marvin-DiPasquale, M.C., Sullivan, S.L., Platt, L.R.C., Gorsky, A., Agee, J.L., McCleskey, B.R., Kakouros, E., Walton-Day, K., Runkel, R.L., Morriss, M.C., Wakefield, B.F., and Bergamaschi, B., 2022, Discrete Metals and Ancillary Data Used in the Development of Surrogate Models for Estimating Metals Concentration in Surface Water of Three Hydrologic Basins (Delaware River, Illinois River and Upper Colorado River): U.S. Geological Survey data release, https://doi.org/10.5066/P9L06M3G.

This work was completed as part of the USGS Proxies Project, an effort supported by the Water Mission Area (WMA) Water Quality Processes (WQP) program to develop estimation methods for PFAS, harmful algal blooms, and metals, at multiple spatial and temporal scales.

Files

metals-data-cleanup.zip

Files (102.4 kB)

  • md5:a8c9f84efca4e1cceca7ba8ba115039c (58.9 kB)
  • md5:2c5cf9e31aa3223036a8c8892d7d2fac (43.5 kB)
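Each file is listed with an MD5 checksum. A minimal Python sketch for checking a downloaded archive against the listed value, using only the standard library `hashlib`; the path in the usage comment is a placeholder for whichever file you downloaded.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches a listed checksum (with or without the 'md5:' prefix)."""
    return md5_of(path) == expected_md5.removeprefix("md5:").lower()


# Usage (placeholder path; compare against the checksum listed for that file):
# verify("path/to/downloaded.zip", "md5:...")
```

Reading in chunks keeps memory use constant regardless of archive size; `str.removeprefix` requires Python 3.9 or later.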

Additional details

Related works

Compiles
Dataset: 10.5066/P9L06M3G (DOI)