This package provides access to water quality and hydrological data from the dbhydro database provided by the South Florida Water Management District. Although the package has a regional focus, the database could be used to address more general ecological or environmental management questions. Having a package to more easily acquire the database will no doubt facilitate use and reach of the data.
The package is lightweight and is built around two main functions. The documentation and examples could be improved in some cases (see below), but overall I do not have any major concerns about package structure or functionality. However, access to metadata for the database is a serious limitation that needs to be addressed. Sure I could go online to find what I need but it seems like the package should facilitate that process in some way. The package does a good job retrieving data if the station id and test name are known but it would be much more helpful if there was some way to query the available information. Here are some options to consider:
** I believe that wild-cards only work with station-id(s). Are you thinking that test-names would also be exposed to wildcard selection?**
I added a link to the available test-name listing in the getwq
docs. I think there may be too many stations to include a managable listing in the vignette but see response to the next comment.
Thank you for the suggestion. I am working on wrapping the station-search utility at http://my.sfwmd.gov/dbhydroplsql/water_quality_data.show_group_station_characters. For now, I tried to make map-based selection via the ArcGIS online or Google Earth interface more prominent in the vignette and README.
https://github.com/SFWMD/dbhydroR/commit/11c57f3a4f31b3804244ab269926798713f8aaa6
cleanhydro(gethydro(dbkey = “15081”, date_min = “2013-01-01”, date_max = “2013-02-02”))
Error: is.response(x) is not TRUE
I ran the check to include examples in \dontrun tags, as for the above. It makes sense to not run the example if it’s supposed to fail but there is no description exaplaining what the example shows. Either fix the example above or provide some explanation as to why it fails. All other examples ran without issues.
The example has been fixed and now runs without error
https://github.com/SFWMD/dbhydroR/commit/ca676ef38c598ac6038dcd4217df15745ef67260 https://github.com/SFWMD/dbhydroR/commit/86436e321bce70501a1aa04e5f5e0a3b890b3940
Great suggestion! More detail has been added to the docs for getwq and gethydro.
https://github.com/SFWMD/dbhydroR/commit/8b67b622a96df66f4bab968f0cf17787de7792ce
This has been added to the README and vignette.
https://github.com/SFWMD/dbhydroR/commit/6a277de115df38dd4ff214f2f2b7e5a81ccfb272 https://github.com/SFWMD/dbhydroR/commit/7a7d4142192d62a55d6757dd5510889b21055f28
Links to the equivalent ArcGIS Online station map have been added to the vignette and README, and function docs. Hopefully this will make it more generally accessible.
https://github.com/SFWMD/dbhydroR/commit/11c57f3a4f31b3804244ab269926798713f8aaa6 https://github.com/SFWMD/dbhydroR/commit/3750b847e029373a2c8a2d8f5d9d3863e3a2bed0 https://github.com/SFWMD/dbhydroR/commit/e66be8f4773e50f7f29505895e785fa80c42de54
I think it is best to have these functions exported. I envision a use-case where dplyr/SQL-like data would be preferable to the default format. Exposing these functions would allow for the ability to switch to non-dplyr/SQL-like data without re-pulling the data. Also, I added more detail to the clean* docs and refactored cleanhydro
to be more consistent with cleanwq
https://github.com/SFWMD/dbhydroR/commit/86436e321bce70501a1aa04e5f5e0a3b890b3940
Hmm, possibly you are referring to the “QA blanks” discussion? This refers not to missing data but to analytical samples run to check for equipment/process contamination. I have added text to the vignette to this effect.
https://github.com/SFWMD/dbhydroR/commit/5d4fe25aec7b4c0b5aaf4ecffc0832196d57ca9a
My comments above address most of the compliance concerns for rOpenSci. Below are some minor, additional points.
I guess the CRAN gatekeepers don’t care about this but I was having a mental block with getwq given it’s similarity to getwd. You might consider changing the function name to make it more distinct.
Also consider adding the code of conduct badge with devtools::use_code_of_conduct.
A COC has been added.
https://github.com/SFWMD/dbhydroR/commit/0758c0cc3763f22cfcec2090ca3ed784699e870c
Hopefully these comments are helpful. Let me know if you have any questions.
This package provides scriptable access to an extensive Florida water quality/quantity database, DBHYDRO. It’s also noticeably faster than the browser-based interface. dbhydroR will enhance the usability of the database and the reproducibility of analyses of DBHYDRO data.
I agree with @fawda123 that the largest room for improvement is in the documentation. I have also made suggestions below for increasing the flexibility of the data-cleaning functions to serve several common use cases, and I’ve pointed out a few edge cases where the package’s functionality might surprise a user.
As @fawda123 noted, it can be hard to identify the station IDs and tests you want if you don’t know what the options are. There are a few ways you could helper users:
Great suggestion. I ported these links and added links to the ArcGIS Online Station Map.
https://github.com/SFWMD/dbhydroR/commit/8b67b622a96df66f4bab968f0cf17787de7792ce
I am working on wrapping the station-search utility at http://my.sfwmd.gov/dbhydroplsql/water_quality_data.show_group_station_characters. For now, I tried to make map-based selection via the ArcGIS online or Google Earth interface more prominent in the vignette and README.
Questions that arose on my first encounter with the package/dataset:
gethydro
docs.
https://github.com/SFWMD/dbhydroR/commit/8b67b622a96df66f4bab968f0cf17787de7792ce
A notice about citation()
has been added to the README
https://github.com/SFWMD/dbhydroR/commit/bbf0246398ef1446feca5be4fde0e98126e39b7b
Other documentation thoughts
I expanded the docs for the get* and clean* functions to describe what goes on in the respective functions. I also refactored cleanhydro
to be more consistent with cleanwq
(they both operate on data.frame
objects now).
https://github.com/SFWMD/dbhydroR/commit/8b67b622a96df66f4bab968f0cf17787de7792ce https://github.com/SFWMD/dbhydroR/commit/86436e321bce70501a1aa04e5f5e0a3b890b3940
The contributing guidelines of the other rOpenSci repos appear to be fairly specific to rOpenSci. I wonder if I should hold off on this until we are closer to acceptance?
The repo name has been switched to match the package name.
devtools::run_examples(run=FALSE)
…
cleanhydro(gethydro(dbkey = “15081”, date_min = “2013-01-01”, date_max = “2013-02-02”))
Error: is.response(x) is not TRUE
The example has been fixed and now runs without error
https://github.com/SFWMD/dbhydroR/commit/ca676ef38c598ac6038dcd4217df15745ef67260 https://github.com/SFWMD/dbhydroR/commit/86436e321bce70501a1aa04e5f5e0a3b890b3940
These lines have been removed.
https://github.com/SFWMD/dbhydroR/commit/4cc1b2d14357502d7f248a3e78558690e83cd357
These files have been removed.
https://github.com/SFWMD/dbhydroR/commit/088884276228f665e2b42d48a2ad318d71d8fea0 https://github.com/SFWMD/dbhydroR/commit/9186979e9ca2a5694d68bcdf863c326d98f304b6
getwq + gethydro
These functions have a clean, intuitive interface. A few thoughts:
I included some additional information about dbkey
in the gethydro
docs.
https://github.com/SFWMD/dbhydroR/commit/8b67b622a96df66f4bab968f0cf17787de7792ce
getdbkey(stationid = “C111%”, stat = ‘MEAN’, category = “WQ”)
Search results for ‘C111% WQ’
Dbkey Group Data Type Freq Recorder Start Date End Date
1 38097 C111WETL H2OT DA ???? 04-OCT-2003 10-OCT-2011
2 38104 C111WETL SALIN DA ???? 04-OCT-2003 10-OCT-2011
3 38100 C111WETL SCOND DA ???? 04-OCT-2003 10-OCT-2011
I realize that this is a bit confusing. The getwq
function pulls data from a seperate set of DBHYDRO tables than gethydro
. Only the gethydro
tables contain dbkeys. Then, confusingly, the gethydro
tables have a category called "WQ"
. The getwq
tables do not support selection by dbkey. The getwq
water quality data is generated from laboratory tests while the gethydro
"WQ"
data is generated from things like automated sondes.
getwq(“FLAB01”, “2014-09-14”, “2014-09-18”, “NITRATE+NITRITE-N”, raw = TRUE)$Value
[1] -0.01
getwq(“FLAB01”, “2014-09-14”, “2014-09-18”, “NITRATE+NITRITE-N”, raw = TRUE, mdl_handling=‘half’)$Value
[1] -0.01
I refactored getwq
and cleanwq
so that MDL handling occurs regardless of how raw
is set. Also, the “No data found” message should now appear regardless of how raw
is set.
https://github.com/SFWMD/dbhydroR/commit/8a629183c03003175918640937ed18666a56f5eb
getdbkey
Though not advertised in the README, the getdbkey function looks like a useful way to identify datasets to download. With that in mind, I have a few suggestions that are mostly illustrated by this example:
getdbkey(stationid = “C111%”, stat = ‘MEAN’, category = “WQ”, detail.level=“full”)
Search results for ‘C111% WQ’
Get Data Dbkey \nStation\n Group Data Type Freq Stat Recorder Agency
1 38097 C111WETLND C111WETL H2OT DA MEAN ???? USGS
2 38104 C111WETLND C111WETL SALIN DA MEAN ???? USGS
3 38100 C111WETLND C111WETL SCOND DA MEAN ???? USGS
Start Date End Date Strata County Op Num Latitude Longitude X Coord
1 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
2 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
3 04-OCT-2003 10-OCT-2011 0 DAD Â 251740 803112 814755.394
Y Coord Basin Struct Section Township Range
1 349549.418 C111_CO Â 15 59 38
2 349549.418 C111_CO Â 15 59 38
3 349549.418 C111_CO Â 15 59 38
I added some advertisement for getdbkey
to the README.
I removed the extra newlines from “Station” and formatted Latitude and Longitude as decimal degrees.
https://github.com/SFWMD/dbhydroR/commit/3df0ad17f4e1b4f78f765b5bd182b1b3d4d34446
I am not getting Â
printed in any of the columns. Maybe I fixed it as a side-effect of another commit? The following line prints a reasonable (non-blank) value of Struct
for me:
getdbkey(category = “SW”, stationid = “ALLIGAT”, param = “STG”, detail.level = “full”)
Good catch. No it doesn’t. This is an artifact from the web interface. I removed this column.
https://github.com/SFWMD/dbhydroR/commit/b68cc5ed0c3814455bd104c22ff828687031546a
Good suggestion. I made this change.
https://github.com/SFWMD/dbhydroR/commit/3df0ad17f4e1b4f78f765b5bd182b1b3d4d34446
# we’d expect >100 dbkeys in this query
nrow(getdbkey(category = “GW”, detail.level = “full”))
Search results for ‘NA GW’
[1] 100
# because we easily get >100 sites combined in these two queries:
dba <- getdbkey(stationid = ‘A%’, category = “GW”, detail.level = “full”)
Search results for ‘A% GW’
dbi <- getdbkey(stationid = ‘I%’, category = “GW”, detail.level = “full”)
Search results for ‘I% GW’
nrow(dba) + nrow(dbi)
[1] 134
I circumvented this limitation.
https://github.com/SFWMD/dbhydroR/commit/b68cc5ed0c3814455bd104c22ff828687031546a
cleanwq + cleanhydro
h <- gethydro(dbkey = “IY639”, date_min = “2009-01-30”, date_max = “2009-01-31”)
Warning messages:
1: In cleanhydro(res) : Column headings missing. Guessing…
2: In cleanhydro(res) : Instantaneous data detected
head(h$date)
[1] “2009-01-30 00:40:00 MST” “2009-01-30 00:55:00 MST” “2009-01-30 01:10:00 MST”
[4] “2009-01-30 01:25:00 MST” “2009-01-30 01:40:00 MST” “2009-01-30 01:55:00 MST”
I forced hydro and wq date/times to return in EST.
https://github.com/SFWMD/dbhydroR/commit/4d1dd1552c08568bc429959dd6b52898438d25b5 https://github.com/SFWMD/dbhydroR/commit/239c03857d721d363fc373396c0e3e442a920d59
This has been changed to a message rather tha a warning. The raw server response for instantaneous data returns no column headers. These are assumed, hence the warning/message. The column assignments seem pretty reliable. Maybe I can drop these messages.
https://github.com/SFWMD/dbhydroR/commit/2d9902d75d2ed2ec5ef90d7a0bda18b8b778525d
Maybe I have captured the spirit of this comment by standardizing the behavior of the clean* functions, and forcing mdl_handling to occur regardless of the value of raw
? Also, timestamps are parsed to the date
column regardless of the value of raw
.
I aded examples to the README and vignette.
https://github.com/SFWMD/dbhydroR/commit/cf5ebd7d134a5eaf9107f190aa931d8af25f1226
Signing off
This package is already a nice contribution - thanks for developing it. Let me know if you’d like to clarify/discuss any of the above points. Best, Alison (@aappling-usgs)