Accessing hydrological data using web APIs

with a demo of the rnrfa package

Claudia Vitolo
Scientist, European Centre for Medium-Range Weather Forecasts

https://github.com/hydrosoc/rhydro_EGU18/blob/master/rnrfa_ClaudiaVitolo.Rmd


Outline



  • About me & announcements
  • UK National River Flow Archive and its API
    • example of open hydro-meteorological database that uses RESTful web services
    • exercises: API calls to retrieve NRFA data and metadata
  • The rnrfa package: R interface to the NRFA

About me & announcements


Scientist

  • @ECMWF (https://www.ecmwf.int/) developing products for high-impact weather.

  • ANYWHERE (http://anywhere-h2020.eu/) - EU-funded H2020 project employing cutting-edge technologies to help first responders act quickly and efficiently when natural hazards occur (e.g. wildfires, floods, extreme precipitation, droughts).

SC1.17 - Using R for natural hazard risk modelling, with applications to wildfire risk forecasting - Claudia Vitolo, Francesca Di Giuseppe, Julia Wagemann, Mark Parrington - Wed 11 Apr, 15:30–17:00 / Room 2.16 http://meetingorganizer.copernicus.org/EGU2018/session/28648




About me & announcements


R-Ladies Global co-founder & co-organiser of R-Ladies London

  • R-Ladies (https://rladies.org/):
  • R-Consortium top-level project
  • 90 chapters (cities) worldwide
  • ~19000 members
  • Want to join the R-Ladies global community and start a chapter in your city? Email info@rladies.org


National River Flow Archive and its APIs


The UK National River Flow Archive (http://nrfa.ceh.ac.uk/) serves daily streamflow data, spatial rainfall averages and information regarding elevation, geology, land cover and FEH related catchment descriptors.


There are currently data APIs under development that provide access to the following services:

  • metadata catalogue (JSON),

  • catalogue filters based on a geographical bounding-box,

  • catalogue filters based on metadata entries,

  • gauged daily data and catchment mean rainfall for about 400 stations (WaterML, the OGC standard used to describe hydrological time series),

  • experimental services based on the following Open Geospatial Consortium standards: Web Feature Service (WFS), Web Map Service (WMS) and Sensor Observation Service (SOS).

RESTful web services, APIs and data requests


Some data providers implement RESTful web services, and data requests are made via the HTTP GET method.



Syntax of a typical HTTP GET data request: server_end_point/format/service?X=1&Y=2

Exercise #1


How do I get information on station "18019" from the NRFA catalogue?

server_end_point/format/service?X=1&Y=2

http://nrfaapps.ceh.ac.uk/nrfa/json/stationSummary?db=nrfa_public&stn=18019
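
The mapping from the generic syntax to this URL can be sketched in base R. The helper build_request() below is hypothetical (it is not part of rnrfa) and does no URL encoding, so it only suits simple parameter values like the ones used here.

```r
# Hypothetical helper (not part of rnrfa): assemble an HTTP GET request URL
# following the pattern server_end_point/format/service?X=1&Y=2
build_request <- function(end_point, format, service, params) {
  # Turn list(db = "a", stn = "b") into "db=a&stn=b"
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  paste0(end_point, "/", format, "/", service, "?", query)
}

url <- build_request(
  end_point = "http://nrfaapps.ceh.ac.uk/nrfa",
  format    = "json",
  service   = "stationSummary",
  params    = list(db = "nrfa_public", stn = "18019")
)
print(url)
```

The result is the same string as the request shown above; only the parameter list changes from one request to the next.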

Exercise #2


How do I get the time series of daily flows for station "18019"?

server_end_point/format/service?X=1&Y=2&Z=3

http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=18019&dt=gdf

Challenges


  • assemble data requests
  • parse server responses
  • make the process scalable (e.g. run multiple requests efficiently)
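
As a minimal sketch of the first and third points, the daily-flow request from Exercise #2 can be generated for several stations at once, because sprintf() is vectorised over its arguments. The station ids are the ones used later in this course; the variable names are illustrative only.

```r
# Station ids used later in this course
station_ids <- c("7001", "12001", "25006", "39001", "50002")

# One request URL per station, following the pattern from Exercise #2
gdf_urls <- sprintf(
  "http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=%s&dt=gdf",
  station_ids
)
names(gdf_urls) <- station_ids

print(gdf_urls[["12001"]])
```

Parsing the responses is the harder part, which is what the wrapper functions in the next section take care of.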

The rnrfa package


The name rnrfa stands for: R interface to the National River Flow Archive.

The rnrfa package aims to make data access simpler and more efficient by providing wrapper functions that assemble HTTP GET requests and parse XML/JSON responses.


Claudia Vitolo, Matthew Fry, and Wouter Buytaert. rnrfa: An R package to retrieve, filter and visualize data from the UK National River Flow Archive. The R Journal, 8(2):102–116, 2016. URL: https://journal.r-project.org/archive/2016-2/vitolo-fry-buytaert.pdf

Installation


The stable version of the rnrfa package is available from CRAN:

# Install stable version from CRAN
install.packages("rnrfa")


The development version is available from GitHub via devtools. This is the preferred option for running this demo.

# Install dev version
devtools::install_github("cvitolo/rnrfa")

Load the package


# Load the rnrfa package
library(rnrfa)
## 
## +----------------------------------------------------------------+
## |  If you wish to use NRFA data, please refer to the following   |
## |  Terms & Conditions:                                           |
## |  http://nrfa.ceh.ac.uk/costs-terms-and-conditions              |
## +----------------------------------------------------------------+

List of monitoring stations


The function catalogue(), used with no inputs, requests the full list of gauging stations.

# Retrieve information for all the stations in the catalogue:
allStations <- catalogue()

dim(allStations)
## [1] 1563   24

The output is a data object of type data.frame, containing 1563 records (total number of monitored gauging stations) and 24 columns (total number of metadata entries available).

List of monitoring stations


# Select columns: id, name, river, gridReference, catchmentArea
selectedInfo <- c(1, 3, 5, 9, 11)

head(allStations[, selectedInfo])
##     id                    name     river gridReference catchmentArea
## 1 1001         Wick at Tarroul      Wick      ND262549         161.9
## 2 2001  Helmsdale at Kilphedir Helmsdale      NC998181         551.4
## 3 2002    Brora at Bruachrobie     Brora      NC891039         434.4
## 4 3001           Shin at Lairg      Shin      NC581062         494.6
## 5 3002    Carron at Sgodachail    Carron      NH491921         241.1
## 6 3003 Oykel at Easter Turnaig     Oykel      NC403001         330.7
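
Selecting columns by position stops working as intended if the column order ever changes. A minimal sketch of selecting by name instead is shown below. To keep it runnable offline it uses a small stand-in data frame (demoStations, a hypothetical name) built from two rows of the output above rather than a live catalogue() call.

```r
# Stand-in for the catalogue: two rows copied from the output above
demoStations <- data.frame(
  id            = c(1001, 2001),
  name          = c("Wick at Tarroul", "Helmsdale at Kilphedir"),
  river         = c("Wick", "Helmsdale"),
  gridReference = c("ND262549", "NC998181"),
  catchmentArea = c(161.9, 551.4),
  stringsAsFactors = FALSE
)

# Select columns by name rather than by position
selectedNames <- c("id", "name", "river", "gridReference", "catchmentArea")
selected <- demoStations[, selectedNames]
print(selected)
```

The same indexing should work on the full catalogue, assuming its column names match the headers printed above.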

Station information (1)


The function catalogue() can be used to filter stations based on various criteria, listed below:

(1) id = Station identification number
(2) ma-station-id = Measuring Authority (local station number)
(3) name = Name of the station
(4) location = Area in which the station is located
(5) river = River catchment
(6) hydrometricArea = UK hydrometric area identification number
(7) operator = UK measuring authorities
(8) haName = Hydrometric Area name
(9) gridReference = OS Grid Reference number
(10) stationType = Type of station (e.g. flume, weir, etc.)
(11) catchmentArea = Catchment area (km2)
(12) gdfStart = Year in which recordings started

Station information (2)


(13) gdfEnd = Year in which recordings ended
(14) farText = Information on the regime (e.g. natural, regulated, etc.)
(15) categories = various tags (e.g. FEH_POOLING, FEH_QMED)
(16) altitude = Altitude measured in metres above Ordnance Datum or, in Northern Ireland, Malin Head.
(17) sensitivity = Sensitivity index calculated as the percentage change in flow associated with a 10 mm increase in stage at the \(Q_{95}\) flow.
(18) benchmark2 = placeholder variable (?) currently all NAs
(19) maximum-gauging-stage = level in m
(20) maximum-gauging-stage-date-time = in the format dd/mm/yyyy
(21) maximum-gauging-flow = flow in m3/s
(22) maximum-gauging-flow-date-time = in the format dd/mm/yyyy
(23) lat = a numeric vector of latitude coordinates.
(24) lon = a numeric vector of longitude coordinates.

Filter catalogue: bounding box


# Define a bounding box:
bbox <- list(lonMin = -3.82, lonMax = -3.63, latMin = 52.43, latMax = 52.52)

# Filter stations based on bounding box
catalogue(bbox = bbox)[, selectedInfo]
##      id                         name     river gridReference catchmentArea
## 1 54022    Severn at Plynlimon flume    Severn      SN853872           8.7
## 2 54090 Tanllwyth at Tanllwyth Flume Tanllwyth      SN843876           0.9
## 3 54091       Severn at Hafren Flume    Severn      SN843878           3.6
## 4 54092           Hore at Hore Flume      Hore      SN846873           3.2
## 5 54097     Hore at Upper Hore flume      Hore      SN831869           1.6
## 6 55008            Wye at Cefn Brwyn       Wye      SN829838          10.6
## 7 55033             Wye at Gwy flume       Wye      SN824853           3.9
## 8 55034           Cyff at Cyff flume      Cyff      SN824842           3.1
## 9 55035           Iago at Iago flume      Iago      SN826854           1.1
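
Conceptually, a bounding-box filter keeps the stations whose coordinates fall inside the box. The sketch below illustrates that logic in base R; it is not the rnrfa implementation, and treating the bounds as inclusive is an assumption. The first row uses the WGS84 coordinates that this demo reports for grid reference SN853872 (station 54022); the second row is a made-up station placed outside the box.

```r
bbox <- list(lonMin = -3.82, lonMax = -3.63, latMin = 52.43, latMax = 52.52)

# Stand-in data: station 54022 plus one made-up station (id 99999)
demoStations <- data.frame(
  id  = c(54022, 99999),      # 99999 is hypothetical, for illustration only
  lon = c(-3.689987, -1.0),
  lat = c(52.47065, 51.0)
)

# Keep rows whose coordinates fall inside the box (inclusive bounds assumed)
inside <- demoStations$lon >= bbox$lonMin & demoStations$lon <= bbox$lonMax &
          demoStations$lat >= bbox$latMin & demoStations$lat <= bbox$latMax

print(demoStations[inside, ])
```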

Filter catalogue: minimum recorded years


# Filter based on minimum number of recording years
catalogue(minRec=30)

Filter catalogue: station id numbers


Generate a subset of the catalogue only containing the catchments to be used in this short course!

# Filter stations based on identification number
stations <- catalogue(columnName = "id",
                      columnValue = c(7001, 12001, 25006, 39001, 50002))

Filter catalogue: hydrometric area (haName)


# Filter stations belonging to a certain hydrometric area
catalogue(columnName = "haName", columnValue = "Wye (Hereford)")

Filter catalogue: combine multiple selection criteria


catalogue(bbox = bbox,
          columnName = "id", 
          columnValue = c(54022,54090,54091,54092,54097), 
          minRec = 35)

Conversions


NRFA stations are located using the OS grid reference (column 9, "gridReference"). The rnrfa package converts these references to more standard coordinate systems. By default, the function osg_parse() converts the string to easting and northing in the British/Irish National Grid coordinate system (EPSG codes: 27700/29902).

# Convert OS Grid reference to BNG
osg_parse(gridRefs = "SN853872")
## $easting
## [1] 285300
## 
## $northing
## [1] 287200
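
To show where these numbers come from, here is a minimal base-R sketch of the usual way a British National Grid reference is decoded: the two letters identify a 100 km square and the digits give the offset within it. The function parse_bng() is a hypothetical helper, not the rnrfa implementation, and it assumes a single two-letter British (not Irish) reference with an even number of digits.

```r
# Hypothetical helper: decode one British National Grid reference
parse_bng <- function(grid_ref) {
  alphabet <- setdiff(LETTERS, "I")   # the grid lettering skips "I"
  l1 <- match(substr(grid_ref, 1, 1), alphabet) - 1
  l2 <- match(substr(grid_ref, 2, 2), alphabet) - 1

  # Index of the 100 km square, counted from the grid origin
  e100km <- ((l1 - 2) %% 5) * 5 + (l2 %% 5)
  n100km <- (19 - (l1 %/% 5) * 5) - (l2 %/% 5)

  # Split the digits into easting and northing halves
  digits <- substr(grid_ref, 3, nchar(grid_ref))
  half   <- nchar(digits) / 2
  e <- as.numeric(substr(digits, 1, half))
  n <- as.numeric(substr(digits, half + 1, nchar(digits)))

  # Scale the digits to metres (3 digits per half means 100 m resolution)
  scale <- 10^(5 - half)
  list(easting = e100km * 1e5 + e * scale,
       northing = n100km * 1e5 + n * scale)
}

res <- parse_bng("SN853872")
print(res)
```

For "SN853872" the letters SN map to the square at (2, 2), so the result is easting 285300 and northing 287200, matching the osg_parse() output above.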

Conversions


To get coordinates in latitude and longitude (WGS84 coordinate system, EPSG code: 4326) use the parameter CoordSystem = "WGS84".

# Convert OS Grid reference to WGS84
osg_parse(gridRefs = "SN853872", CoordSystem = "WGS84")
## $lon
## [1] -3.689987
## 
## $lat
## [1] 52.47065

Conversions


osg_parse() also works with multiple references:

osg_parse(gridRefs = stations$gridReference)
## $easting
## [1] 282500 363400 403300 517700 249900
## 
## $northing
## [1] 833500 795600 512200 169800 118500

Get time series data


Station id numbers can be used to retrieve time series data. These data are automatically converted from WaterML2 format to a time series object of class zoo.

The National River Flow Archive serves two types of time series data:

  • Gauged Daily Flows, get data using the function gdf()

  • Catchment Mean Rainfall, get data using the function cmr()

Gauged Daily Flows, gdf()


This function takes the station id as input; setting metadata = TRUE returns the station metadata as well (in the $meta element). Here is how to retrieve daily flows and metadata for the Findhorn at Shenachie catchment (id = 7001).

# Fetch time series data and metadata from the waterml2 service
gdfdata7001 <- gdf(id = "7001")
gdfmeta7001 <- gdf(id = "7001", metadata = TRUE)$meta

Gauged Daily Flows, gdf()


plot(gdfdata7001, 
     main = paste("Daily flow data for", gdfmeta7001$stationName, "catchment"),
     xlab = "", 
     ylab = expression(m^3/s))


Catchment Mean Rainfall, cmr()


This function takes the station id as input; setting metadata = TRUE returns the station metadata as well (in the $meta element). Here is how to retrieve rainfall data and metadata for the Findhorn at Shenachie catchment (id = 7001).

# Fetch time series data from the waterml2 service
cmrdata7001 <- cmr(id = "7001")
cmrmeta7001 <- cmr(id = "7001", metadata = TRUE)$meta

Catchment Mean Rainfall, cmr()


plot(cmrdata7001, 
     main = paste("Monthly rainfall for", cmrmeta7001$stationName, "catchment"), 
     xlab = "",
     ylab = expression(mm))


Get GDF and CMR data for the next parts of the course!

stations <- catalogue(columnName = "id",
                      columnValue = c(7001, 12001, 25006, 39001, 50002))

gdfdata7001 <- gdf(id = "7001"); cmrdata7001 <- cmr(id = "7001")
gdfdata12001 <- gdf(id = "12001"); cmrdata12001 <- cmr(id = "12001")
gdfdata25006 <- gdf(id = "25006"); cmrdata25006 <- cmr(id = "25006")
gdfdata39001 <- gdf(id = "39001"); cmrdata39001 <- cmr(id = "39001")
gdfdata50002 <- gdf(id = "50002"); cmrdata50002 <- cmr(id = "50002")
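
The repeated calls above can also be written as a loop that returns one list per data type, named by station id. The sketch below is one possible way to do this; fetch_many() and fake_fetch() are hypothetical helpers, and fake_fetch() is only an offline stand-in so the example runs without rnrfa or network access.

```r
ids <- c("7001", "12001", "25006", "39001", "50002")

# Hypothetical helper: call fetch_fun once per id, return a list named by id
fetch_many <- function(ids, fetch_fun) {
  out <- lapply(ids, function(id) fetch_fun(id = id))
  names(out) <- ids
  out
}

# Offline stand-in for gdf() or cmr(), used here for illustration only
fake_fetch <- function(id) paste("series for station", id)

demo <- fetch_many(ids, fake_fetch)
print(demo[["39001"]])

# With rnrfa loaded, the same pattern would be:
# gdfList <- fetch_many(ids, gdf)
# cmrList <- fetch_many(ids, cmr)
```

A named list keeps each series addressable by station id, for example gdfList[["7001"]], instead of relying on one variable per station.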


Thank You

For more information, contact me:

e-mail: claudia.vitolo@ecmwf.int
twitter: @clavitolo